 docs/mllib-data-types.md (renamed from docs/mllib-basics.md)  |   4
 docs/mllib-dimensionality-reduction.md                        |   4
 docs/mllib-guide.md                                           |   9
 docs/mllib-statistics.md (renamed from docs/mllib-stats.md)   | 156
 4 files changed, 87 insertions(+), 86 deletions(-)
diff --git a/docs/mllib-basics.md b/docs/mllib-data-types.md
index 8752df4129..101dc2f869 100644
--- a/docs/mllib-basics.md
+++ b/docs/mllib-data-types.md
@@ -1,7 +1,7 @@
---
layout: global
-title: Basics - MLlib
-displayTitle: <a href="mllib-guide.html">MLlib</a> - Basics
+title: Data Types - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Data Types
---
* Table of contents
diff --git a/docs/mllib-dimensionality-reduction.md b/docs/mllib-dimensionality-reduction.md
index 9f2cf6d48e..21cb35b427 100644
--- a/docs/mllib-dimensionality-reduction.md
+++ b/docs/mllib-dimensionality-reduction.md
@@ -11,7 +11,7 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Dimensionality Reduction
of reducing the number of variables under consideration.
It can be used to extract latent features from raw and noisy features
or compress data while maintaining the structure.
-MLlib provides support for dimensionality reduction on the <a href="mllib-basics.html#rowmatrix">RowMatrix</a> class.
+MLlib provides support for dimensionality reduction on the <a href="mllib-data-types.html#rowmatrix">RowMatrix</a> class.
## Singular value decomposition (SVD)
@@ -58,7 +58,7 @@ passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.
### SVD Example
MLlib provides SVD functionality to row-oriented matrices, provided in the
-<a href="mllib-basics.html#rowmatrix">RowMatrix</a> class.
+<a href="mllib-data-types.html#rowmatrix">RowMatrix</a> class.
<div class="codetabs">
<div data-lang="scala" markdown="1">
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 4d4198b9e0..d3a510b3c1 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -7,12 +7,13 @@ MLlib is Spark's scalable machine learning library consisting of common learning
including classification, regression, clustering, collaborative
filtering, dimensionality reduction, as well as underlying optimization primitives, as outlined below:
-* [Data types](mllib-basics.html)
-* [Basic statistics](mllib-stats.html)
- * random data generation
- * stratified sampling
+* [Data types](mllib-data-types.html)
+* [Basic statistics](mllib-statistics.html)
* summary statistics
+ * correlations
+ * stratified sampling
* hypothesis testing
+ * random data generation
* [Classification and regression](mllib-classification-regression.html)
* [linear models (SVMs, logistic regression, linear regression)](mllib-linear-methods.html)
* [decision trees](mllib-decision-tree.html)
diff --git a/docs/mllib-stats.md b/docs/mllib-statistics.md
index 511a9fbf71..c463241399 100644
--- a/docs/mllib-stats.md
+++ b/docs/mllib-statistics.md
@@ -1,7 +1,7 @@
---
layout: global
-title: Statistics Functionality - MLlib
-displayTitle: <a href="mllib-guide.html">MLlib</a> - Statistics Functionality
+title: Basic Statistics - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Basic Statistics
---
* Table of contents
@@ -25,7 +25,7 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Statistics Functionality
\newcommand{\zero}{\mathbf{0}}
\]`
-## Summary Statistics
+## Summary statistics
We provide column summary statistics for `RDD[Vector]` through the function `colStats`
available in `Statistics`.
@@ -104,81 +104,7 @@ print summary.numNonzeros()
</div>
-## Random data generation
-
-Random data generation is useful for randomized algorithms, prototyping, and performance testing.
-MLlib supports generating random RDDs with i.i.d. values drawn from a given distribution:
-uniform, standard normal, or Poisson.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-[`RandomRDDs`](api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
-methods to generate random double RDDs or vector RDDs.
-The following example generates a random double RDD, whose values follows the standard normal
-distribution `N(0, 1)`, and then map it to `N(1, 4)`.
-
-{% highlight scala %}
-import org.apache.spark.SparkContext
-import org.apache.spark.mllib.random.RandomRDDs._
-
-val sc: SparkContext = ...
-
-// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
-// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
-val u = normalRDD(sc, 1000000L, 10)
-// Apply a transform to get a random double RDD following `N(1, 4)`.
-val v = u.map(x => 1.0 + 2.0 * x)
-{% endhighlight %}
-</div>
-
-<div data-lang="java" markdown="1">
-[`RandomRDDs`](api/java/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
-methods to generate random double RDDs or vector RDDs.
-The following example generates a random double RDD, whose values follows the standard normal
-distribution `N(0, 1)`, and then map it to `N(1, 4)`.
-
-{% highlight java %}
-import org.apache.spark.SparkContext;
-import org.apache.spark.api.JavaDoubleRDD;
-import static org.apache.spark.mllib.random.RandomRDDs.*;
-
-JavaSparkContext jsc = ...
-
-// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
-// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
-JavaDoubleRDD u = normalJavaRDD(jsc, 1000000L, 10);
-// Apply a transform to get a random double RDD following `N(1, 4)`.
-JavaDoubleRDD v = u.map(
- new Function<Double, Double>() {
- public Double call(Double x) {
- return 1.0 + 2.0 * x;
- }
- });
-{% endhighlight %}
-</div>
-
-<div data-lang="python" markdown="1">
-[`RandomRDDs`](api/python/pyspark.mllib.random.RandomRDDs-class.html) provides factory
-methods to generate random double RDDs or vector RDDs.
-The following example generates a random double RDD, whose values follows the standard normal
-distribution `N(0, 1)`, and then map it to `N(1, 4)`.
-
-{% highlight python %}
-from pyspark.mllib.random import RandomRDDs
-
-sc = ... # SparkContext
-
-# Generate a random double RDD that contains 1 million i.i.d. values drawn from the
-# standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
-u = RandomRDDs.uniformRDD(sc, 1000000L, 10)
-# Apply a transform to get a random double RDD following `N(1, 4)`.
-v = u.map(lambda x: 1.0 + 2.0 * x)
-{% endhighlight %}
-</div>
-
-</div>
-
-## Correlations calculation
+## Correlations
Calculating the correlation between two series of data is a common operation in Statistics. In MLlib
we provide the flexibility to calculate pairwise correlations among many series. The supported
@@ -455,3 +381,77 @@ for (ChiSqTestResult result : featureTestResults) {
</div>
</div>
+
+## Random data generation
+
+Random data generation is useful for randomized algorithms, prototyping, and performance testing.
+MLlib supports generating random RDDs with i.i.d. values drawn from a given distribution:
+uniform, standard normal, or Poisson.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+[`RandomRDDs`](api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
+methods to generate random double RDDs or vector RDDs.
+The following example generates a random double RDD whose values follow the standard normal
+distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+
+{% highlight scala %}
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.random.RandomRDDs._
+
+val sc: SparkContext = ...
+
+// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
+// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
+val u = normalRDD(sc, 1000000L, 10)
+// Apply a transform to get a random double RDD following `N(1, 4)`.
+val v = u.map(x => 1.0 + 2.0 * x)
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+[`RandomRDDs`](api/java/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
+methods to generate random double RDDs or vector RDDs.
+The following example generates a random double RDD whose values follow the standard normal
+distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+
+{% highlight java %}
+import org.apache.spark.api.java.JavaDoubleRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.api.java.function.Function;
+import static org.apache.spark.mllib.random.RandomRDDs.*;
+
+JavaSparkContext jsc = ...
+
+// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
+// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
+JavaDoubleRDD u = normalJavaRDD(jsc, 1000000L, 10);
+// Apply a transform to get a random double RDD following `N(1, 4)`.
+JavaDoubleRDD v = u.map(
+ new Function<Double, Double>() {
+ public Double call(Double x) {
+ return 1.0 + 2.0 * x;
+ }
+ });
+{% endhighlight %}
+</div>
+
+<div data-lang="python" markdown="1">
+[`RandomRDDs`](api/python/pyspark.mllib.random.RandomRDDs-class.html) provides factory
+methods to generate random double RDDs or vector RDDs.
+The following example generates a random double RDD whose values follow the standard normal
+distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+
+{% highlight python %}
+from pyspark.mllib.random import RandomRDDs
+
+sc = ... # SparkContext
+
+# Generate a random double RDD that contains 1 million i.i.d. values drawn from the
+# standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
+u = RandomRDDs.normalRDD(sc, 1000000L, 10)
+# Apply a transform to get a random double RDD following `N(1, 4)`.
+v = u.map(lambda x: 1.0 + 2.0 * x)
+{% endhighlight %}
+</div>
+
+</div>
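
The `N(0, 1)` to `N(1, 4)` transform used in all three examples above rests on a standard identity: if `X ~ N(0, 1)`, then `1 + 2X` is normal with mean `1` and variance `2^2 = 4`. A minimal pure-Python check of that identity (stdlib only, no Spark required):

```python
import random
import statistics

random.seed(42)

# Draw i.i.d. samples from the standard normal N(0, 1).
u = [random.gauss(0.0, 1.0) for _ in range(100000)]

# Apply the same transform as the RDD examples: x -> 1 + 2x.
v = [1.0 + 2.0 * x for x in u]

# Sample mean should be close to 1, sample stdev close to 2 (variance ~4).
print(round(statistics.mean(v), 1))
print(round(statistics.stdev(v), 1))
```

The same pattern applies to the other supported distributions: generate a base RDD with the appropriate factory method, then shift and scale with `map` as needed.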