SPARK-2830 [MLlib]: re-organize mllib documentation

As per discussions with Xiangrui, I've reorganized and edited the mllib documentation.

Author: Ameet Talwalkar <atalwalkar@gmail.com>

Closes #1908 from atalwalkar/master and squashes the following commits:

fe6938a [Ameet Talwalkar] made xiangruis suggested changes
840028b [Ameet Talwalkar] made xiangruis suggested changes
7ec366a [Ameet Talwalkar] reorganize and edit mllib documentation
author: Ameet Talwalkar <atalwalkar@gmail.com> 2014-08-12 17:15:21 -0700
committer: Xiangrui Meng <meng@databricks.com> 2014-08-12 17:15:21 -0700
commit: c235b83e2782cce0626ecc403c0a67e442be52c1 (patch)
tree: 30b4ada17cba016cc2a8a7f01f09b7bcb78fbace /docs/mllib-dimensionality-reduction.md
parent: 882da57a1c8c075a87909d516b169b624941a6ec (diff)
1 files changed, 24 insertions, 20 deletions
diff --git a/docs/mllib-dimensionality-reduction.md b/docs/mllib-dimensionality-reduction.md
index 8e434998c1..065d646496 100644
--- a/docs/mllib-dimensionality-reduction.md
+++ b/docs/mllib-dimensionality-reduction.md
@@ -9,9 +9,9 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Dimensionality Reduction
 
 [Dimensionality reduction](http://en.wikipedia.org/wiki/Dimensionality_reduction) is the process 
 of reducing the number of variables under consideration.
-It is used to extract latent features from raw and noisy features,
+It can be used to extract latent features from raw and noisy features
 or compress data while maintaining the structure.
-In this release, we provide preliminary support for dimensionality reduction on tall-and-skinny matrices.
+MLlib provides support for dimensionality reduction on tall-and-skinny matrices.
 
 ## Singular value decomposition (SVD)
 
@@ -30,17 +30,17 @@ where
 * $V$ is an orthonormal matrix, whose columns are called right singular vectors.
  
 For large matrices, usually we don't need the complete factorization but only the top singular
-values and its associated singular vectors.  This can save storage, and more importantly, de-noise
+values and its associated singular vectors.  This can save storage, de-noise
 and recover the low-rank structure of the matrix.
 
-If we keep the top $k$ singular values, then the dimensions of the return will be:
+If we keep the top $k$ singular values, then the dimensions of the resulting low-rank matrix will be:
 
 * `$U$`: `$m \times k$`,
 * `$\Sigma$`: `$k \times k$`,
 * `$V$`: `$n \times k$`.
  
-In this release, we provide SVD computation to row-oriented matrices that have only a few columns,
-say, less than $1000$, but many rows, which we call *tall-and-skinny*.
+MLlib provides SVD functionality to row-oriented matrices that have only a few columns,
+say, less than $1000$, but many rows, i.e., *tall-and-skinny* matrices.
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
@@ -58,15 +58,10 @@ val s: Vector = svd.s // The singular values are stored in a local dense vector.
 val V: Matrix = svd.V // The V factor is a local dense matrix.
 {% endhighlight %}
 
-Same code applies to `IndexedRowMatrix`.
-The only difference that the `U` matrix becomes an `IndexedRowMatrix`.
+The same code applies to `IndexedRowMatrix` if `U` is defined as an
+`IndexedRowMatrix`.
 </div>
 <div data-lang="java" markdown="1">
-In order to run the following standalone application using Spark framework make
-sure that you follow the instructions provided at section [Standalone
-Applications](quick-start.html) of the quick-start guide. What is more, you
-should include to your build file *spark-mllib* as a dependency.
-
 {% highlight java %}
 import java.util.LinkedList;
 
@@ -104,8 +99,16 @@ public class SVD {
   }
 }
 {% endhighlight %}
-Same code applies to `IndexedRowMatrix`.
-The only difference that the `U` matrix becomes an `IndexedRowMatrix`.
+
+The same code applies to `IndexedRowMatrix` if `U` is defined as an
+`IndexedRowMatrix`.
+
+In order to run the above standalone application, follow the instructions
+provided in the [Standalone
+Applications](quick-start.html#standalone-applications) section of the Spark
+quick-start guide. Be sure to also include *spark-mllib* to your build file as
+a dependency.
+
 </div>
 </div>
 
@@ -116,7 +119,7 @@ statistical method to find a rotation such that the first coordinate has the lar
 possible, and each succeeding coordinate in turn has the largest variance possible. The columns of
 the rotation matrix are called principal components. PCA is used widely in dimensionality reduction.
 
-In this release, we implement PCA for tall-and-skinny matrices stored in row-oriented format.
+MLlib supports PCA for tall-and-skinny matrices stored in row-oriented format.
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
@@ -180,9 +183,10 @@ public class PCA {
 }
 {% endhighlight %}
 
-In order to run the above standalone application using Spark framework make
-sure that you follow the instructions provided at section [Standalone
-Applications](quick-start.html) of the quick-start guide. What is more, you
-should include to your build file *spark-mllib* as a dependency.
+In order to run the above standalone application, follow the instructions
+provided in the [Standalone
+Applications](quick-start.html#standalone-applications) section of the Spark
+quick-start guide. Be sure to also include *spark-mllib* to your build file as
+a dependency.
 </div>
 </div>
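
Note on the hunk at `@@ -30,17 +30,17 @@`: the shapes listed there (`$U$` is `$m \times k$`, `$\Sigma$` is `$k \times k$`, `$V$` is `$n \times k$`) can be sanity-checked by multiplying the truncated factors back together, which should give an `$m \times n$` matrix. The following is a minimal plain-Java sketch of that check, not MLlib code. The class name `TruncatedSvdShapes`, the method `reconstruct`, and the tiny factor values are assumptions introduced here for illustration only, and `$\Sigma$` is passed as a vector of its diagonal entries.

```java
public class TruncatedSvdShapes {

    /**
     * Computes U * diag(s) * V^T for illustration.
     * u is m x k, s holds the k singular values, v is n x k.
     * The result is m x n.
     */
    public static double[][] reconstruct(double[][] u, double[] s, double[][] v) {
        int m = u.length;
        int k = s.length;
        int n = v.length;
        double[][] a = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int r = 0; r < k; r++) {
                    // Entry (i, j) of U * diag(s) * V^T.
                    sum += u[i][r] * s[r] * v[j][r];
                }
                a[i][j] = sum;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        // Hypothetical factors with m = 3, n = 2, k = 1.
        double[][] u = {{1.0}, {0.0}, {0.0}}; // 3 x 1, orthonormal column
        double[] s = {5.0};                   // one singular value
        double[][] v = {{0.6}, {0.8}};        // 2 x 1, orthonormal column
        double[][] a = reconstruct(u, s, v);
        // Prints the shape of the reconstructed matrix.
        System.out.println(a.length + " x " + a[0].length);
    }
}
```

With `$k = 1$` the three factors hold `$m + 1 + n$` numbers in place of the `$m \times n$` entries of the full matrix, which is the storage saving the documented paragraph refers to.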