diff options
author | Michael Giannakopoulos <miccagiann@gmail.com> | 2014-07-20 20:48:44 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-07-20 20:48:44 -0700 |
commit | db56f2df1b8027171da1b8d2571d1f2ef1e103b6 (patch) | |
tree | c386e760532b3754d28f14999288fb051824a5b9 /docs/mllib-dimensionality-reduction.md | |
parent | f6e7302cb49ee227aed537026d19f68528051dfd (diff) | |
download | spark-db56f2df1b8027171da1b8d2571d1f2ef1e103b6.tar.gz spark-db56f2df1b8027171da1b8d2571d1f2ef1e103b6.tar.bz2 spark-db56f2df1b8027171da1b8d2571d1f2ef1e103b6.zip |
[SPARK-1945][MLLIB] Documentation Improvements for Spark 1.0
Standalone application examples are added to 'mllib-linear-methods.md' file written in Java.
This commit is related to the issue [Add full Java Examples in MLlib docs](https://issues.apache.org/jira/browse/SPARK-1945).
Also I changed the name of the sigmoid function from 'logit' to 'f'. This is because the logit function
is the inverse of sigmoid.
Thanks,
Michael
Author: Michael Giannakopoulos <miccagiann@gmail.com>
Closes #1311 from miccagiann/master and squashes the following commits:
8ffe5ab [Michael Giannakopoulos] Update code so as to comply with code standards.
f7ad5cc [Michael Giannakopoulos] Merge remote-tracking branch 'upstream/master'
38d92c7 [Michael Giannakopoulos] Adding PCA, SVD and LBFGS examples in Java. Performing minor updates in the already committed examples so as to eradicate the call of 'productElement' function whenever is possible.
cc0a089 [Michael Giannakopoulos] Modyfied Java examples so as to comply with coding standards.
b1141b2 [Michael Giannakopoulos] Added Java examples for Clustering and Collaborative Filtering [mllib-clustering.md & mllib-collaborative-filtering.md].
837f7a8 [Michael Giannakopoulos] Merge remote-tracking branch 'upstream/master'
15f0eb4 [Michael Giannakopoulos] Java examples included in 'mllib-linear-methods.md' file.
Diffstat (limited to 'docs/mllib-dimensionality-reduction.md')
-rw-r--r-- | docs/mllib-dimensionality-reduction.md | 94 |
1 files changed, 94 insertions, 0 deletions
diff --git a/docs/mllib-dimensionality-reduction.md b/docs/mllib-dimensionality-reduction.md index e3608075fb..8e434998c1 100644 --- a/docs/mllib-dimensionality-reduction.md +++ b/docs/mllib-dimensionality-reduction.md @@ -57,10 +57,57 @@ val U: RowMatrix = svd.U // The U factor is a RowMatrix. val s: Vector = svd.s // The singular values are stored in a local dense vector. val V: Matrix = svd.V // The V factor is a local dense matrix. {% endhighlight %} + +Same code applies to `IndexedRowMatrix`. +The only difference that the `U` matrix becomes an `IndexedRowMatrix`. </div> +<div data-lang="java" markdown="1"> +In order to run the following standalone application using Spark framework make +sure that you follow the instructions provided at section [Standalone +Applications](quick-start.html) of the quick-start guide. What is more, you +should include to your build file *spark-mllib* as a dependency. + +{% highlight java %} +import java.util.LinkedList; + +import org.apache.spark.api.java.*; +import org.apache.spark.mllib.linalg.distributed.RowMatrix; +import org.apache.spark.mllib.linalg.Matrix; +import org.apache.spark.mllib.linalg.SingularValueDecomposition; +import org.apache.spark.mllib.linalg.Vector; +import org.apache.spark.mllib.linalg.Vectors; +import org.apache.spark.rdd.RDD; +import org.apache.spark.SparkConf; +import org.apache.spark.SparkContext; + +public class SVD { + public static void main(String[] args) { + SparkConf conf = new SparkConf().setAppName("SVD Example"); + SparkContext sc = new SparkContext(conf); + + double[][] array = ... + LinkedList<Vector> rowsList = new LinkedList<Vector>(); + for (int i = 0; i < array.length; i++) { + Vector currentRow = Vectors.dense(array[i]); + rowsList.add(currentRow); + } + JavaRDD<Vector> rows = JavaSparkContext.fromSparkContext(sc).parallelize(rowsList); + + // Create a RowMatrix from JavaRDD<Vector>. + RowMatrix mat = new RowMatrix(rows.rdd()); + + // Compute the top 4 singular values and corresponding singular vectors. + SingularValueDecomposition<RowMatrix, Matrix> svd = mat.computeSVD(4, true, 1.0E-9d); + RowMatrix U = svd.U(); + Vector s = svd.s(); + Matrix V = svd.V(); + } +} +{% endhighlight %} Same code applies to `IndexedRowMatrix`. The only difference that the `U` matrix becomes an `IndexedRowMatrix`. </div> +</div> ## Principal component analysis (PCA) @@ -91,4 +138,51 @@ val pc: Matrix = mat.computePrincipalComponents(10) // Principal components are val projected: RowMatrix = mat.multiply(pc) {% endhighlight %} </div> + +<div data-lang="java" markdown="1"> + +The following code demonstrates how to compute principal components on a tall-and-skinny `RowMatrix` +and use them to project the vectors into a low-dimensional space. +The number of columns should be small, e.g, less than 1000. + +{% highlight java %} +import java.util.LinkedList; + +import org.apache.spark.api.java.*; +import org.apache.spark.mllib.linalg.distributed.RowMatrix; +import org.apache.spark.mllib.linalg.Matrix; +import org.apache.spark.mllib.linalg.Vector; +import org.apache.spark.mllib.linalg.Vectors; +import org.apache.spark.rdd.RDD; +import org.apache.spark.SparkConf; +import org.apache.spark.SparkContext; + +public class PCA { + public static void main(String[] args) { + SparkConf conf = new SparkConf().setAppName("PCA Example"); + SparkContext sc = new SparkContext(conf); + + double[][] array = ... + LinkedList<Vector> rowsList = new LinkedList<Vector>(); + for (int i = 0; i < array.length; i++) { + Vector currentRow = Vectors.dense(array[i]); + rowsList.add(currentRow); + } + JavaRDD<Vector> rows = JavaSparkContext.fromSparkContext(sc).parallelize(rowsList); + + // Create a RowMatrix from JavaRDD<Vector>. + RowMatrix mat = new RowMatrix(rows.rdd()); + + // Compute the top 3 principal components. + Matrix pc = mat.computePrincipalComponents(3); + RowMatrix projected = mat.multiply(pc); + } +} +{% endhighlight %} + +In order to run the above standalone application using Spark framework make +sure that you follow the instructions provided at section [Standalone +Applications](quick-start.html) of the quick-start guide. What is more, you +should include to your build file *spark-mllib* as a dependency. +</div> </div> |