path: root/docs/mllib-dimensionality-reduction.md
author    Michael Giannakopoulos <miccagiann@gmail.com>  2014-07-20 20:48:44 -0700
committer Xiangrui Meng <meng@databricks.com>  2014-07-20 20:48:44 -0700
commit    db56f2df1b8027171da1b8d2571d1f2ef1e103b6 (patch)
tree      c386e760532b3754d28f14999288fb051824a5b9 /docs/mllib-dimensionality-reduction.md
parent    f6e7302cb49ee227aed537026d19f68528051dfd (diff)
[SPARK-1945][MLLIB] Documentation Improvements for Spark 1.0
Standalone application examples written in Java are added to the 'mllib-linear-methods.md' file. This commit is related to the issue [Add full Java Examples in MLlib docs](https://issues.apache.org/jira/browse/SPARK-1945). I also changed the name of the sigmoid function from 'logit' to 'f', because the logit function is the inverse of the sigmoid.

Thanks, Michael

Author: Michael Giannakopoulos <miccagiann@gmail.com>

Closes #1311 from miccagiann/master and squashes the following commits:

8ffe5ab [Michael Giannakopoulos] Update code so as to comply with code standards.
f7ad5cc [Michael Giannakopoulos] Merge remote-tracking branch 'upstream/master'
38d92c7 [Michael Giannakopoulos] Adding PCA, SVD and LBFGS examples in Java. Performing minor updates in the already committed examples so as to remove calls to the 'productElement' function wherever possible.
cc0a089 [Michael Giannakopoulos] Modified Java examples so as to comply with coding standards.
b1141b2 [Michael Giannakopoulos] Added Java examples for Clustering and Collaborative Filtering [mllib-clustering.md & mllib-collaborative-filtering.md].
837f7a8 [Michael Giannakopoulos] Merge remote-tracking branch 'upstream/master'
15f0eb4 [Michael Giannakopoulos] Java examples included in 'mllib-linear-methods.md' file.
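The rename described in the commit message rests on a simple identity: the logit is the inverse of the sigmoid, so the sigmoid itself should not be called `logit`. A minimal sketch of that relationship (the class name `SigmoidLogit` is illustrative, not part of the patch):

```java
public class SigmoidLogit {

  // Sigmoid (logistic) function: maps any real z into the interval (0, 1).
  static double sigmoid(double z) {
    return 1.0 / (1.0 + Math.exp(-z));
  }

  // Logit function: maps a probability p in (0, 1) back to the reals.
  // It is the inverse of the sigmoid, which is why naming the sigmoid
  // "logit" was a misnomer.
  static double logit(double p) {
    return Math.log(p / (1.0 - p));
  }

  public static void main(String[] args) {
    for (double z : new double[] {-2.0, 0.0, 3.5}) {
      // logit(sigmoid(z)) recovers z up to floating-point error.
      System.out.println(z + " -> " + logit(sigmoid(z)));
    }
  }
}
```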
Diffstat (limited to 'docs/mllib-dimensionality-reduction.md')
-rw-r--r-- docs/mllib-dimensionality-reduction.md | 94
1 file changed, 94 insertions(+), 0 deletions(-)
diff --git a/docs/mllib-dimensionality-reduction.md b/docs/mllib-dimensionality-reduction.md
index e3608075fb..8e434998c1 100644
--- a/docs/mllib-dimensionality-reduction.md
+++ b/docs/mllib-dimensionality-reduction.md
@@ -57,10 +57,57 @@ val U: RowMatrix = svd.U // The U factor is a RowMatrix.
val s: Vector = svd.s // The singular values are stored in a local dense vector.
val V: Matrix = svd.V // The V factor is a local dense matrix.
{% endhighlight %}
+
+The same code applies to `IndexedRowMatrix`.
+The only difference is that the `U` matrix becomes an `IndexedRowMatrix`.
</div>
+<div data-lang="java" markdown="1">
+To run the following standalone application, follow the instructions in the
+[Standalone Applications](quick-start.html) section of the quick-start guide.
+In addition, include *spark-mllib* as a dependency in your build file.
+
+{% highlight java %}
+import java.util.LinkedList;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+import org.apache.spark.mllib.linalg.Matrix;
+import org.apache.spark.mllib.linalg.SingularValueDecomposition;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class SVD {
+ public static void main(String[] args) {
+ SparkConf conf = new SparkConf().setAppName("SVD Example");
+ SparkContext sc = new SparkContext(conf);
+
+ double[][] array = ...
+ LinkedList<Vector> rowsList = new LinkedList<Vector>();
+ for (int i = 0; i < array.length; i++) {
+ Vector currentRow = Vectors.dense(array[i]);
+ rowsList.add(currentRow);
+ }
+ JavaRDD<Vector> rows = JavaSparkContext.fromSparkContext(sc).parallelize(rowsList);
+
+ // Create a RowMatrix from JavaRDD<Vector>.
+ RowMatrix mat = new RowMatrix(rows.rdd());
+
+ // Compute the top 4 singular values and corresponding singular vectors.
+ SingularValueDecomposition<RowMatrix, Matrix> svd = mat.computeSVD(4, true, 1.0E-9d);
+ RowMatrix U = svd.U();
+ Vector s = svd.s();
+ Matrix V = svd.V();
+ }
+}
+{% endhighlight %}
The same code applies to `IndexedRowMatrix`.
The only difference is that the `U` matrix becomes an `IndexedRowMatrix`.
</div>
+</div>
## Principal component analysis (PCA)
@@ -91,4 +138,51 @@ val pc: Matrix = mat.computePrincipalComponents(10) // Principal components are
val projected: RowMatrix = mat.multiply(pc)
{% endhighlight %}
</div>
+
+<div data-lang="java" markdown="1">
+
+The following code demonstrates how to compute principal components on a tall-and-skinny `RowMatrix`
+and use them to project the vectors into a low-dimensional space.
+The number of columns should be small, e.g., fewer than 1000.
+
+{% highlight java %}
+import java.util.LinkedList;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+import org.apache.spark.mllib.linalg.Matrix;
+import org.apache.spark.mllib.linalg.Vector;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class PCA {
+ public static void main(String[] args) {
+ SparkConf conf = new SparkConf().setAppName("PCA Example");
+ SparkContext sc = new SparkContext(conf);
+
+ double[][] array = ...
+ LinkedList<Vector> rowsList = new LinkedList<Vector>();
+ for (int i = 0; i < array.length; i++) {
+ Vector currentRow = Vectors.dense(array[i]);
+ rowsList.add(currentRow);
+ }
+ JavaRDD<Vector> rows = JavaSparkContext.fromSparkContext(sc).parallelize(rowsList);
+
+ // Create a RowMatrix from JavaRDD<Vector>.
+ RowMatrix mat = new RowMatrix(rows.rdd());
+
+ // Compute the top 3 principal components.
+ Matrix pc = mat.computePrincipalComponents(3);
+ RowMatrix projected = mat.multiply(pc);
+ }
+}
+{% endhighlight %}
+
+To run the above standalone application, follow the instructions in the
+[Standalone Applications](quick-start.html) section of the quick-start guide.
+In addition, include *spark-mllib* as a dependency in your build file.
+</div>
</div>