From df0aa8353ab6d3b19d838c6fa95a93a64948309f Mon Sep 17 00:00:00 2001 From: Xiangrui Meng Date: Sun, 18 May 2014 17:00:57 -0700 Subject: [WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0 Some improvements to MLlib guide: 1. [SPARK-1872] Update API links for unidoc. 2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display. 3. Add more Java/Python examples. Author: Xiangrui Meng Closes #816 from mengxr/mllib-doc and squashes the following commits: ec2e407 [Xiangrui Meng] format scala example for ALS cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types 4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles 561fdc0 [Xiangrui Meng] add a displayTitle option to global layout 195d06f [Xiangrui Meng] add Java example for summary stats and minor fix 9f1ff89 [Xiangrui Meng] update java api links in mllib-basics 7dad18e [Xiangrui Meng] update java api links in NB 3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python 35bdeb9 [Xiangrui Meng] api/mllib -> api/scala e4afaa8 [Xiangrui Meng] explicity state what might change --- docs/mllib-guide.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) (limited to 'docs/mllib-guide.md') diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 842ca5c8c6..640ca83085 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -27,8 +27,9 @@ filtering, dimensionality reduction, as well as underlying optimization primitiv * stochastic gradient descent * limited-memory BFGS (L-BFGS) -MLlib is currently a *beta* component under active development. -The APIs may change in the future releases, and we will provide migration guide between releases. +MLlib is a new component under active development. +The APIs marked `Experimental`/`DeveloperApi` may change in future releases, +and we will provide migration guide between releases. ## Dependencies @@ -61,9 +62,9 @@ take advantage of sparsity in both storage and computation.
We used to represent a feature vector by `Array[Double]`, which is replaced by -[`Vector`](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector) in v1.0. Algorithms that used +[`Vector`](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) in v1.0. Algorithms that used to accept `RDD[Array[Double]]` now take -`RDD[Vector]`. [`LabeledPoint`](api/mllib/index.html#org.apache.spark.mllib.regression.LabeledPoint) +`RDD[Vector]`. [`LabeledPoint`](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint) is now a wrapper of `(Double, Vector)` instead of `(Double, Array[Double])`. Converting `Array[Double]` to `Vector` is straightforward: @@ -74,7 +75,7 @@ val array: Array[Double] = ... // a double array val vector: Vector = Vectors.dense(array) // a dense vector {% endhighlight %} -[`Vectors`](api/mllib/index.html#org.apache.spark.mllib.linalg.Vectors$) provides factory methods to create sparse vectors. +[`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) provides factory methods to create sparse vectors. *Note*. Scala imports `scala.collection.immutable.Vector` by default, so you have to import `org.apache.spark.mllib.linalg.Vector` explicitly to use MLlib's `Vector`. @@ -83,9 +84,9 @@ val vector: Vector = Vectors.dense(array) // a dense vector
We used to represent a feature vector by `double[]`, which is replaced by -[`Vector`](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector) in v1.0. Algorithms that used +[`Vector`](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) in v1.0. Algorithms that used to accept `RDD` now take -`RDD`. [`LabeledPoint`](api/mllib/index.html#org.apache.spark.mllib.regression.LabeledPoint) +`RDD`. [`LabeledPoint`](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint) is now a wrapper of `(double, Vector)` instead of `(double, double[])`. Converting `double[]` to `Vector` is straightforward: @@ -97,7 +98,7 @@ double[] array = ... // a double array Vector vector = Vectors.dense(array); // a dense vector {% endhighlight %} -[`Vectors`](api/mllib/index.html#org.apache.spark.mllib.linalg.Vectors$) provides factory methods to +[`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) provides factory methods to create sparse vectors.
@@ -106,7 +107,7 @@ create sparse vectors. We used to represent a labeled feature vector in a NumPy array, where the first entry corresponds to the label and the rest are features. This representation is replaced by class -[`LabeledPoint`](api/pyspark/pyspark.mllib.regression.LabeledPoint-class.html), which takes both +[`LabeledPoint`](api/python/pyspark.mllib.regression.LabeledPoint-class.html), which takes both dense and sparse feature vectors. {% highlight python %} -- cgit v1.2.3