path: root/docs/ml-guide.md
author     Xiangrui Meng <meng@databricks.com>    2015-08-28 13:53:31 -0700
committer  Xiangrui Meng <meng@databricks.com>    2015-08-28 13:53:31 -0700
commit     88032ecaf0455886aed7a66b30af80dae7f6cff7 (patch)
tree       c257040c963a892f354b15b83d44c3660a9d72be /docs/ml-guide.md
parent     45723214e694b9a440723e9504c562e6393709f3 (diff)
download   spark-88032ecaf0455886aed7a66b30af80dae7f6cff7.tar.gz
           spark-88032ecaf0455886aed7a66b30af80dae7f6cff7.tar.bz2
           spark-88032ecaf0455886aed7a66b30af80dae7f6cff7.zip
[SPARK-9671] [MLLIB] re-org user guide and add migration guide
This PR updates the MLlib user guide and adds a migration guide for 1.4->1.5.

* merge migration guide for `spark.mllib` and `spark.ml` packages
* remove dependency section from `spark.ml` guide
* move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml`
* move Sam's talk to a footnote to make the section focus on dependencies

Minor changes to code examples and other wording will be in a separate PR.

jkbradley srowen feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8498 from mengxr/SPARK-9671.
Diffstat (limited to 'docs/ml-guide.md')
-rw-r--r--  docs/ml-guide.md | 52
1 file changed, 6 insertions(+), 46 deletions(-)
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index 01bf5ee18e..ce53400b6e 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -21,19 +21,11 @@ title: Spark ML Programming Guide
\]`
-Spark 1.2 introduced a new package called `spark.ml`, which aims to provide a uniform set of
-high-level APIs that help users create and tune practical machine learning pipelines.
-
-*Graduated from Alpha!* The Pipelines API is no longer an alpha component, although many elements of it are still `Experimental` or `DeveloperApi`.
-
-Note that we will keep supporting and adding features to `spark.mllib` along with the
-development of `spark.ml`.
-Users should be comfortable using `spark.mllib` features and expect more features coming.
-Developers should contribute new algorithms to `spark.mllib` and can optionally contribute
-to `spark.ml`.
-
-See the [Algorithm Guides section](#algorithm-guides) below for guides on sub-packages of `spark.ml`, including feature transformers unique to the Pipelines API, ensembles, and more.
-
+The `spark.ml` package aims to provide a uniform set of high-level APIs built on top of
+[DataFrames](sql-programming-guide.html#dataframes) that help users create and tune practical
+machine learning pipelines.
+See the [Algorithm Guides section](#algorithm-guides) below for guides on sub-packages of
+`spark.ml`, including feature transformers unique to the Pipelines API, ensembles, and more.
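
A minimal sketch of such a pipeline, assuming an existing `SQLContext` named `sqlContext` and made-up data and column names: a text-classification pipeline that chains a `Tokenizer`, a `HashingTF`, and a `LogisticRegression` directly on a DataFrame.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Assumes an existing SQLContext named `sqlContext`; ids, text, and labels are made up.
val training = sqlContext.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Configure a pipeline with three stages: tokenizer, hashingTF, and lr.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// Fit the pipeline to the training DataFrame; the result is a PipelineModel.
val model = pipeline.fit(training)
```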
**Table of Contents**
@@ -171,7 +163,7 @@ This is useful if there are two algorithms with the `maxIter` parameter in a `Pi
# Algorithm Guides
-There are now several algorithms in the Pipelines API which are not in the lower-level MLlib API, so we link to documentation for them here. These algorithms are mostly feature transformers, which fit naturally into the `Transformer` abstraction in Pipelines, and ensembles, which fit naturally into the `Estimator` abstraction in the Pipelines.
+There are now several algorithms in the Pipelines API which are not in the `spark.mllib` API, so we link to documentation for them here. These algorithms are mostly feature transformers, which fit naturally into the `Transformer` abstraction in Pipelines, and ensembles, which fit naturally into the `Estimator` abstraction in Pipelines.
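
As a rough sketch of the two abstractions (assuming an existing `SQLContext` named `sqlContext`; the data and column names are illustrative only): a `Tokenizer` is a plain `Transformer`, while a `StandardScaler` is an `Estimator` whose `fit()` produces a model that is itself a `Transformer`.

```scala
import org.apache.spark.ml.feature.{StandardScaler, Tokenizer}
import org.apache.spark.mllib.linalg.Vectors

// A Tokenizer is a plain Transformer: transform() maps one DataFrame to another.
val textDF = sqlContext.createDataFrame(Seq(
  (0L, "spark ml pipelines"),
  (1L, "feature transformers")
)).toDF("id", "text")
val tokenized = new Tokenizer().setInputCol("text").setOutputCol("words").transform(textDF)

// A StandardScaler is an Estimator: fit() learns column statistics and returns
// a StandardScalerModel, which is itself a Transformer.
val featureDF = sqlContext.createDataFrame(Seq(
  (0L, Vectors.dense(1.0, 0.5)),
  (1L, Vectors.dense(3.0, 2.5))
)).toDF("id", "features")
val scaler = new StandardScaler().setInputCol("features").setOutputCol("scaledFeatures")
val scaled = scaler.fit(featureDF).transform(featureDF)
```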
**Pipelines API Algorithm Guides**
@@ -880,35 +872,3 @@ jsc.stop();
</div>
</div>
-
-# Dependencies
-
-Spark ML currently depends on MLlib and has the same dependencies.
-Please see the [MLlib Dependencies guide](mllib-guide.html#dependencies) for more info.
-
-Spark ML also depends upon Spark SQL, but the relevant parts of Spark SQL do not bring additional dependencies.
-
-# Migration Guide
-
-## From 1.3 to 1.4
-
-Several major API changes occurred, including:
-* `Param` and other APIs for specifying parameters
-* `uid` unique IDs for Pipeline components
-* Reorganization of certain classes
-Since the `spark.ml` API was an Alpha Component in Spark 1.3, we do not list all changes here.
-
-However, now that `spark.ml` is no longer an Alpha Component, we will provide details on any API changes for future releases.
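
A small illustrative sketch of the parameter setters and the `uid` field mentioned above (the printed values are only examples):

```scala
import org.apache.spark.ml.classification.LogisticRegression

// Parameters are set through typed setters (or a ParamMap passed to fit()).
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)

// Every Pipeline component carries a unique id.
println(lr.uid)             // e.g. "logreg_" followed by a random suffix
println(lr.explainParams()) // lists each Param with its documentation and current value
```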
-
-## From 1.2 to 1.3
-
-The main API changes are from Spark SQL. We list the most important changes here:
-
-* The old [SchemaRDD](http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.sql.SchemaRDD) has been replaced with [DataFrame](api/scala/index.html#org.apache.spark.sql.DataFrame) with a somewhat modified API. All algorithms in Spark ML which used to use SchemaRDD now use DataFrame.
-* In Spark 1.2, we used implicit conversions from `RDD`s of `LabeledPoint` into `SchemaRDD`s by calling `import sqlContext._` where `sqlContext` was an instance of `SQLContext`. These implicits have been moved, so we now call `import sqlContext.implicits._`.
-* Java APIs for SQL have also changed accordingly. Please see the examples above and the [Spark SQL Programming Guide](sql-programming-guide.html) for details.
-
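
A brief sketch of the new conversion path described above, assuming an existing `SparkContext` named `sc`:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext named `sc`.
val sqlContext = new SQLContext(sc)
// The implicit RDD-to-DataFrame conversions now live under `implicits`:
import sqlContext.implicits._

val labeled = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0))
))
// toDF() replaces the old implicit conversion to SchemaRDD.
val df = labeled.toDF()
```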
-Other changes were in `LogisticRegression`:
-
-* The `scoreCol` output column (with default value "score") was renamed to be `probabilityCol` (with default value "probability"). The type was originally `Double` (for the probability of class 1.0), but it is now `Vector` (for the probability of each class, to support multiclass classification in the future).
-* In Spark 1.2, `LogisticRegressionModel` did not include an intercept. In Spark 1.3, it includes an intercept; however, it will always be 0.0 since it uses the default settings for [spark.mllib.LogisticRegressionWithLBFGS](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS). The option to use an intercept will be added in the future.
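
A short sketch of the renamed column in use, assuming an existing `SQLContext` named `sqlContext` and made-up training data:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.mllib.linalg.Vectors

// Assumes an existing SQLContext named `sqlContext`; the data is made up.
val training = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

val model = new LogisticRegression().setMaxIter(10).fit(training)

// The per-class probabilities are a Vector in the "probability" column
// (configurable via setProbabilityCol), replacing the old Double-valued "score".
model.transform(training)
  .select("features", "probability", "prediction")
  .show()
```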