author | Joseph K. Bradley <joseph@databricks.com> | 2015-06-21 16:25:25 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-06-21 16:25:25 -0700 |
commit | a1894422ad6b3335c84c73ba9466da6677d893cb (patch) | |
tree | 8bba7cc2493b57e8e24f8f28003836c2b72cbec7 /docs/mllib-migration-guides.md | |
parent | 83cdfd84f8ca679e1ec451ed88b946e8e7f13a94 (diff) | |
[SPARK-7715] [MLLIB] [ML] [DOC] Updated MLlib programming guide for release 1.4
Reorganized docs a bit. Added migration guides.
**Q**: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```? It would be a lot.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits:
4bf26d6 [Joseph K. Bradley] tiny fix
8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR
6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4
Diffstat (limited to 'docs/mllib-migration-guides.md')
-rw-r--r-- | docs/mllib-migration-guides.md | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/docs/mllib-migration-guides.md b/docs/mllib-migration-guides.md
index 4de2d9491a..8df68d81f3 100644
--- a/docs/mllib-migration-guides.md
+++ b/docs/mllib-migration-guides.md
@@ -7,6 +7,22 @@ description: MLlib migration guides from before Spark SPARK_VERSION_SHORT
 The migration guide for the current Spark version is kept on the [MLlib Programming Guide main page](mllib-guide.html#migration-guide).
+## From 1.2 to 1.3
+
+In the `spark.mllib` package, there were several breaking changes. The first change (in `ALS`) is the only one in a component not marked as Alpha or Experimental.
+
+* *(Breaking change)* In [`ALS`](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS), the extraneous method `solveLeastSquares` has been removed. The `DeveloperApi` method `analyzeBlocks` was also removed.
+* *(Breaking change)* [`StandardScalerModel`](api/scala/index.html#org.apache.spark.mllib.feature.StandardScalerModel) remains an Alpha component. In it, the `variance` method has been replaced with the `std` method. To compute the column variance values returned by the original `variance` method, simply square the standard deviation values returned by `std`.
+* *(Breaking change)* [`StreamingLinearRegressionWithSGD`](api/scala/index.html#org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD) remains an Experimental component. In it, there were two changes:
+  * The constructor taking arguments was removed in favor of a builder pattern using the default constructor plus parameter setter methods.
+  * Variable `model` is no longer public.
+* *(Breaking change)* [`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree) remains an Experimental component. In it and its associated classes, there were several changes:
+  * In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.)
+  * In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
+* `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use.
+* In linear regression (including Lasso and ridge regression), the squared loss is now divided by 2.
+  So in order to produce the same result as in 1.2, the regularization parameter needs to be divided by 2 and the step size needs to be multiplied by 2.
+
 ## From 1.1 to 1.2
 The only API changes in MLlib v1.2 are in
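
Reviewer note: two of the migration items in this diff are arithmetic claims (variance recovered by squaring `std`, and the rescaling of the regularization parameter and step size after the squared loss is halved). The sketch below checks both numerically in plain Python. It is not MLlib code: `variance_from_std` and `sgd_ridge` are hypothetical helpers, and the optimizer is a simplified full-batch gradient descent with a constant step size and an assumed ridge penalty of the form `(reg / 2) * w^2`.

```python
import math


def variance_from_std(std_values):
    """Recover per-column variances by squaring standard deviations."""
    return [s * s for s in std_values]


def sgd_ridge(xs, ys, step, reg, loss_scale, iters=50):
    """Gradient descent on loss_scale * mean((w*x - y)^2) + (reg / 2) * w^2.

    Hypothetical 1-D stand-in for a linear regression trainer:
    loss_scale=1.0 mimics the 1.2-style loss, loss_scale=0.5 the halved loss.
    """
    w = 0.0
    n = len(xs)
    for _ in range(iters):
        # Derivative of the scaled squared loss with respect to w.
        grad = sum(2.0 * loss_scale * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # The penalty (reg / 2) * w^2 contributes reg * w to the gradient.
        w -= step * (grad + reg * w)
    return w


XS = [1.0, 2.0, 3.0, 4.0]
YS = [2.1, 3.9, 6.2, 7.8]

# Old-style run: unscaled squared loss, step size 0.01, regularization 0.1.
w_old = sgd_ridge(XS, YS, step=0.01, reg=0.1, loss_scale=1.0)
# New-style run: halved loss, step size doubled, regularization halved.
w_new = sgd_ridge(XS, YS, step=0.02, reg=0.05, loss_scale=0.5)

print(math.isclose(w_old, w_new, rel_tol=1e-9))
print(variance_from_std([2.0, 0.5]))
```

The two runs agree because each update `w - 2s * (g + (reg / 2) * w)` expands to `w - s * (2g + reg * w)`, which is the old update with step size `s`. Leaving the old step size and regularization in place with the halved loss would give a different trajectory.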