author | Joseph K. Bradley <joseph@databricks.com> | 2015-06-21 16:25:25 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-06-21 16:25:25 -0700 |
commit | a1894422ad6b3335c84c73ba9466da6677d893cb | |
tree | 8bba7cc2493b57e8e24f8f28003836c2b72cbec7 /docs/ml-guide.md | |
parent | 83cdfd84f8ca679e1ec451ed88b946e8e7f13a94 | |
[SPARK-7715] [MLLIB] [ML] [DOC] Updated MLlib programming guide for release 1.4
Reorganized docs a bit. Added migration guides.
**Q**: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```? It would be a lot.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits:
4bf26d6 [Joseph K. Bradley] tiny fix
8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR
6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4
Diffstat (limited to 'docs/ml-guide.md')
-rw-r--r-- | docs/ml-guide.md | 32 |
1 files changed, 19 insertions, 13 deletions
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index 4eb622d4b9..c74cb1f1ef 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -3,10 +3,10 @@ layout: global
 title: Spark ML Programming Guide
 ---
 
-`spark.ml` is a new package introduced in Spark 1.2, which aims to provide a uniform set of
+Spark 1.2 introduced a new package called `spark.ml`, which aims to provide a uniform set of
 high-level APIs that help users create and tune practical machine learning pipelines.
-It is currently an alpha component, and we would like to hear back from the community about
-how it fits real-world use cases and how it could be improved.
+
+*Graduated from Alpha!* The Pipelines API is no longer an alpha component, although many elements of it are still `Experimental` or `DeveloperApi`.
 
 Note that we will keep supporting and adding features to `spark.mllib` along with the
 development of `spark.ml`.
@@ -14,6 +14,12 @@ Users should be comfortable using `spark.mllib` features and expect more feature
 Developers should contribute new algorithms to `spark.mllib` and can optionally contribute
 to `spark.ml`.
 
+Guides for sub-packages of `spark.ml` include:
+
+* [Feature Extraction, Transformation, and Selection](ml-features.html): Details on transformers supported in the Pipelines API, including a few not in the lower-level `spark.mllib` API
+* [Ensembles](ml-ensembles.html): Details on ensemble learning methods in the Pipelines API
+
+
 **Table of Contents**
 
 * This will become a table of contents (this text will be scraped).
@@ -148,16 +154,6 @@ Parameters belong to specific instances of `Estimator`s and `Transformer`s.
 For example, if we have two `LogisticRegression` instances `lr1` and `lr2`, then we can build a `ParamMap` with both `maxIter` parameters specified: `ParamMap(lr1.maxIter -> 10, lr2.maxIter -> 20)`.
 This is useful if there are two algorithms with the `maxIter` parameter in a `Pipeline`.
 
-# Algorithm Guides
-
-There are now several algorithms in the Pipelines API which are not in the lower-level MLlib API, so we link to documentation for them here. These algorithms are mostly feature transformers, which fit naturally into the `Transformer` abstraction in Pipelines, and ensembles, which fit naturally into the `Estimator` abstraction in the Pipelines.
-
-**Pipelines API Algorithm Guides**
-
-* [Feature Extraction, Transformation, and Selection](ml-features.html)
-* [Ensembles](ml-ensembles.html)
-
-
 # Code Examples
 
 This section gives code examples illustrating the functionality discussed above.
@@ -783,6 +779,16 @@ Spark ML also depends upon Spark SQL, but the relevant parts of Spark SQL do not
 
 # Migration Guide
 
+## From 1.3 to 1.4
+
+Several major API changes occurred, including:
+* `Param` and other APIs for specifying parameters
+* `uid` unique IDs for Pipeline components
+* Reorganization of certain classes
+Since the `spark.ml` API was an Alpha Component in Spark 1.3, we do not list all changes here.
+
+However, now that `spark.ml` is no longer an Alpha Component, we will provide details on any API changes for future releases.
+
 ## From 1.2 to 1.3
 
 The main API changes are from Spark SQL. We list the most important changes here: