[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes

Documentation:
* Added ml-guide.md, linked from mllib-guide.md
* Updated mllib-guide.md with small section pointing to ml-guide.md

Examples:
* CrossValidatorExample
* SimpleParamsExample
* (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md)

Bug fixes:
* PipelineModel: did not use ParamMaps correctly
* UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!)

CC: mengxr shivaram etrain

Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete.

Author: Joseph K. Bradley <joseph@databricks.com>
Author: jkbradley <joseph.kurata.bradley@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #3588 from jkbradley/ml-package-docs and squashes the following commits:

d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml
c38469c [Joseph K. Bradley] Updated ml-guide with CV examples
99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold.
ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs
3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype
41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
author: Joseph K. Bradley <joseph@databricks.com> 2014-12-04 17:00:06 +0800
committer: Xiangrui Meng <meng@databricks.com> 2014-12-04 17:00:06 +0800
commit: 469a6e5f3bdd5593b3254bc916be8236e7c6cb74 (patch)
tree: fd9756fcaf83aca60724616dd9abaa55b7e5c6dd /docs/mllib-guide.md
parent: 529439bd506949f272a2b6f099ea549b097428f3 (diff)
download: spark-469a6e5f3bdd5593b3254bc916be8236e7c6cb74.tar.gz
spark-469a6e5f3bdd5593b3254bc916be8236e7c6cb74.tar.bz2
spark-469a6e5f3bdd5593b3254bc916be8236e7c6cb74.zip
1 files changed, 12 insertions, 1 deletions
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index dcb6819f46..efd7dda310 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -1,6 +1,6 @@
 ---
 layout: global
-title: Machine Learning Library (MLlib)
+title: Machine Learning Library (MLlib) Programming Guide
 ---
 
 MLlib is Spark's scalable machine learning library consisting of common learning algorithms and utilities,
@@ -35,6 +35,17 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases, 
 and the migration guide below will explain all changes between releases.
 
+# spark.ml: The New ML Package
+
+Spark 1.2 includes a new machine learning package called `spark.ml`, currently an alpha component but potentially a successor to `spark.mllib`.  The `spark.ml` package aims to replace the old APIs with a cleaner, more uniform set of APIs which will help users create full machine learning pipelines.
+
+See the **[spark.ml programming guide](ml-guide.html)** for more information on this package.
+
+Users can use algorithms from either of the two packages, but APIs may differ.  Currently, `spark.ml` offers a subset of the algorithms from `spark.mllib`.
+
+Developers should contribute new algorithms to `spark.mllib` and can optionally contribute to `spark.ml`.
+See the `spark.ml` programming guide linked above for more details.
+
 # Dependencies
 
 MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/),