author: Joseph K. Bradley <joseph@databricks.com> 2014-12-04 17:00:06 +0800
committer: Xiangrui Meng <meng@databricks.com> 2014-12-04 17:00:06 +0800
commit: 469a6e5f3bdd5593b3254bc916be8236e7c6cb74
tree: fd9756fcaf83aca60724616dd9abaa55b7e5c6dd /docs/mllib-guide.md
parent: 529439bd506949f272a2b6f099ea549b097428f3
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes

Documentation:
* Added ml-guide.md, linked from mllib-guide.md
* Updated mllib-guide.md with small section pointing to ml-guide.md

Examples:
* CrossValidatorExample
* SimpleParamsExample
* (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md)

Bug fixes:
* PipelineModel: did not use ParamMaps correctly
* UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!)

CC: mengxr shivaram etrain

Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete.

Author: Joseph K. Bradley <joseph@databricks.com>
Author: jkbradley <joseph.kurata.bradley@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #3588 from jkbradley/ml-package-docs and squashes the following commits:

d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml
c38469c [Joseph K. Bradley] Updated ml-guide with CV examples
99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold.
ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs
3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype
41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.

Diffstat (limited to 'docs/mllib-guide.md')
-rw-r--r-- docs/mllib-guide.md 13
1 files changed, 12 insertions, 1 deletions
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index dcb6819f46..efd7dda310 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -1,6 +1,6 @@
---
layout: global
-title: Machine Learning Library (MLlib)
+title: Machine Learning Library (MLlib) Programming Guide
---
MLlib is Spark's scalable machine learning library consisting of common learning algorithms and utilities,
@@ -35,6 +35,17 @@ MLlib is under active development.
The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
and the migration guide below will explain all changes between releases.
+# spark.ml: The New ML Package
+
+Spark 1.2 includes a new machine learning package called `spark.ml`, currently an alpha component but potentially a successor to `spark.mllib`. The `spark.ml` package aims to replace the old APIs with a cleaner, more uniform set of APIs which will help users create full machine learning pipelines.
+
+See the **[spark.ml programming guide](ml-guide.html)** for more information on this package.
+
+Users can use algorithms from either of the two packages, but APIs may differ. Currently, `spark.ml` offers a subset of the algorithms from `spark.mllib`.
+
+Developers should contribute new algorithms to `spark.mllib` and can optionally contribute to `spark.ml`.
+See the `spark.ml` programming guide linked above for more details.
+
# Dependencies
MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/),