author: Xiangrui Meng <meng@databricks.com> 2016-12-09 17:34:52 -0800
committer: Xiangrui Meng <meng@databricks.com> 2016-12-09 17:34:52 -0800
commit: d2493a203e852adf63dde4e1fc993e8d11efec3d
tree: ee9b029aa5be79d39d92c621f545778b09d36491 /docs/ml-guide.md
parent: cf33a86285629abe72c1acf235b8bfa6057220a8
[SPARK-18812][MLLIB] explain "Spark ML"
## What changes were proposed in this pull request?

There has been some confusion around "Spark ML" vs. "MLlib". This PR adds some FAQ-like entries to the MLlib user guide to explain "Spark ML" and reduce the confusion.

I checked the [Spark FAQ page](http://spark.apache.org/faq.html), which seems too high-level for the content here, so I added the entries to the MLlib user guide instead.

cc: mateiz

Author: Xiangrui Meng <meng@databricks.com>

Closes #16241 from mengxr/SPARK-18812.
Diffstat (limited to 'docs/ml-guide.md')
-rw-r--r-- docs/ml-guide.md | 12
1 files changed, 12 insertions, 0 deletions
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index ddf81be177..971761961b 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -35,6 +35,18 @@ The primary Machine Learning API for Spark is now the [DataFrame](sql-programmin
* The DataFrame-based API for MLlib provides a uniform API across ML algorithms and across multiple languages.
* DataFrames facilitate practical ML Pipelines, particularly feature transformations. See the [Pipelines guide](ml-pipeline.html) for details.
+*What is "Spark ML"?*
+
+* "Spark ML" is not an official name but occasionally used to refer to the MLlib DataFrame-based API.
+ This is majorly due to the `org.apache.spark.ml` Scala package name used by the DataFrame-based API,
+ and the "Spark ML Pipelines" term we used initially to emphasize the pipeline concept.
+
+*Is MLlib deprecated?*
+
+* No. MLlib includes both the RDD-based API and the DataFrame-based API.
+ The RDD-based API is now in maintenance mode.
+ But neither API is deprecated, nor MLlib as a whole.
+
# Dependencies
MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on