diff options
author | Xiangrui Meng <meng@databricks.com> | 2015-08-29 23:26:23 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-08-29 23:26:23 -0700 |
commit | 905fbe498bdd29116468628e6a2a553c1fd57165 (patch) | |
tree | 9a1859c344848b7427ee2113114e78a884c083b8 /docs/mllib-guide.md | |
parent | 13f5f8ec97c6886346641b73bd99004e0d70836c (diff) | |
download | spark-905fbe498bdd29116468628e6a2a553c1fd57165.tar.gz spark-905fbe498bdd29116468628e6a2a553c1fd57165.tar.bz2 spark-905fbe498bdd29116468628e6a2a553c1fd57165.zip |
[SPARK-10348] [MLLIB] updates ml-guide
* replace `ML Dataset` by `DataFrame` to unify the abstraction
* ML algorithms -> pipeline components to describe the main concept
* remove Scala API doc links from the main guide
* `Section Title` -> `Section tile` to be consistent with other section titles in MLlib guide
* modified lines break at 100 chars or periods
jkbradley feynmanliang
Author: Xiangrui Meng <meng@databricks.com>
Closes #8517 from mengxr/SPARK-10348.
Diffstat (limited to 'docs/mllib-guide.md')
-rw-r--r-- | docs/mllib-guide.md | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 876dcfd40e..257f7cc760 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -14,9 +14,9 @@ primitives and higher-level pipeline APIs. It divides into two packages: * [`spark.mllib`](mllib-guide.html#mllib-types-algorithms-and-utilities) contains the original API - built on top of RDDs. + built on top of [RDDs](programming-guide.html#resilient-distributed-datasets-rdds). * [`spark.ml`](mllib-guide.html#sparkml-high-level-apis-for-ml-pipelines) provides higher-level API - built on top of DataFrames for constructing ML pipelines. + built on top of [DataFrames](sql-programming-guide.html#dataframes) for constructing ML pipelines. Using `spark.ml` is recommended because with DataFrames the API is more versatile and flexible. But we will keep supporting `spark.mllib` along with the development of `spark.ml`. @@ -57,19 +57,19 @@ We list major functionality from both below, with links to detailed guides. * [FP-growth](mllib-frequent-pattern-mining.html#fp-growth) * [association rules](mllib-frequent-pattern-mining.html#association-rules) * [PrefixSpan](mllib-frequent-pattern-mining.html#prefix-span) -* [Evaluation Metrics](mllib-evaluation-metrics.html) +* [Evaluation metrics](mllib-evaluation-metrics.html) +* [PMML model export](mllib-pmml-model-export.html) * [Optimization (developer)](mllib-optimization.html) * [stochastic gradient descent](mllib-optimization.html#stochastic-gradient-descent-sgd) * [limited-memory BFGS (L-BFGS)](mllib-optimization.html#limited-memory-bfgs-l-bfgs) -* [PMML model export](mllib-pmml-model-export.html) # spark.ml: high-level APIs for ML pipelines **[spark.ml programming guide](ml-guide.html)** provides an overview of the Pipelines API and major concepts. It also contains sections on using algorithms within the Pipelines API, for example: -* [Feature Extraction, Transformation, and Selection](ml-features.html) -* [Decision Trees for Classification and Regression](ml-decision-tree.html) +* [Feature extraction, transformation, and selection](ml-features.html) +* [Decision trees for classification and regression](ml-decision-tree.html) * [Ensembles](ml-ensembles.html) * [Linear methods with elastic net regularization](ml-linear-methods.html) * [Multilayer perceptron classifier](ml-ann.html) |