path: root/docs/ml-classification-regression.md
author     Joseph K. Bradley <joseph@databricks.com>   2016-07-15 13:38:23 -0700
committer  Joseph K. Bradley <joseph@databricks.com>   2016-07-15 13:38:23 -0700
commit     5ffd5d3838da40ad408a6f40071fe6f4dcacf2a1 (patch)
tree       4d2c6476c38f84ef34eef20077f8e491b172681d /docs/ml-classification-regression.md
parent     71ad945bbbdd154eae852cd7f841e98f7a83e8d4 (diff)
[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide
## What changes were proposed in this pull request?

Made DataFrame-based API primary
* Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
* mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
* ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
  * **Reviewers: please check this carefully**
* (minor) Titles for DF API no longer include "- spark.ml" suffix. Titles for RDD API have "- RDD-based API" suffix
* Moved migration guide to ml-guide from mllib-guide
  * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
  * **Reviewers**: I did not change any of the content of the migration guides.

Reorganized DataFrame-based guide:
* ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
* Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
  * **Reviewers**: I did not change the content of these guides, except some intro text.
* Sidebar remains the same, but with pipeline and tuning sections added

Other:
* ml-classification-regression.html: Moved text about linear methods to new section in page

## How was this patch tested?

Generated docs locally

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #14213 from jkbradley/ml-guide-2.0.
Diffstat (limited to 'docs/ml-classification-regression.md')
-rw-r--r--  docs/ml-classification-regression.md  60
1 file changed, 32 insertions(+), 28 deletions(-)
diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md
index 3d6106b532..7c2437eacd 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -1,7 +1,7 @@
---
layout: global
-title: Classification and regression - spark.ml
-displayTitle: Classification and regression - spark.ml
+title: Classification and regression
+displayTitle: Classification and regression
---
@@ -22,37 +22,14 @@ displayTitle: Classification and regression - spark.ml
\newcommand{\zero}{\mathbf{0}}
\]`
+This page covers algorithms for classification and regression. It also includes sections
+discussing specific classes of algorithms, such as linear methods, trees, and ensembles.
+
**Table of Contents**
* This will become a table of contents (this text will be scraped).
{:toc}
-In `spark.ml`, we implement popular linear methods such as logistic
-regression and linear least squares with $L_1$ or $L_2$ regularization.
-Refer to [the linear methods in mllib](mllib-linear-methods.html) for
-details about implementation and tuning. We also include a DataFrame API for [Elastic
-net](http://en.wikipedia.org/wiki/Elastic_net_regularization), a hybrid
-of $L_1$ and $L_2$ regularization proposed in [Zou et al, Regularization
-and variable selection via the elastic
-net](http://users.stat.umn.edu/~zouxx019/Papers/elasticnet.pdf).
-Mathematically, it is defined as a convex combination of the $L_1$ and
-the $L_2$ regularization terms:
-`\[
-\alpha \left( \lambda \|\wv\|_1 \right) + (1-\alpha) \left( \frac{\lambda}{2}\|\wv\|_2^2 \right) , \alpha \in [0, 1], \lambda \geq 0
-\]`
-By setting $\alpha$ properly, elastic net contains both $L_1$ and $L_2$
-regularization as special cases. For example, if a [linear
-regression](https://en.wikipedia.org/wiki/Linear_regression) model is
-trained with the elastic net parameter $\alpha$ set to $1$, it is
-equivalent to a
-[Lasso](http://en.wikipedia.org/wiki/Least_squares#Lasso_method) model.
-On the other hand, if $\alpha$ is set to $0$, the trained model reduces
-to a [ridge
-regression](http://en.wikipedia.org/wiki/Tikhonov_regularization) model.
-We implement Pipelines API for both linear regression and logistic
-regression with elastic net regularization.
-
-
# Classification
## Logistic regression
@@ -760,7 +737,34 @@ Refer to the [`IsotonicRegression` Python docs](api/python/pyspark.ml.html#pyspa
</div>
</div>
+# Linear methods
+
+We implement popular linear methods such as logistic
+regression and linear least squares with $L_1$ or $L_2$ regularization.
+Refer to [the linear methods guide for the RDD-based API](mllib-linear-methods.html) for
+details about implementation and tuning; this information is still relevant.
+We also include a DataFrame API for [Elastic
+net](http://en.wikipedia.org/wiki/Elastic_net_regularization), a hybrid
+of $L_1$ and $L_2$ regularization proposed in [Zou et al, Regularization
+and variable selection via the elastic
+net](http://users.stat.umn.edu/~zouxx019/Papers/elasticnet.pdf).
+Mathematically, it is defined as a convex combination of the $L_1$ and
+the $L_2$ regularization terms:
+`\[
+\alpha \left( \lambda \|\wv\|_1 \right) + (1-\alpha) \left( \frac{\lambda}{2}\|\wv\|_2^2 \right) , \alpha \in [0, 1], \lambda \geq 0
+\]`
+By setting $\alpha$ properly, elastic net contains both $L_1$ and $L_2$
+regularization as special cases. For example, if a [linear
+regression](https://en.wikipedia.org/wiki/Linear_regression) model is
+trained with the elastic net parameter $\alpha$ set to $1$, it is
+equivalent to a
+[Lasso](http://en.wikipedia.org/wiki/Least_squares#Lasso_method) model.
+On the other hand, if $\alpha$ is set to $0$, the trained model reduces
+to a [ridge
+regression](http://en.wikipedia.org/wiki/Tikhonov_regularization) model.
+We provide a Pipelines API for both linear regression and logistic
+regression with elastic net regularization.
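
The moved "Linear methods" text above explains the elastic net mixing parameter but the hunk shown here contains no code. As an illustration only (not part of this commit), the minimal Scala sketch below fits an elastic-net-regularized `LinearRegression` with the DataFrame-based API; the toy `training` DataFrame, its values, and the `ElasticNetSketch` name are assumptions made to keep the example self-contained.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object ElasticNetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ElasticNetSketch").getOrCreate()

    // Assumed toy dataset; "label" and "features" are the default column names in spark.ml.
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

    val lr = new LinearRegression()
      .setMaxIter(10)
      .setRegParam(0.3)         // lambda: overall regularization strength
      .setElasticNetParam(0.8)  // alpha: 1.0 is pure L1 (lasso), 0.0 is pure L2 (ridge)

    val model = lr.fit(training)
    println(s"Coefficients: ${model.coefficients}  Intercept: ${model.intercept}")

    spark.stop()
  }
}
```

Swapping in `org.apache.spark.ml.classification.LogisticRegression` uses the same `regParam` and `elasticNetParam` setters, since, as the text above notes, both estimators support elastic net regularization.
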
# Decision trees