aboutsummaryrefslogtreecommitdiff
path: root/docs/mllib-guide.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/mllib-guide.md')
-rw-r--r--docs/mllib-guide.md29
1 files changed, 28 insertions, 1 deletions
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 94fc98ce4f..dcb6819f46 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -16,8 +16,9 @@ filtering, dimensionality reduction, as well as underlying optimization primitiv
* random data generation
* [Classification and regression](mllib-classification-regression.html)
* [linear models (SVMs, logistic regression, linear regression)](mllib-linear-methods.html)
- * [decision trees](mllib-decision-tree.html)
* [naive Bayes](mllib-naive-bayes.html)
+ * [decision trees](mllib-decision-tree.html)
+ * [ensembles of trees](mllib-ensembles.html) (Random Forests and Gradient-Boosted Trees)
* [Collaborative filtering](mllib-collaborative-filtering.html)
* alternating least squares (ALS)
* [Clustering](mllib-clustering.html)
@@ -60,6 +61,32 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
# Migration Guide
+## From 1.1 to 1.2
+
+The only API changes in MLlib v1.2 are in
+[`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree),
+which continues to be an experimental API in MLlib 1.2:
+
+1. *(Breaking change)* The Scala API for classification takes a named argument specifying the number
+of classes. In MLlib v1.1, this argument was called `numClasses` in Python and
+`numClassesForClassification` in Scala. In MLlib v1.2, the names are both set to `numClasses`.
+This `numClasses` parameter is specified either via
+[`Strategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.Strategy)
+or via [`DecisionTree`](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree)
+static `trainClassifier` and `trainRegressor` methods.
+
+2. *(Breaking change)* The API for
+[`Node`](api/scala/index.html#org.apache.spark.mllib.tree.model.Node) has changed.
+This should generally not affect user code, unless the user manually constructs decision trees
+(instead of using the `trainClassifier` or `trainRegressor` methods).
+The tree `Node` now includes more information, including the probability of the predicted label
+(for classification).
+
+3. Printing methods' output has changed. The `toString` (Scala/Java) and `__repr__` (Python) methods used to print the full model; they now print a summary. For the full model, use `toDebugString`.
+
+Examples in the Spark distribution and examples in the
+[Decision Trees Guide](mllib-decision-tree.html#examples) have been updated accordingly.
+
## From 1.0 to 1.1
The only API changes in MLlib v1.1 are in