author Bryan Cutler <cutlerb@gmail.com> 2016-02-26 08:30:32 -0800
committer Xiangrui Meng <meng@databricks.com> 2016-02-26 08:30:32 -0800
commit b33261f91387904c5aaccae40f86922c92a4e09a
tree abae986f0bd829276d4b320f8242275a22609212 /docs/mllib-decision-tree.md
parent 99dfcedbfd4c83c7b6a343456f03e8c6e29968c5
[SPARK-12634][PYSPARK][DOC] PySpark tree parameter desc to consistent format
Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the tree module.

closes #10601

Author: Bryan Cutler <cutlerb@gmail.com>
Author: vijaykiran <mail@vijaykiran.com>

Closes #11353 from BryanCutler/param-desc-consistent-tree-SPARK-12634.
Diffstat (limited to 'docs/mllib-decision-tree.md')
-rw-r--r-- docs/mllib-decision-tree.md 6
1 files changed, 3 insertions, 3 deletions
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index a8612b6c84..9af48357b3 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -121,12 +121,12 @@ The parameters are listed below roughly in order of descending importance. New
These parameters describe the problem you want to solve and your dataset.
They should be specified and do not require tuning.
-* **`algo`**: `Classification` or `Regression`
+* **`algo`**: Type of decision tree, either `Classification` or `Regression`.
-* **`numClasses`**: Number of classes (for `Classification` only)
+* **`numClasses`**: Number of classes (for `Classification` only).
* **`categoricalFeaturesInfo`**: Specifies which features are categorical and how many categorical values each of those features can take. This is given as a map from feature indices to feature arity (number of categories). Any features not in this map are treated as continuous.
- * E.g., `Map(0 -> 2, 4 -> 10)` specifies that feature `0` is binary (taking values `0` or `1`) and that feature `4` has 10 categories (values `{0, 1, ..., 9}`). Note that feature indices are 0-based: features `0` and `4` are the 1st and 5th elements of an instance's feature vector.
+ * For example, `Map(0 -> 2, 4 -> 10)` specifies that feature `0` is binary (taking values `0` or `1`) and that feature `4` has 10 categories (values `{0, 1, ..., 9}`). Note that feature indices are 0-based: features `0` and `4` are the 1st and 5th elements of an instance's feature vector.
* Note that you do not have to specify `categoricalFeaturesInfo`. The algorithm will still run and may get reasonable results. However, performance should be better if categorical features are properly designated.
### Stopping criteria
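Reviewer note on the `categoricalFeaturesInfo` text in the hunk above: since this commit targets the PySpark docs, a small Python illustration of how the map is read may help. The sketch below is not MLlib code; `describe_features` is a hypothetical helper, and it assumes the Python form of the map is a plain `dict` from feature index to arity. It only mirrors the semantics stated in the doc: listed features are categorical with values `0` to `arity - 1`, and unlisted features are continuous.

```python
# Hypothetical helper, not part of MLlib: it only illustrates how the map is read.
def describe_features(num_features, categorical_features_info):
    """Describe each feature, treating features absent from the map as continuous."""
    descriptions = []
    for index in range(num_features):
        arity = categorical_features_info.get(index)
        if arity is None:
            descriptions.append("continuous")
        else:
            # A categorical feature with arity k takes values 0, 1, ..., k - 1.
            descriptions.append("categorical, values 0..%d" % (arity - 1))
    return descriptions


# Assumed Python dict equivalent of the Scala example Map(0 -> 2, 4 -> 10).
categorical_features_info = {0: 2, 4: 10}

# Feature indices are 0-based, so a 6-element feature vector has indices 0..5.
features = describe_features(6, categorical_features_info)
for index, description in enumerate(features):
    print(index, description)
```

With a six-element feature vector, features `0` and `4` are reported as categorical and the remaining four as continuous, matching the example in the documentation text.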