author | Matt Forbes <matt@tellapart.com> | 2014-08-18 21:43:32 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-08-18 21:43:32 -0700 |
commit | cd0720ca77894d481fb73a8b5bb517013843cb1e (patch) | |
tree | ce4eb8f4442ba5e90f3447396acfe64143f31d0d /docs | |
parent | 82577339dd58b5811eab5d10667775e61e37ff51 (diff) | |
Fix typo in decision tree docs
Candidate splits were inconsistent with the example.
Author: Matt Forbes <matt@tellapart.com>
Closes #1837 from emef/tree-doc and squashes the following commits:
3be14a1 [Matt Forbes] Fix typo in decision tree docs
Diffstat (limited to 'docs')
-rw-r--r-- | docs/mllib-decision-tree.md | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index 9cbd880897..c01a92a9a1 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -84,8 +84,8 @@ Section 9.2.4 in
 [Elements of Statistical Machine Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) for
 details). For example, for a binary classification problem with one categorical feature with three
 categories A, B and C with corresponding proportion of label 1 as 0.2, 0.6 and 0.4, the categorical
-features are ordered as A followed by C followed B or A, B, C. The two split candidates are A \| C, B
-and A , B \| C where \| denotes the split. A similar heuristic is used for multiclass classification
+features are ordered as A followed by C followed B or A, C, B. The two split candidates are A \| C, B
+and A , C \| B where \| denotes the split. A similar heuristic is used for multiclass classification
 when `$2^(M-1)-1$` is greater than the number of bins -- the impurity for each categorical feature
 value is used for ordering.
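
The corrected ordering can be checked with a small sketch. This is not MLlib's implementation; the function name `ordered_split_candidates` and the dictionary input are assumptions made for illustration. It sorts the categories by their proportion of label 1 and places a split between each pair of neighbours, which yields M-1 candidates for M categories.

```python
def ordered_split_candidates(label1_proportion):
    """Return candidate splits for one categorical feature (illustrative only).

    label1_proportion maps each category to its proportion of label 1.
    Categories are sorted by that proportion, and a split is placed
    between each pair of adjacent categories in the sorted order.
    """
    ordered = sorted(label1_proportion, key=label1_proportion.get)
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


# Values taken from the example in the documentation paragraph.
proportions = {"A": 0.2, "B": 0.6, "C": 0.4}
candidates = ordered_split_candidates(proportions)

for left, right in candidates:
    print(", ".join(left), "|", ", ".join(right))
# prints:
# A | C, B
# A, C | B
```

With the proportions 0.2, 0.6 and 0.4 the sorted order is A, C, B, so the two candidates agree with the text after this commit rather than the `A, B \| C` split it replaces.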