path: root/docs/mllib-decision-tree.md
author Matt Forbes <matt@tellapart.com> 2014-08-18 21:43:32 -0700
committer Xiangrui Meng <meng@databricks.com> 2014-08-18 21:43:32 -0700
commit cd0720ca77894d481fb73a8b5bb517013843cb1e (patch)
tree ce4eb8f4442ba5e90f3447396acfe64143f31d0d /docs/mllib-decision-tree.md
parent 82577339dd58b5811eab5d10667775e61e37ff51 (diff)
Fix typo in decision tree docs
Candidate splits were inconsistent with the example.

Author: Matt Forbes <matt@tellapart.com>

Closes #1837 from emef/tree-doc and squashes the following commits:

3be14a1 [Matt Forbes] Fix typo in decision tree docs
Diffstat (limited to 'docs/mllib-decision-tree.md')
-rw-r--r-- docs/mllib-decision-tree.md 4
1 files changed, 2 insertions, 2 deletions
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index 9cbd880897..c01a92a9a1 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -84,8 +84,8 @@ Section 9.2.4 in
[Elements of Statistical Machine Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) for
details). For example, for a binary classification problem with one categorical feature with three
categories A, B and C with corresponding proportion of label 1 as 0.2, 0.6 and 0.4, the categorical
-features are ordered as A followed by C followed B or A, B, C. The two split candidates are A \| C, B
-and A , B \| C where \| denotes the split. A similar heuristic is used for multiclass classification
+features are ordered as A followed by C followed B or A, C, B. The two split candidates are A \| C, B
+and A , C \| B where \| denotes the split. A similar heuristic is used for multiclass classification
when `$2^(M-1)-1$` is greater than the number of bins -- the impurity for each categorical feature value
is used for ordering.
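
The corrected ordering in this hunk can be checked with a small sketch. This is plain Python for illustration only, not MLlib code: the function name `ordered_split_candidates` and the `proportions` dictionary are hypothetical, and the values are taken from the example in the docs (A: 0.2, B: 0.6, C: 0.4). It assumes the heuristic is simply "sort categories by proportion of label 1, then split between adjacent categories".

```python
def ordered_split_candidates(label1_proportion):
    """Order categories by their proportion of label 1 and return the
    ordered list together with the M-1 splits between adjacent categories.

    Hypothetical helper for illustration; not part of the MLlib API.
    """
    # Sort category names by ascending proportion of label 1.
    ordered = sorted(label1_proportion, key=label1_proportion.get)
    # Each split point i puts the first i categories on the left side.
    splits = [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]
    return ordered, splits


# Values from the documentation example.
proportions = {"A": 0.2, "B": 0.6, "C": 0.4}
ordered, splits = ordered_split_candidates(proportions)

print("order:", ", ".join(ordered))
for left, right in splits:
    print(", ".join(left), "|", ", ".join(right))
```

With these proportions the order is A, C, B and the two candidates are A | C, B and A, C | B, which matches the `+` lines of the patch rather than the `-` lines.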