aboutsummaryrefslogtreecommitdiff
path: root/docs/mllib-decision-tree.md
diff options
context:
space:
mode:
authorSean Owen <sowen@cloudera.com>2014-05-06 20:07:22 -0700
committerPatrick Wendell <pwendell@gmail.com>2014-05-06 20:07:22 -0700
commit25ad8f93012730115a8a1fac649fe3e842c045b3 (patch)
tree6bc0dfec7014289e39f4c5c9070ed121e00c4398 /docs/mllib-decision-tree.md
parenta000b5c3b0438c17e9973df4832c320210c29c27 (diff)
downloadspark-25ad8f93012730115a8a1fac649fe3e842c045b3.tar.gz
spark-25ad8f93012730115a8a1fac649fe3e842c045b3.tar.bz2
spark-25ad8f93012730115a8a1fac649fe3e842c045b3.zip
SPARK-1727. Correct small compile errors, typos, and markdown issues in (primarly) MLlib docs
While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors, and some typos. This led to finding and fixing a few similar items in other docs. Then in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown. Author: Sean Owen <sowen@cloudera.com> Closes #653 from srowen/SPARK-1727 and squashes the following commits: 6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count 8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output) 99966a9 [Sean Owen] Update issue tracker URL in docs 23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak) 8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs
Diffstat (limited to 'docs/mllib-decision-tree.md')
-rw-r--r--docs/mllib-decision-tree.md8
1 files changed, 4 insertions, 4 deletions
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index 0693766990..296277e58b 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -83,19 +83,19 @@ Section 9.2.4 in
[Elements of Statistical Machine Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) for
details). For example, for a binary classification problem with one categorical feature with three
categories A, B and C with corresponding proportion of label 1 as 0.2, 0.6 and 0.4, the categorical
-features are orded as A followed by C followed B or A, B, C. The two split candidates are A \| C, B
+features are ordered as A followed by C followed B or A, B, C. The two split candidates are A \| C, B
and A , B \| C where \| denotes the split.
### Stopping rule
The recursive tree construction is stopped at a node when one of the two conditions is met:
-1. The node depth is equal to the `maxDepth` training parammeter
+1. The node depth is equal to the `maxDepth` training parameter
2. No split candidate leads to an information gain at the node.
### Practical limitations
-1. The tree implementation stores an Array[Double] of size *O(#features \* #splits \* 2^maxDepth)*
+1. The tree implementation stores an `Array[Double]` of size *O(#features \* #splits \* 2^maxDepth)*
in memory for aggregating histograms over partitions. The current implementation might not scale
to very deep trees since the memory requirement grows exponentially with tree depth.
2. The implemented algorithm reads both sparse and dense data. However, it is not optimized for
@@ -178,7 +178,7 @@ val valuesAndPreds = parsedData.map { point =>
val prediction = model.predict(point.features)
(point.label, prediction)
}
-val MSE = valuesAndPreds.map{ case(v, p) => math.pow((v - p), 2)}.reduce(_ + _)/valuesAndPreds.count
+val MSE = valuesAndPreds.map{ case(v, p) => math.pow((v - p), 2)}.mean()
println("training Mean Squared Error = " + MSE)
{% endhighlight %}
</div>