author | johnnywalleye <jsondag@gmail.com> | 2014-07-08 19:17:26 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-07-08 19:17:26 -0700 |
commit | 1114207cc8e4ef94cb97bbd5a2ef3ae4d51f73fa (patch) | |
tree | efa02266138365df88304a26371fc39f8bede199 /docs/js | |
parent | ac9cdc116e1c5fcb291a4ff168cac002a8058f05 (diff) | |
download | spark-1114207cc8e4ef94cb97bbd5a2ef3ae4d51f73fa.tar.gz spark-1114207cc8e4ef94cb97bbd5a2ef3ae4d51f73fa.tar.bz2 spark-1114207cc8e4ef94cb97bbd5a2ef3ae4d51f73fa.zip |
[SPARK-2152][MLlib] fix bin offset in DecisionTree node aggregations (also resolves SPARK-2160)
Hi, this pull fixes (what I believe to be) a bug in DecisionTree.scala.
In the extractLeftRightNodeAggregates function, the first set of rightNodeAgg values for Regression are set in line 792 as follows:
    rightNodeAgg(featureIndex)(2 * (numBins - 2)) =
      binData(shift + (2 * (numBins - 1)))
Then there is a loop that sets the rest of the values, as in line 809:
    rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) =
      binData(shift + (2 * (numBins - 2 - splitIndex))) +
      rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex))
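But since splitIndex starts at 1, the first iteration of the loop reads the binData values for bin numBins - 3, so the values for bin numBins - 2 are never added to any right-node aggregate.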
The changes here address this issue, for both the Regression and Classification cases.
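To make the offset problem concrete, here is a minimal, self-contained sketch. It is not the code from DecisionTree.scala: the object name, the helper names, and the flat layout of two statistics per bin are assumptions made for illustration. `quoted` uses the bin offset shown in the snippets above, and `shifted` uses `numBins - 1 - splitIndex` instead, which is one plausible correction and not necessarily the exact change in this commit.

```scala
object RightNodeAggSketch {
  // Assumed layout: 2 statistics per bin, stored flat as
  // binData(shift + 2 * bin + stat).
  private val Stats = 2

  // Builds right-node aggregates as suffix sums over the bins.
  // `binFor(splitIndex)` chooses which bin each loop iteration adds.
  private def aggregate(
      binData: Array[Double],
      shift: Int,
      numBins: Int,
      binFor: Int => Int): Array[Double] = {
    val rightNodeAgg = new Array[Double](Stats * (numBins - 1))
    // Highest split: everything to its right is the last bin.
    for (s <- 0 until Stats) {
      rightNodeAgg(Stats * (numBins - 2) + s) =
        binData(shift + Stats * (numBins - 1) + s)
    }
    var splitIndex = 1
    while (splitIndex < numBins - 1) {
      val target = Stats * (numBins - 2 - splitIndex)
      for (s <- 0 until Stats) {
        // One bin's statistics plus the aggregate of the next-higher split.
        rightNodeAgg(target + s) =
          binData(shift + Stats * binFor(splitIndex) + s) +
            rightNodeAgg(target + Stats + s)
      }
      splitIndex += 1
    }
    rightNodeAgg
  }

  // Offset as quoted in the description: reads bin numBins - 2 - splitIndex.
  def quoted(binData: Array[Double], shift: Int, numBins: Int): Array[Double] =
    aggregate(binData, shift, numBins, splitIndex => numBins - 2 - splitIndex)

  // Assumed correction: reads bin numBins - 1 - splitIndex.
  def shifted(binData: Array[Double], shift: Int, numBins: Int): Array[Double] =
    aggregate(binData, shift, numBins, splitIndex => numBins - 1 - splitIndex)
}
```

With four bins holding (1, 10), (2, 20), (3, 30) and (4, 40), `quoted` yields (7, 70), (6, 60), (4, 40): bin 2 is never counted, and each split wrongly includes its own bin. `shifted` yields (9, 90), (7, 70), (4, 40), which are the sums of the bins strictly to the right of each split.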
Author: johnnywalleye <jsondag@gmail.com>
Closes #1316 from johnnywalleye/master and squashes the following commits:
73809da [johnnywalleye] fix bin offset in DecisionTree node aggregations