author: Joseph K. Bradley <joseph@databricks.com> 2014-11-25 20:10:15 -0800
committer: Xiangrui Meng <meng@databricks.com> 2014-11-25 20:10:15 -0800
commit: c251fd7405db57d3ab2686c38712601fd8f13ccd
tree: 0496b7175e6081cc7f4f3278e1e44448217866f7 /dev
parent: 7eba0fbe456c451122d7a2353ff0beca00f15223
[SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates

Currently, the LogLoss used by GradientBoostedTrees has 2 issues:
* the gradient (and therefore loss) does not match that used by Friedman (1999)
* the error computation uses 0/1 accuracy, not log loss

This PR updates LogLoss.
It also adds some doc for boosting and forests.

I tested it on sample data and made sure the log loss is monotonically decreasing with each boosting iteration.

CC: mengxr manishamde codedeft

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #3439 from jkbradley/gbt-loss-fix and squashes the following commits:

cfec17e [Joseph K. Bradley] removed forgotten temp comments
a27eb6d [Joseph K. Bradley] corrections to last log loss commit
ed5da2c [Joseph K. Bradley] updated LogLoss (boosting) for numerical stability
5e52bff [Joseph K. Bradley] * Removed the 1/2 from SquaredError. This also required updating the test suite since it effectively doubles the gradient and loss. * Added doc for developers within RandomForest. * Small cleanup in test suite (generating data only once)
e57897a [Joseph K. Bradley] Fixed LogLoss for GradientBoostedTrees, and updated doc for losses, forests, and boosting
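
For readers without the diff at hand, the two loss changes described in the message can be sketched as follows. This is a minimal illustration, not the MLlib code: the function names are hypothetical, labels are assumed to be in {-1, +1}, and the log loss is written in Friedman's form log(1 + exp(-2yF)); any constant scale factor the actual patch applies is not stated in the message and is not reproduced here. The branch on the sign of the margin is one common way to get the numerical stability that commit ed5da2c mentions, and is likewise an assumption about approach.

```python
import math


def log_loss(label, prediction):
    """Log loss log(1 + exp(-2*y*F)) for y in {-1, +1} (assumed form)."""
    margin = 2.0 * label * prediction
    if margin >= 0:
        # exp(-margin) <= 1 here, so this cannot overflow.
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite log(1 + exp(-m)) as -m + log(1 + exp(m)).
    return -margin + math.log1p(math.exp(margin))


def log_loss_gradient(label, prediction):
    """Derivative of log_loss w.r.t. the prediction: -2*y / (1 + exp(2*y*F))."""
    margin = 2.0 * label * prediction
    if margin >= 0:
        e = math.exp(-margin)
        return -2.0 * label * e / (1.0 + e)
    return -2.0 * label / (1.0 + math.exp(margin))


def squared_error(label, prediction):
    """Squared error without the 1/2 factor."""
    return (label - prediction) ** 2


def squared_error_gradient(label, prediction):
    """Derivative of squared_error w.r.t. the prediction."""
    return 2.0 * (prediction - label)
```

Compared with 0.5 * (y - F)^2, dropping the 1/2 doubles both the loss value and the gradient, which is why the message notes that the test suite had to be updated.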
Diffstat (limited to 'dev')
0 files changed, 0 insertions, 0 deletions