author | Joseph K. Bradley <joseph@databricks.com> | 2014-11-25 20:10:15 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-11-25 20:10:25 -0800 |
commit | 6880b467f66a4906161cbc343e70d975056a4f5f (patch) | |
tree | b46fe8b4e8f34c01819e2869d4e9f6e84d07819d /dev | |
parent | a48ea3cef22687694a4471705fb707bd1e8fa592 (diff) | |
[SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates
Currently, the LogLoss used by GradientBoostedTrees has 2 issues:
* the gradient (and therefore loss) does not match that used by Friedman (1999)
* the error computation uses 0/1 accuracy, not log loss
This PR updates LogLoss to fix both issues.
It also adds documentation for boosting and forests.
I tested it on sample data and confirmed that the log loss decreases monotonically with each boosting iteration.
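For readers who want to see the two quantities side by side, here is a minimal Python sketch. It is not the code in this patch. It assumes the two-class loss from Friedman (1999) with labels in {-1, +1}, scaled as `2 * log(1 + exp(-2 * y * F))`; the factor of 2 is an assumption of this sketch. The function names and the per-point gradient-step loop are hypothetical stand-ins for real boosting with trees. The loop only illustrates the kind of monotone-decrease check described above.

```python
import math


def log1p_exp(x: float) -> float:
    """Numerically stable log(1 + exp(x)); avoids overflow for large x."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def log_loss(prediction: float, label: float) -> float:
    """Assumed loss: 2 * log(1 + exp(-2 * y * F)), with y in {-1, +1}."""
    return 2.0 * log1p_exp(-2.0 * label * prediction)


def log_loss_gradient(prediction: float, label: float) -> float:
    """Derivative of log_loss w.r.t. the prediction: -4y / (1 + exp(2yF))."""
    margin = 2.0 * label * prediction
    if margin >= 0:
        # Rewrite 1 / (1 + exp(m)) as exp(-m) / (1 + exp(-m)) to avoid overflow.
        e = math.exp(-margin)
        return -4.0 * label * e / (1.0 + e)
    return -4.0 * label / (1.0 + math.exp(margin))


def mean_loss_trace(labels, step=0.1, iterations=20):
    """Hypothetical stand-in for boosting: take plain gradient steps on each
    prediction and record the mean log loss before every step."""
    preds = [0.0] * len(labels)
    trace = []
    for _ in range(iterations):
        trace.append(sum(log_loss(p, y) for p, y in zip(preds, labels)) / len(labels))
        preds = [p - step * log_loss_gradient(p, y) for p, y in zip(preds, labels)]
    return trace


if __name__ == "__main__":
    trace = mean_loss_trace([1.0, -1.0, 1.0, 1.0])
    # The mean log loss should shrink at every iteration for this small step size.
    print(all(a > b for a, b in zip(trace, trace[1:])))
```

Unlike 0/1 accuracy, this loss keeps changing as predictions move further in the right direction, which is why it makes a more informative per-iteration error measure for boosting.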
CC: mengxr manishamde codedeft
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #3439 from jkbradley/gbt-loss-fix and squashes the following commits:
cfec17e [Joseph K. Bradley] removed forgotten temp comments
a27eb6d [Joseph K. Bradley] corrections to last log loss commit
ed5da2c [Joseph K. Bradley] updated LogLoss (boosting) for numerical stability
5e52bff [Joseph K. Bradley]
  * Removed the 1/2 from SquaredError. This also required updating the test suite since it effectively doubles the gradient and loss.
  * Added doc for developers within RandomForest.
  * Small cleanup in test suite (generating data only once)
e57897a [Joseph K. Bradley] Fixed LogLoss for GradientBoostedTrees, and updated doc for losses, forests, and boosting
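
The note in 5e52bff about dropping the 1/2 from SquaredError comes down to simple arithmetic, sketched below in Python. The function names are hypothetical and this is not the MLlib code. It only shows that `(y - F)^2` and its gradient are exactly twice `1/2 * (y - F)^2` and its gradient, which would explain why expected values in a test suite need updating.

```python
def squared_error(prediction: float, label: float) -> float:
    """Squared error without the 1/2 factor: (y - F)^2."""
    return (label - prediction) ** 2


def squared_error_gradient(prediction: float, label: float) -> float:
    """Derivative of (y - F)^2 w.r.t. the prediction F: -2 * (y - F)."""
    return -2.0 * (label - prediction)


def half_squared_error(prediction: float, label: float) -> float:
    """Squared error with the 1/2 factor: 1/2 * (y - F)^2."""
    return 0.5 * (label - prediction) ** 2


def half_squared_error_gradient(prediction: float, label: float) -> float:
    """Derivative of 1/2 * (y - F)^2 w.r.t. the prediction F: -(y - F)."""
    return -(label - prediction)


if __name__ == "__main__":
    f, y = 1.0, 3.0
    # Both ratios are 2: removing the 1/2 doubles the loss and the gradient.
    print(squared_error(f, y) / half_squared_error(f, y))
    print(squared_error_gradient(f, y) / half_squared_error_gradient(f, y))
```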
(cherry picked from commit c251fd7405db57d3ab2686c38712601fd8f13ccd)
Signed-off-by: Xiangrui Meng <meng@databricks.com>