author Joseph K. Bradley <joseph@databricks.com> 2014-11-25 20:10:15 -0800
committer Xiangrui Meng <meng@databricks.com> 2014-11-25 20:10:25 -0800
commit 6880b467f66a4906161cbc343e70d975056a4f5f
tree b46fe8b4e8f34c01819e2869d4e9f6e84d07819d /dev
parent a48ea3cef22687694a4471705fb707bd1e8fa592
[SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates

Currently, the LogLoss used by GradientBoostedTrees has 2 issues:
* the gradient (and therefore loss) does not match that used by Friedman (1999)
* the error computation uses 0/1 accuracy, not log loss

This PR updates LogLoss.
It also adds some doc for boosting and forests.

I tested it on sample data and made sure the log loss is monotonically decreasing with each boosting iteration.

CC: mengxr manishamde codedeft

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #3439 from jkbradley/gbt-loss-fix and squashes the following commits:

cfec17e [Joseph K. Bradley] removed forgotten temp comments
a27eb6d [Joseph K. Bradley] corrections to last log loss commit
ed5da2c [Joseph K. Bradley] updated LogLoss (boosting) for numerical stability
5e52bff [Joseph K. Bradley] * Removed the 1/2 from SquaredError. This also required updating the test suite since it effectively doubles the gradient and loss. * Added doc for developers within RandomForest. * Small cleanup in test suite (generating data only once)
e57897a [Joseph K. Bradley] Fixed LogLoss for GradientBoostedTrees, and updated doc for losses, forests, and boosting

(cherry picked from commit c251fd7405db57d3ab2686c38712601fd8f13ccd)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

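
For readers without the patch at hand, here is a rough Python sketch of the two loss changes the commit message describes. It is not the MLlib code. It assumes labels in {-1, +1} and uses the binomial deviance log(1 + exp(-2yF)) in the style of Friedman (1999); the exact constant factor used in the patch is not shown on this page and may differ. All function names are hypothetical. The branch on the sign of the margin is one common way to get numerical stability, and the `halved` flag only shows why dropping the 1/2 from squared error doubles both loss and gradient.

```python
import math


def log_loss(label, prediction):
    """Binomial deviance log(1 + exp(-2*y*F)) for y in {-1, +1} (assumed form)."""
    margin = 2.0 * label * prediction
    # Evaluate log(1 + exp(-margin)) without overflowing exp() for large |margin|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def log_loss_gradient(label, prediction):
    """Derivative of log_loss w.r.t. the prediction F: -2*y / (1 + exp(2*y*F))."""
    margin = 2.0 * label * prediction
    if margin >= 0:
        e = math.exp(-margin)  # safe: the exponent is non-positive
        return -2.0 * label * e / (1.0 + e)
    return -2.0 * label / (1.0 + math.exp(margin))


def squared_error(label, prediction, halved=False):
    """(y - F)^2, or the older 1/2 * (y - F)^2 when halved=True."""
    diff = label - prediction
    loss = diff * diff
    return 0.5 * loss if halved else loss


def squared_error_gradient(label, prediction, halved=False):
    """Derivative of squared_error w.r.t. the prediction F."""
    grad = 2.0 * (prediction - label)
    return 0.5 * grad if halved else grad
```

Unlike 0/1 accuracy, a loss of this form keeps changing as the margin grows, which is what makes a "loss decreases with each boosting iteration" check meaningful.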
Diffstat (limited to 'dev')
0 files changed, 0 insertions, 0 deletions