author | Joseph K. Bradley <joseph@databricks.com> | 2016-04-28 16:20:00 -0700 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2016-04-28 16:20:00 -0700 |
commit | 4f4721a21cc9acc2b6f685bbfc8757d29563a775 (patch) | |
tree | 6cd62a33cb375e32ba72abfba71c2cf9b64df616 /pom.xml | |
parent | dae538a4d7c36191c1feb02ba87ffc624ab960dc (diff) | |
[SPARK-14862][ML] Updated Classifiers to not require labelCol metadata
## What changes were proposed in this pull request?
Updated Classifier, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier so that they no longer require label column metadata:
* They first check the label column metadata for numClasses.
* If numClasses is not specified in the metadata, they infer it from the largest label value in the data (up to a limit).
This functionality is implemented in a new Classifier.getNumClasses method.
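The inference order described above (metadata first, then the largest label) can be sketched in plain Scala. This is a minimal illustration, not Spark's code: `inferNumClasses` is a hypothetical stand-in for `Classifier.getNumClasses` that reads labels from an in-memory `Seq[Double]` rather than a DataFrame column. It assumes labels are 0-indexed, so numClasses is taken to be the max label plus one, and it uses 100 as an assumed default for the "limit".

```scala
object NumClassesSketch {
  /** Hypothetical stand-in for Classifier.getNumClasses.
    * Assumes 0-indexed labels, so numClasses = maxLabel + 1.
    * maxNumClasses = 100 is an assumed default limit. */
  def inferNumClasses(
      labels: Seq[Double],
      metadataNumClasses: Option[Int],
      maxNumClasses: Int = 100): Int = metadataNumClasses match {
    // 1. If the label column metadata specifies numClasses, trust it.
    case Some(n) => n
    // 2. Otherwise, infer numClasses from the largest label value.
    case None =>
      require(labels.nonEmpty, "Cannot infer numClasses from an empty dataset.")
      val maxLabel = labels.max
      require(maxLabel >= 0 && maxLabel == math.floor(maxLabel) && maxLabel < Int.MaxValue,
        s"Found max label $maxLabel, but labels must be non-negative integers.")
      val numClasses = maxLabel.toInt + 1
      // 3. Refuse to infer very large class counts; the caller should
      //    supply numClasses via metadata instead.
      require(numClasses <= maxNumClasses,
        s"Inferred $numClasses classes, which exceeds the limit of $maxNumClasses.")
      numClasses
  }
}
```

For example, labels `Seq(0.0, 2.0, 1.0)` with no metadata yield 3 classes, while metadata of `Some(5)` is returned unchanged even if the data only contains labels 0 and 1. The cap keeps a stray large label (say 1e6) from silently producing a huge model.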
Also:
* Updated Classifier.extractLabeledPoints to (a) check label values and (b) add a second version that takes a numClasses value for validity checking.
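A rough sketch of what the two `extractLabeledPoints` variants might check, written against plain collections instead of a Dataset. Everything here is hypothetical: `Point`, `validateLabel`, and the exact checks are assumptions for illustration. The one-argument variant only rejects labels that are not non-negative integers, and the variant taking `numClasses` also requires each label to lie in `[0, numClasses)`.

```scala
object LabelCheckSketch {
  /** Hypothetical minimal labeled point: a label plus dense features. */
  final case class Point(label: Double, features: Seq[Double])

  private def isNonNegInt(label: Double): Boolean =
    label >= 0 && label == math.floor(label)

  /** Assumed check: label must be an integer in [0, numClasses). */
  def validateLabel(label: Double, numClasses: Int): Unit =
    require(isNonNegInt(label) && label < numClasses,
      s"Invalid label $label; labels must be integers in [0, $numClasses).")

  /** Variant without numClasses: only checks for non-negative integer labels. */
  def extractLabeledPoints(rows: Seq[(Double, Seq[Double])]): Seq[Point] =
    rows.map { case (label, features) =>
      require(isNonNegInt(label), s"Invalid label $label; labels must be non-negative integers.")
      Point(label, features)
    }

  /** Variant taking numClasses: also checks the upper bound on each label. */
  def extractLabeledPoints(rows: Seq[(Double, Seq[Double])], numClasses: Int): Seq[Point] = {
    require(numClasses > 0, s"numClasses must be positive, but got $numClasses.")
    rows.map { case (label, features) =>
      validateLabel(label, numClasses)
      Point(label, features)
    }
  }
}
```

Pairing the second variant with an inferred or metadata-supplied numClasses means a label outside the expected range fails fast during extraction, instead of surfacing later inside tree training.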
## How was this patch tested?
* Unit tests in ClassifierSuite for helper methods
* Unit tests for DecisionTreeClassifier, RandomForestClassifier, GBTClassifier with toy datasets lacking label metadata
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12663 from jkbradley/trees-no-metadata.
Diffstat (limited to 'pom.xml')
0 files changed, 0 insertions, 0 deletions