author: Sean Owen <sowen@cloudera.com> 2015-07-30 17:26:18 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2015-07-30 17:26:18 -0700
commit: 65fa4181c35135080870c1e4c1f904ada3a8cf59 (patch)
tree: df7cef6db7095640e72e0e3e46e3172ef3dadce9 /mllib
parent: 351eda0e2fd47c183c4298469970032097ad07a0 (diff)
[SPARK-9077] [MLLIB] Improve error message for decision trees when numExamples < maxCategoriesPerFeature
Improve error message when number of examples is less than arity of high-arity categorical feature

CC jkbradley is this about what you had in mind? I know it's a starter, but was on my list to close out in the short term.

Author: Sean Owen <sowen@cloudera.com>

Closes #7800 from srowen/SPARK-9077 and squashes the following commits:

b8f6cdb [Sean Owen] Improve error message when number of examples is less than arity of high-arity categorical feature
Diffstat (limited to 'mllib')
-rw-r--r-- mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala 8
1 file changed, 6 insertions, 2 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
index 380291ac22..9fe264656e 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
@@ -128,9 +128,13 @@ private[spark] object DecisionTreeMetadata extends Logging {
     // based on the number of training examples.
     if (strategy.categoricalFeaturesInfo.nonEmpty) {
       val maxCategoriesPerFeature = strategy.categoricalFeaturesInfo.values.max
+      val maxCategory =
+        strategy.categoricalFeaturesInfo.find(_._2 == maxCategoriesPerFeature).get._1
       require(maxCategoriesPerFeature <= maxPossibleBins,
-        s"DecisionTree requires maxBins (= $maxPossibleBins) >= max categories " +
-        s"in categorical features (= $maxCategoriesPerFeature)")
+        s"DecisionTree requires maxBins (= $maxPossibleBins) to be at least as large as the " +
+        s"number of values in each categorical feature, but categorical feature $maxCategory " +
+        s"has $maxCategoriesPerFeature values. Consider removing this and other categorical " +
+        "features with a large number of values, or adding more training examples.")
     }
 
     val unorderedFeatures = new mutable.HashSet[Int]()
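
For reference, the check in the hunk above can be tried outside Spark with a small standalone sketch. This is not the Spark implementation: the object name `MaxBinsCheckSketch`, the helper `checkBins`, and the sample feature map are hypothetical, and `categoricalFeaturesInfo` and `maxPossibleBins` are passed in as plain arguments rather than read from a `Strategy`. It assumes only that `categoricalFeaturesInfo` maps a feature index to its number of categories, as the diff suggests.

```scala
// Standalone sketch of the check shown in the diff; not Spark code.
// MaxBinsCheckSketch and checkBins are hypothetical names for illustration.
object MaxBinsCheckSketch {

  // categoricalFeaturesInfo: feature index -> number of categories (assumed shape).
  // maxPossibleBins: stand-in for the value computed in DecisionTreeMetadata.
  def checkBins(categoricalFeaturesInfo: Map[Int, Int], maxPossibleBins: Int): Unit = {
    if (categoricalFeaturesInfo.nonEmpty) {
      val maxCategoriesPerFeature = categoricalFeaturesInfo.values.max
      // Look up the index of a feature that has the largest number of categories,
      // so the error message can name it.
      val maxCategory =
        categoricalFeaturesInfo.find(_._2 == maxCategoriesPerFeature).get._1
      require(maxCategoriesPerFeature <= maxPossibleBins,
        s"DecisionTree requires maxBins (= $maxPossibleBins) to be at least as large as the " +
        s"number of values in each categorical feature, but categorical feature $maxCategory " +
        s"has $maxCategoriesPerFeature values.")
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data: feature 0 has 3 categories, feature 2 has 50.
    val info = Map(0 -> 3, 2 -> 50)

    checkBins(info, 64) // 50 <= 64, so the requirement holds

    try {
      checkBins(info, 32) // 50 > 32, so require throws IllegalArgumentException
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

With the sample map, the failing call reports feature 2 and its 50 values, which is the extra detail this commit adds over the old message.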