[SPARK-12494][MLLIB] Array out of bound Exception in KMeans Yarn Mode

## What changes were proposed in this pull request?

Better error message when k-means init can't get enough samples from the input (because it is perhaps empty).

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #11979 from srowen/SPARK-12494.
author: Sean Owen <sowen@cloudera.com> 2016-03-28 12:01:33 +0100
committer: Sean Owen <sowen@cloudera.com> 2016-03-28 12:01:33 +0100
commit: 7b841540180e8d1403d6c95b02e93f129267b34f
tree: 95b5105e64bc651b14bd6129201fee6ba111a40d
parent: aac13fb48c8aa7d6816ea46c2e40154913477717
download: spark-7b841540180e8d1403d6c95b02e93f129267b34f.tar.gz
spark-7b841540180e8d1403d6c95b02e93f129267b34f.tar.bz2
spark-7b841540180e8d1403d6c95b02e93f129267b34f.zip
1 file changed, 2 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
index a7beb81980..37a21cd879 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
@@ -390,6 +390,8 @@ class KMeans private (
     // Initialize each run's first center to a random point.
     val seed = new XORShiftRandom(this.seed).nextInt()
     val sample = data.takeSample(true, runs, seed).toSeq
+    // Could be empty if data is empty; fail with a better message early:
+    require(sample.size >= runs, s"Required $runs samples but got ${sample.size} from $data")
     val newCenters = Array.tabulate(runs)(r => ArrayBuffer(sample(r).toDense))
 
     /** Merges new centers to centers. */
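Why the guard helps can be shown without Spark. Below is a minimal sketch under stated assumptions. The names `KMeansInitSketch`, `takeSampleWithReplacement` and `initialCenters` are hypothetical, and a plain `Seq` stands in for the RDD. The stand-in sampler mimics the relevant behaviour of `takeSample(true, runs, seed)`: it returns an empty result for empty input. Without the `require`, the `sample(r)` lookup would then fail with a bare index exception. With it, the caller gets an `IllegalArgumentException` whose message says how many samples were expected and how many came back.

```scala
import scala.util.Random

// Hypothetical, Spark-free sketch of the patched initialization step.
object KMeansInitSketch {

  // Stand-in for RDD.takeSample(withReplacement = true, num, seed):
  // empty input yields an empty sample, otherwise exactly `num` draws.
  def takeSampleWithReplacement[T](data: IndexedSeq[T], num: Int, seed: Long): Seq[T] =
    if (data.isEmpty) Seq.empty
    else {
      val rng = new Random(seed)
      Seq.fill(num)(data(rng.nextInt(data.size)))
    }

  // Picks one starting center per run. The require(...) mirrors the guard added
  // in the patch: fail early with a descriptive message instead of letting
  // sample(r) throw an index-out-of-bounds exception on an empty sample.
  def initialCenters[T](data: IndexedSeq[T], runs: Int, seed: Long): Vector[T] = {
    val sample = takeSampleWithReplacement(data, runs, seed)
    require(sample.size >= runs, s"Required $runs samples but got ${sample.size} from $data")
    Vector.tabulate(runs)(r => sample(r))
  }
}
```

With non-empty input, `initialCenters(Vector(1.0, 2.0, 3.0), runs = 4, seed = 42L)` returns four points drawn from the input. With empty input, `Predef.require` throws an `IllegalArgumentException` whose message starts with `requirement failed: Required 2 samples but got 0`. That message points at the empty input rather than at an opaque array index.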