[SPARK-13646][MLLIB] QuantileDiscretizer counts dataset twice in get…

## What changes were proposed in this pull request?

This avoids counting the DataFrame twice: the sampling fraction now reuses `totalSamples` in place of a second `dataset.count()` call.

Author: Abou Haydar Elias <abouhaydar.elias@gmail.com>
Author: Elie A <abouhaydar.elias@gmail.com>

Closes #11491 from eliasah/quantile-discretizer-patch.
author: Abou Haydar Elias <abouhaydar.elias@gmail.com> 2016-03-04 10:01:52 +0000
committer: Sean Owen <sowen@cloudera.com> 2016-03-04 10:01:52 +0000
commit: 27e88faa058c1364d0e99fffc0c5cb64ef817bd3 (patch)
tree: 98e568e848d727b6a2640199bf09998660e829bf
parent: dd83c209f1692a2e5afb72fa7a2d039fd1e682c8 (diff)
download: spark-27e88faa058c1364d0e99fffc0c5cb64ef817bd3.tar.gz
spark-27e88faa058c1364d0e99fffc0c5cb64ef817bd3.tar.bz2
spark-27e88faa058c1364d0e99fffc0c5cb64ef817bd3.zip
1 file changed, 1 insertion, 1 deletion
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
index d75b3ef420..18896fcc4d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
@@ -118,7 +118,7 @@ object QuantileDiscretizer extends DefaultParamsReadable[QuantileDiscretizer] wi
     require(totalSamples > 0,
       "QuantileDiscretizer requires non-empty input dataset but was given an empty input.")
     val requiredSamples = math.max(numBins * numBins, minSamplesRequired)
-    val fraction = math.min(requiredSamples.toDouble / dataset.count(), 1.0)
+    val fraction = math.min(requiredSamples.toDouble / totalSamples, 1.0)
     dataset.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt()).collect()
   }
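
The change matters because `count()` is a Spark action: each call launches a job, and on an uncached dataset it recomputes the input. The sketch below illustrates the count-once pattern in plain Scala with no Spark dependency. `FractionSketch` and `CountingSource` are hypothetical names introduced here for illustration, and the sketch assumes that `totalSamples` in the patched method holds the result of an earlier `dataset.count()`, as the `require` check in the hunk suggests.

```scala
// Illustrative sketch only; FractionSketch and CountingSource are hypothetical
// names and are not part of Spark.
object FractionSketch {

  /** Stand-in for a dataset whose count() is expensive; it records each full pass. */
  final class CountingSource(rows: Seq[Double]) {
    private var passes = 0
    def count(): Long = { passes += 1; rows.size.toLong }
    def passCount: Int = passes
  }

  /** Same arithmetic as the patched line: required samples over total rows, capped at 1.0. */
  def fraction(numBins: Int, minSamplesRequired: Int, totalSamples: Long): Double = {
    require(totalSamples > 0, "requires non-empty input")
    val requiredSamples = math.max(numBins * numBins, minSamplesRequired)
    math.min(requiredSamples.toDouble / totalSamples, 1.0)
  }

  def main(args: Array[String]): Unit = {
    val source = new CountingSource(Seq.tabulate(100000)(_.toDouble))
    val totalSamples = source.count() // the only pass over the data
    val f = fraction(numBins = 10, minSamplesRequired = 10000, totalSamples = totalSamples)
    println(s"fraction = $f, passes = ${source.passCount}")
  }
}
```

Passing the count in as a plain `Long` keeps the fraction computation free of any further work on the dataset, which is the effect of the one-line change above.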