author Zheng RuiFeng <ruifengz@foxmail.com> 2017-03-21 08:45:59 -0700
committer Xiao Li <gatorsmile@gmail.com> 2017-03-21 08:45:59 -0700
commit 63f077fbe50b4094340e9915db41d7dbdba52975
tree 3442fa7374aa58b648de5c5bb4c76a5e3a9769df /mllib/src
parent 14865d7ff78db5cf9a3e8626204c8e7ed059c353
[SPARK-20041][DOC] Update docs for NaN handling in approxQuantile
## What changes were proposed in this pull request?

Update docs for NaN handling in approxQuantile.

## How was this patch tested?

Existing tests.

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #17369 from zhengruifeng/doc_quantiles_nan.
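The Scaladoc change in this patch describes two phases: fitting ignores null and NaN values when computing quantile split points, and the resulting `Bucketizer` then treats NaN according to `handleInvalid`. The sketch below illustrates those semantics in plain Scala with no Spark dependency. It is not Spark's implementation: exact nearest-rank quantiles stand in for `approxQuantile`, `Option[Double]` models a nullable column, and the names `NaNBucketingSketch`, `fitSplits`, and `bucketize` are hypothetical. The `"error"`/`"skip"`/`"keep"` strings are assumed to mirror Spark's `handleInvalid` options, and placing kept NaNs in an extra bucket whose index equals the number of regular buckets is an assumption of this sketch.

```scala
// Hypothetical, dependency-free sketch of the documented NaN semantics; NOT Spark's code.
// Exact nearest-rank quantiles are used here in place of Spark's approxQuantile.
object NaNBucketingSketch {

  // Fit step: drop nulls (None) and NaN values, then take split points at the
  // i/numBuckets quantiles of the remaining sorted values. The outer splits are
  // widened to -Infinity/+Infinity and duplicate splits are collapsed, so the
  // result may have fewer buckets than requested.
  def fitSplits(values: Seq[Option[Double]], numBuckets: Int): Array[Double] = {
    require(numBuckets >= 2, "numBuckets must be at least 2")
    val clean = values.flatten.filterNot(_.isNaN).sorted
    require(clean.nonEmpty, "no non-null, non-NaN values to fit on")
    val n = clean.length
    val inner = (1 until numBuckets).map { i =>
      // Nearest-rank quantile: 1-based rank ceil(i * n / numBuckets), computed in integers.
      val rank = (i * n + numBuckets - 1) / numBuckets - 1
      clean(math.max(0, math.min(n - 1, rank)))
    }
    (Seq(Double.NegativeInfinity) ++ inner ++ Seq(Double.PositiveInfinity)).distinct.toArray
  }

  // Transform step for a single value.
  //   "error": throw on NaN
  //   "skip":  drop the row (modelled as None)
  //   "keep":  put NaN into an extra bucket with index = number of regular buckets
  def bucketize(x: Double, splits: Array[Double], handleInvalid: String): Option[Double] = {
    val numBuckets = splits.length - 1
    if (x.isNaN) {
      handleInvalid match {
        case "error" =>
          throw new IllegalArgumentException("NaN found; set handleInvalid to 'keep' or 'skip'")
        case "skip" => None
        case "keep" => Some(numBuckets.toDouble)
        case other  => throw new IllegalArgumentException(s"unknown handleInvalid: $other")
      }
    } else {
      // Bucket i covers [splits(i), splits(i + 1)); +Infinity is clamped into the last bucket.
      val idx = splits.lastIndexWhere(_ <= x)
      Some(math.min(idx, numBuckets - 1).toDouble)
    }
  }
}
```

For example, fitting two buckets on `Seq(Some(1.0), Some(2.0), None, Some(Double.NaN), Some(3.0), Some(4.0))` ignores the `None` and the NaN and yields splits `[-Infinity, 2.0, +Infinity]`; a later NaN input then lands in bucket `2.0` under `"keep"`, is dropped under `"skip"`, and throws under `"error"`.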
Diffstat (limited to 'mllib/src')
-rw-r--r-- mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala 4
1 files changed, 2 insertions, 2 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
index 80c7f55e26..feceeba866 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala
@@ -93,8 +93,8 @@ private[feature] trait QuantileDiscretizerBase extends Params
* are too few distinct values of the input to create enough distinct quantiles.
*
* NaN handling:
- * NaN values will be removed from the column during `QuantileDiscretizer` fitting. This will
- * produce a `Bucketizer` model for making predictions. During the transformation,
+ * null and NaN values will be ignored from the column during `QuantileDiscretizer` fitting. This
+ * will produce a `Bucketizer` model for making predictions. During the transformation,
* `Bucketizer` will raise an error when it finds NaN values in the dataset, but the user can
* also choose to either keep or remove NaN values within the dataset by setting `handleInvalid`.
* If the user chooses to keep NaN values, they will be handled specially and placed into their own