author | VinceShieh <vincent.xie@intel.com> | 2016-08-24 10:16:58 +0100 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-08-24 10:16:58 +0100 |
commit | 92c0eaf348b42b3479610da0be761013f9d81c54 (patch) | |
tree | 87f25b1e86cfa8b469f83c0575c792fd4c4f4a48 /docs | |
parent | 673a80d2230602c9e6573a23e35fb0f6b832bfca (diff) | |
[SPARK-17086][ML] Fix InvalidArgumentException issue in QuantileDiscretizer when some quantiles are duplicated
## What changes were proposed in this pull request?
When QuantileDiscretizer is fit on a numeric column with duplicated elements, approxQuantile can return duplicated quantiles. We now take only the unique quantiles as the splits passed to Bucketizer.
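
To make the proposed change concrete, here is a minimal plain-Python sketch of the idea. It is not Spark code: `approx_quantiles` is a hypothetical nearest-rank stand-in for Spark's approximate quantile routine, and `unique_splits` is an illustrative helper name. It only shows how skewed data yields repeated quantiles and how de-duplicating them leaves fewer cut points.

```python
import math


def approx_quantiles(values, probabilities):
    """Nearest-rank quantiles; a simple stand-in, not Spark's algorithm."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for p in probabilities:
        # Nearest-rank index, clamped to the valid range [0, n - 1].
        idx = min(n - 1, max(0, math.ceil(p * n) - 1))
        result.append(ordered[idx])
    return result


def unique_splits(quantiles):
    """Drop duplicated quantiles and keep the rest in ascending order."""
    return sorted(set(quantiles))


# Heavily duplicated data: most quantiles collapse onto the same value.
data = [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0]
num_buckets = 4
probabilities = [i / num_buckets for i in range(num_buckets + 1)]

raw = approx_quantiles(data, probabilities)
splits = unique_splits(raw)

print(raw)     # [1.0, 1.0, 1.0, 2.0, 3.0]
print(splits)  # [1.0, 2.0, 3.0]
```

With four buckets requested, only two remain after de-duplication, which is the trade-off this patch accepts.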
## How was this patch tested?
A unit test is added in QuantileDiscretizerSuite.
QuantileDiscretizer.fit throws an illegal-argument exception when it calls setSplits on a list of splits
that contains duplicated elements. Bucketizer.setSplits should accept only a numeric vector of two
or more unique cut points, although that may produce fewer buckets than requested.
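
The constraint described above can be illustrated with a small plain-Python sketch. It is an assumption-laden model, not Spark's implementation: `check_splits`, `set_splits`, and `bucket_index` are hypothetical names, `ValueError` stands in for whatever exception Spark raises, and the choice to make the last bucket include its upper bound is an assumption of this sketch.

```python
import bisect


def check_splits(splits):
    """True when splits are two or more strictly increasing cut points."""
    return len(splits) >= 2 and all(a < b for a, b in zip(splits, splits[1:]))


def set_splits(splits):
    """Accept the splits only if they pass check_splits (hypothetical setter)."""
    if not check_splits(splits):
        raise ValueError(f"splits must be unique and strictly increasing: {splits}")
    return list(splits)


def bucket_index(splits, value):
    """Index i of the bucket [splits[i], splits[i + 1]) that holds value.

    Assumption of this sketch: the last bucket also includes its upper bound.
    """
    if value < splits[0] or value > splits[-1]:
        raise ValueError(f"value {value} is outside the split range")
    if value == splits[-1]:
        return len(splits) - 2
    return bisect.bisect_right(splits, value) - 1


# Duplicated cut points are rejected, mirroring the failure described above.
try:
    set_splits([1.0, 1.0, 1.0, 2.0, 3.0])
    rejected = False
except ValueError:
    rejected = True

# The de-duplicated cut points are accepted but define only two buckets.
splits = set_splits([1.0, 2.0, 3.0])
indices = [bucket_index(splits, v) for v in [1.0, 1.5, 2.0, 2.5, 3.0]]
```

Here three unique cut points define two buckets, so a request for more buckets than the data supports degrades gracefully instead of failing.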
Signed-off-by: VinceShieh <vincent.xie@intel.com>
Author: VinceShieh <vincent.xie@intel.com>
Closes #14747 from VinceShieh/SPARK-17086.
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions