author | z001qdp <Nicholas.Eggert@target.com> | 2016-07-15 12:30:22 +0100 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-07-15 12:30:22 +0100 |
commit | 71ad945bbbdd154eae852cd7f841e98f7a83e8d4 (patch) | |
tree | 9d6d5b62dba642b46978a729a968e0057faecaf8 /docs/mllib-pmml-model-export.md | |
parent | 1832423827fd518853b63f91c321e4568a39107d (diff) | |
[SPARK-16426][MLLIB] Fix bug that caused NaNs in IsotonicRegression
## What changes were proposed in this pull request?
Fixed a bug that caused `NaN`s in `IsotonicRegression`. The problem occurs when training rows with the same feature value but different labels end up on different partitions. This patch changes a `sortBy` call to a `partitionBy(RangePartitioner)` followed by a `mapPartitions(sortBy)` in order to ensure that all rows with the same feature value end up on the same partition.
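The effect of the change can be illustrated with a small simulation in plain Python. This is only a sketch and not the Spark code from the patch: the helper names (`chunk_sorted`, `range_partition_by_feature`, `split_features`), the `(label, feature, weight)` row layout, and the hand-picked partition bounds are all assumptions made for illustration. The first helper stands in for a sort whose partition boundaries may fall between rows that share a feature value; the second mimics partitioning on the feature alone and then sorting inside each partition.

```python
from bisect import bisect_left

# Rows are (label, feature, weight) tuples; this layout is assumed for the sketch.
ROWS = [
    (1.0, 1.0, 1.0),
    (2.0, 1.0, 1.0),
    (3.0, 2.0, 1.0),
    (0.5, 2.0, 1.0),
    (4.0, 3.0, 1.0),
]


def chunk_sorted(rows, chunk_size):
    """Sort globally by (feature, label), then cut into fixed-size chunks.

    Stand-in for a sort whose partition boundaries ignore feature ties,
    so rows with the same feature may land in different partitions.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


def range_partition_by_feature(rows, bounds):
    """Partition on the feature value only, then sort inside each partition.

    `bounds` is a sorted list of upper bounds (hypothetical, chosen by hand).
    Equal features always map to the same index, so ties stay together.
    """
    parts = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        parts[bisect_left(bounds, row[1])].append(row)
    return [sorted(p, key=lambda r: (r[1], r[0])) for p in parts]


def split_features(parts):
    """Return the set of feature values that appear in more than one partition."""
    first_seen = {}
    split = set()
    for index, part in enumerate(parts):
        for _, feature, _ in part:
            if first_seen.setdefault(feature, index) != index:
                split.add(feature)
    return split


naive = chunk_sorted(ROWS, chunk_size=3)
ranged = range_partition_by_feature(ROWS, bounds=[1.0, 2.0])

print("split by naive chunking:", split_features(naive))
print("split by feature-range partitioning:", split_features(ranged))
```

In the naive split, the two rows with feature `2.0` straddle a partition boundary, which is the situation described above. Partitioning on the feature value alone keeps them together, so any per-partition processing sees every label for a given feature at once.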
## How was this patch tested?
Added a unit test.
Author: z001qdp <Nicholas.Eggert@target.com>
Closes #14140 from neggert/SPARK-16426-isotonic-nan.