author | Menglong TAN <tanmenglong@renrenche.com> | 2017-03-14 07:45:42 -0700 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2017-03-14 07:45:42 -0700 |
commit | 85941ecf28362f35718ebcd3a22dbb17adb49154 (patch) | |
tree | 37bdc1a9558c4c07a26991f115132c13cdbecf17 /sql/core/src/test/resources/test-data/cars.tsv | |
parent | d4a637cd46b6dd5cc71ea17a55c4a26186e592c7 (diff) | |
[SPARK-11569][ML] Fix StringIndexer to handle null value properly
## What changes were proposed in this pull request?
This PR enhances StringIndexer to handle NULL values.
Before this PR, StringIndexer threw an exception whenever it encountered a NULL value.
With this PR:
- handleInvalid=error: Throw an exception as before
- handleInvalid=skip: Skip null values as well as unseen labels
- handleInvalid=keep: Assign null values an additional index, as with unseen labels
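
The three modes above can be modelled in a few lines of plain Python. This is only a sketch of the described semantics, not Spark's implementation: `fit_labels` and `transform` are hypothetical helpers, the alphabetical tie-breaking is an assumption, ignoring nulls while fitting is an assumption, and so is the choice to give nulls and unseen labels the same extra index (equal to the number of fitted labels). Spark itself emits double-valued indices; integers are used here for brevity.

```python
from collections import Counter


def fit_labels(values):
    """Return distinct non-null labels, most frequent first.

    Ties are broken alphabetically (an assumption made for determinism).
    Nulls (None) are ignored while fitting (also an assumption).
    """
    counts = Counter(v for v in values if v is not None)
    return sorted(counts, key=lambda label: (-counts[label], label))


def transform(values, labels, handle_invalid="error"):
    """Map each value to its label index under the given handleInvalid mode."""
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"unsupported handleInvalid: {handle_invalid!r}")
    index = {label: i for i, label in enumerate(labels)}
    extra_index = len(labels)  # assumed shared slot for nulls and unseen labels
    out = []
    for v in values:
        if v is not None and v in index:
            out.append((v, index[v]))
        elif handle_invalid == "keep":
            out.append((v, extra_index))
        elif handle_invalid == "skip":
            continue  # drop the row entirely
        else:
            raise ValueError(f"null or unseen label: {v!r}")
    return out


labels = fit_labels(["a", "b", "a", None, "c", "a", "b"])
rows = ["a", None, "d", "c"]  # "d" is unseen, None stands in for NULL
kept = transform(rows, labels, "keep")
skipped = transform(rows, labels, "skip")
```

With `handle_invalid="error"`, the same `rows` would raise on the `None` entry, mirroring the "as before" behaviour.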
BTW, I noticed that someone was trying to solve the same problem ( #9920 ), but there seems to have been no progress or response for a long time. Would you mind giving me a chance to solve it? I'm eager to help. :-)
## How was this patch tested?
New unit tests.
Author: Menglong TAN <tanmenglong@renrenche.com>
Author: Menglong TAN <tanmenglong@gmail.com>
Closes #17233 from crackcell/11569_StringIndexer_NULL.
Diffstat (limited to 'sql/core/src/test/resources/test-data/cars.tsv')
0 files changed, 0 insertions, 0 deletions