author | Liang-Chi Hsieh <simonh@tw.ibm.com> | 2016-04-23 01:11:36 +0800 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2016-04-23 01:11:36 +0800 |
commit | 8098f158576b07343f74e2061d217b106c71b62d (patch) | |
tree | 82622c423578c8b535cd486d4a83558f7e29f573 /sql | |
parent | c089c6f4e83d85e622b8d13f466a656c2852702b (diff) | |
[SPARK-14843][ML] Fix encoding error in LibSVMRelation
## What changes were proposed in this pull request?
We use `RowEncoder` in the libsvm data source to serialize the label and features read from libsvm files. However, the schema passed to this encoder is incorrect: it is `requiredSchema` rather than the full data schema. As a result, we can't correctly select the `features` column from the DataFrame. We should serialize the data with the full data schema and then apply a projection to select the required columns.
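The following is a minimal, Spark-free sketch of the kind of mismatch described above. It is not the code in `LibSVMRelation`: the `encode` and `project` helpers, the list-of-names schema representation, and the sample record are all hypothetical, and the exact failure mode inside Spark's `RowEncoder` may differ. It only illustrates why a positional encoder built from a pruned schema can mislabel values, and why encoding with the full schema before projecting avoids that.

```python
# Hypothetical model: a "schema" is just an ordered list of field names,
# and an "encoder" pairs values with field names by position.
FULL_SCHEMA = ["label", "features"]


def encode(values, schema):
    """Pair positional values with the field names of the given schema."""
    return dict(zip(schema, values))


def project(row, required):
    """Keep only the required columns of an already encoded row."""
    return {name: row[name] for name in required}


# One parsed libsvm record, always produced as (label, features).
parsed = (1.0, [0.0, 2.5, 0.0])

# The query only asks for the `features` column.
required_schema = ["features"]

# Mismatch: the encoder is built from the pruned schema, but the reader
# still hands it the full (label, features) record, so the label value
# ends up under the `features` name.
buggy = encode(parsed, required_schema)

# Approach from the description: encode with the full data schema,
# then project down to the required columns.
fixed = project(encode(parsed, FULL_SCHEMA), required_schema)

print(buggy)  # {'features': 1.0}
print(fixed)  # {'features': [0.0, 2.5, 0.0]}
```

In this toy model the pruned-schema encoder silently returns the label where the feature vector is expected, while the encode-then-project order returns the feature vector.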
## How was this patch tested?
`LibSVMRelationSuite`.
Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
Closes #12611 from viirya/fix-libsvm.