author | Davies Liu <davies@databricks.com> | 2016-06-10 14:32:43 -0700 |
---|---|---|
committer | Davies Liu <davies.liu@gmail.com> | 2016-06-10 14:32:43 -0700 |
commit | aec502d9114ad8e18bfbbd63f38780e076d326d1 (patch) | |
tree | 5aa6b1479a6f677b4690816a96000ac064aa0338 /mllib/src/main | |
parent | e05a2feebe928df691d5a8f42f22e088c6263dcf (diff) | |
[SPARK-15654] [SQL] fix non-splitable files for text based file formats
## What changes were proposed in this pull request?
Currently, we always split a file when it is larger than maxSplitBytes, but Hadoop's LineRecordReader does not handle splits of compressed files correctly. We should have an API on FileFormat to check whether a file can be split.
This PR is based on #13442, closes #13442
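
The check described above can be illustrated with a minimal, self-contained sketch. It is a simplified model and not the code in this patch: `SketchFileFormat`, `SketchTextBasedFileFormat`, `FileSplit`, and `SplitPlanner` are hypothetical names, and matching on file extensions is an assumption standing in for a real compression codec lookup.

```scala
// Simplified model of a per-format splittability check; none of these are Spark classes.
final case class FileSplit(path: String, start: Long, length: Long)

trait SketchFileFormat {
  // Conservative default: a format is not splittable unless it says so.
  def isSplitable(path: String): Boolean = false
}

class SketchTextBasedFileFormat extends SketchFileFormat {
  // Assumption: extensions stand in for a codec lookup. Gzip and Snappy
  // streams cannot be read from an arbitrary byte offset.
  private val nonSplittableExtensions = Set(".gz", ".snappy")

  override def isSplitable(path: String): Boolean =
    !nonSplittableExtensions.exists(ext => path.endsWith(ext))
}

object SplitPlanner {
  // Split a file into chunks of at most maxSplitBytes, but only when the
  // format reports that the file can be split; otherwise read it whole.
  def planSplits(
      format: SketchFileFormat,
      path: String,
      fileSize: Long,
      maxSplitBytes: Long): Seq[FileSplit] = {
    require(maxSplitBytes > 0, "maxSplitBytes must be positive")
    if (fileSize > maxSplitBytes && format.isSplitable(path)) {
      (0L until fileSize by maxSplitBytes).map { offset =>
        FileSplit(path, offset, math.min(maxSplitBytes, fileSize - offset))
      }
    } else {
      Seq(FileSplit(path, 0L, fileSize))
    }
  }
}
```

In this model, a 250-byte `data.txt` with `maxSplitBytes = 100` yields three splits, while `data.txt.gz` of the same size yields a single split covering the whole file.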
## How was this patch tested?
Added regression tests.
Author: Davies Liu <davies@databricks.com>
Closes #13531 from davies/fix_split.
Diffstat (limited to 'mllib/src/main')
-rw-r--r-- | mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala | 2 |
1 files changed, 1 insertions, 1 deletions
    diff --git a/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala b/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
    index 7629369ab1..b5b2a681e9 100644
    --- a/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
    +++ b/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala
    @@ -112,7 +112,7 @@ private[libsvm] class LibSVMOutputWriter(
       */
     // If this is moved or renamed, please update DataSource's backwardCompatibilityMap.
     @Since("1.6.0")
    -class LibSVMFileFormat extends FileFormat with DataSourceRegister {
    +class LibSVMFileFormat extends TextBasedFileFormat with DataSourceRegister {
     
       @Since("1.6.0")
       override def shortName(): String = "libsvm"