field | value | date |
---|---|---|
author | Xusen Yin <yinxusen@gmail.com> | 2014-04-13 13:18:52 -0700 |
committer | Matei Zaharia <matei@databricks.com> | 2014-04-13 13:18:52 -0700 |
commit | 037fe4d2ba01be5610baa3dd9c5c9d3a5e5e1064 | |
tree | 1149963f7478c685c18696af118cf73e7562beb5 | |
parent | 4bc07eebbf5e2ea0c0b6f1642049515025d88d07 | |
[SPARK-1415] Hadoop min split for wholeTextFiles()
JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-1415).
The new Hadoop `InputFormat` API does not provide a `minSplits` parameter, which makes the APIs of `HadoopRDD` and `NewHadoopRDD` incompatible. This PR makes the two APIs compatible.
Although `minSplits` is deprecated in the new Hadoop API, we think it is better to keep the APIs compatible here.
**Note** that `minSplits` in `wholeTextFiles` can only be treated as a *suggestion*: because `isSplitable()` returns `false`, files are never cut, so the actual number of splits may be smaller than `minSplits`.
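To illustrate why `minSplits` is only a suggestion, here is a simplified Python model. It is not Spark's or Hadoop's actual code. It assumes a target split size of total bytes divided by `minSplits`, and that whole files are packed greedily in order because they cannot be cut. The function name `plan_whole_file_splits` is hypothetical, and real Hadoop `CombineFileInputFormat` also takes block and node/rack locality into account.

```python
import math
from typing import List


def plan_whole_file_splits(file_sizes: List[int], min_splits: int) -> List[List[int]]:
    """Group whole files into splits (hypothetical, simplified model).

    Files are never cut, mirroring isSplitable() == False. A target
    maximum split size is derived from min_splits, and files are packed
    greedily in order until adding the next one would exceed it.
    Returns a list of splits, each a list of file indices.
    """
    total = sum(file_sizes)
    # Treat min_splits <= 0 as 1 to avoid dividing by zero.
    max_split_size = math.ceil(total / max(min_splits, 1))

    splits: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(file_sizes):
        # Close the current split if this whole file would push it past the cap.
        if current and current_size + size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        splits.append(current)
    return splits


# A single 100-byte file cannot be divided, so asking for 4 splits yields 1.
print(len(plan_whole_file_splits([100], 4)))             # -> 1
# Four equal files can satisfy the request exactly.
print(len(plan_whole_file_splits([10, 10, 10, 10], 4)))  # -> 4
```

In this model the number of splits is bounded above by the number of files, which is why a large `minSplits` over a few files cannot be honored.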
Author: Xusen Yin <yinxusen@gmail.com>
Closes #376 from yinxusen/hadoop-min-split and squashes the following commits:
76417f6 [Xusen Yin] refine comments
c10af60 [Xusen Yin] refine comments and rewrite new class for wholeTextFile
766d05b [Xusen Yin] refine Java API and comments
4875755 [Xusen Yin] add minSplits for WholeTextFiles