[SPARK-1415] Hadoop min split for wholeTextFiles() - spark


author	Xusen Yin <yinxusen@gmail.com>	2014-04-13 13:18:52 -0700
committer	Matei Zaharia <matei@databricks.com>	2014-04-13 13:19:01 -0700
commit	1cf565f58f5e9b96c932c30bec6182bb645083ec (patch)
tree	1149963f7478c685c18696af118cf73e7562beb5 /sql
parent	3537e251eb9cd37687a308320630c405f9a9c5e8 (diff)

[SPARK-1415] Hadoop min split for wholeTextFiles()

JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-1415).

The new Hadoop API's `InputFormat` does not provide a `minSplits` parameter, which makes the APIs of `HadoopRDD` and `NewHadoopRDD` incompatible. This PR makes the two APIs compatible. Although `minSplits` is deprecated in the new Hadoop API, we think it is better to keep the APIs compatible here.

**Note** that `minSplits` in `wholeTextFiles` can only be treated as a *suggestion*: the actual number of splits may fall short of `minSplits` because `isSplitable()=false`, so a file is never divided across splits.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #376 from yinxusen/hadoop-min-split and squashes the following commits:

76417f6 [Xusen Yin] refine comments
c10af60 [Xusen Yin] refine comments and rewrite new class for wholeTextFile
766d05b [Xusen Yin] refine Java API and comments
4875755 [Xusen Yin] add minSplits for WholeTextFiles

(cherry picked from commit 037fe4d2ba01be5610baa3dd9c5c9d3a5e5e1064)
Signed-off-by: Matei Zaharia <matei@databricks.com>
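To illustrate why `minSplits` is only a suggestion when files cannot be split, here is a minimal Python sketch of a simplified packing model. It is not the code from this commit and not Hadoop's actual split algorithm: the function name `pack_whole_files` and the greedy strategy are assumptions made for illustration. The model derives a target split size from the total input size and `min_splits`, then packs whole files into splits without ever dividing a file.

```python
import math


def pack_whole_files(file_sizes, min_splits):
    """Hypothetical model: group unsplittable files into splits.

    A target size is derived from the requested number of splits, and
    whole files are appended to the current split until that target is
    reached. A file is never divided, mirroring isSplitable() = false.
    """
    total = sum(file_sizes)
    # Target size per split if the input could be divided evenly.
    target = math.ceil(total / max(min_splits, 1))

    splits, current, current_size = [], [], 0
    for size in file_sizes:
        current.append(size)
        current_size += size
        if current_size >= target:
            # Close the split once it reaches the target size.
            splits.append(current)
            current, current_size = [], 0
    if current:
        # Leftover files form a final, smaller split.
        splits.append(current)
    return splits


# One large file cannot be divided: 1 split, although 4 were requested.
print(len(pack_whole_files([100], 4)))

# Four equal files with 2 requested splits pack evenly into 2 splits.
print(len(pack_whole_files([10, 10, 10, 10], 2)))
```

In this model a single dominant file absorbs most of the requested parallelism, which is the kind of shortfall the note in the commit message warns about.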

Diffstat (limited to 'sql')

0 files changed, 0 insertions, 0 deletions

