[SPARK-3739] [SQL] Update the split num base on block size for table scanning - spark


author	Cheng Hao <hao.cheng@intel.com>	2014-12-17 13:39:36 -0800
committer	Michael Armbrust <michael@databricks.com>	2014-12-17 13:39:36 -0800
commit	636d9fc450faaa0d8e82e0d34bb7b791e3812cb7 (patch)
tree	ab0de7c89131b6bda143dc51228df6410f3eea8a /python/pyspark
parent	902e4d54acbc3c88163a5c6447aff68ed57475c1 (diff)

[SPARK-3739] [SQL] Update the split num base on block size for table scanning

In local mode, Hadoop/Hive ignores "mapred.map.tasks", so a small table file always produces a single input split. SparkSQL did not honor that behavior in table scanning, which produced different results in the Hive compatibility tests. This PR fixes that.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #2589 from chenghao-intel/source_split and squashes the following commits:

dff38e7 [Cheng Hao] Remove the extra blank line
160a2b6 [Cheng Hao] fix the compiling bug
04d67f7 [Cheng Hao] Keep 1 split for small file in table scanning
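
The behavior described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the patch: the function names `min_splits_for_scan` and `estimate_splits` are hypothetical, and the split arithmetic is a simplified model of how the classic Hadoop file input format sizes splits (goal size bounded by block size, with a 1.1 slop factor). It assumes the fix amounts to passing no split-count hint in local mode, so that splitting falls back to block size.

```python
def min_splits_for_scan(is_local: bool, mapred_map_tasks: int,
                        default_min_partitions: int) -> int:
    """Hypothetical helper: choose the minimum-split hint for a table scan."""
    if is_local:
        # Assumption: in local mode no hint is given (0), so the input
        # format splits purely by block size, as Hadoop/Hive would.
        return 0
    # Otherwise take the larger of the two configured values.
    return max(mapred_map_tasks, default_min_partitions)


def estimate_splits(file_size: int, block_size: int, min_splits_hint: int,
                    min_split_size: int = 1) -> int:
    """Simplified estimate of the split count for one non-empty file."""
    # The hint only sets a goal size; a hint of 0 is treated like 1.
    goal_size = file_size // max(min_splits_hint, 1)
    split_size = max(min_split_size, min(goal_size, block_size))
    count = 0
    remaining = file_size
    # Keep cutting full splits while more than ~1.1 splits' worth remains.
    while remaining / split_size > 1.1:
        count += 1
        remaining -= split_size
    if remaining > 0:
        count += 1  # the tail becomes the last split
    return count


if __name__ == "__main__":
    block = 128 * 1024 * 1024
    small_file = 1000  # bytes, far smaller than one block
    print(estimate_splits(small_file, block, min_splits_hint=2))
    print(estimate_splits(small_file, block,
                          min_splits_for_scan(True, 2, 2)))
```

Under this model, a hint of 2 cuts a 1000-byte file into two splits, while a hint of 0 leaves it as one split, which matches the single-split behavior the commit message attributes to Hadoop/Hive in local mode.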

Diffstat (limited to 'python/pyspark')

0 files changed, 0 insertions, 0 deletions

