author: Cheng Hao <hao.cheng@intel.com> 2014-12-17 13:39:36 -0800
committer: Michael Armbrust <michael@databricks.com> 2014-12-17 13:39:36 -0800
commit: 636d9fc450faaa0d8e82e0d34bb7b791e3812cb7
tree: ab0de7c89131b6bda143dc51228df6410f3eea8a /python/pyspark
parent: 902e4d54acbc3c88163a5c6447aff68ed57475c1
[SPARK-3739] [SQL] Update the split number based on block size for table scanning
In local mode, Hadoop/Hive ignores "mapred.map.tasks", so a small table file always yields a single input split. SparkSQL, however, does not honor that behavior in table scanning, so we get different results when running the Hive compatibility tests. This PR fixes that.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #2589 from chenghao-intel/source_split and squashes the following commits:

dff38e7 [Cheng Hao] Remove the extra blank line
160a2b6 [Cheng Hao] fix the compiling bug
04d67f7 [Cheng Hao] Keep 1 split for small file in table scanning
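
To illustrate why the requested split count matters for a small file, here is a rough sketch of how a split-count hint interacts with the block size. It is not the code changed by this commit. It is a simplified model of the split sizing in Hadoop's old-API FileInputFormat (split size = max(minSize, min(goalSize, blockSize))); the function name and parameters are hypothetical, and the real implementation's slop factor is left out.

```python
import math


def estimate_splits(file_size, block_size, requested_splits, min_size=1):
    """Estimate the number of input splits for a single non-empty file.

    Hypothetical helper for illustration only. It mirrors the shape of
    Hadoop's split sizing but ignores the slop factor applied to the
    final split.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # The hint is a goal, not a guarantee: it only sets a target split size.
    goal_size = file_size // max(requested_splits, 1)
    # A split is never larger than a block and never smaller than min_size.
    split_size = max(min_size, min(goal_size, block_size))
    return math.ceil(file_size / split_size)


BLOCK = 128 * 1024 * 1024  # assumed block size for the example

# Small file with no split hint: one split, as in local-mode Hadoop/Hive.
print(estimate_splits(1000, BLOCK, requested_splits=0))
# Same file with a hint of 2: the file is cut in two.
print(estimate_splits(1000, BLOCK, requested_splits=2))
```

Under this model, passing a hint greater than 1 splits even a tiny file, while a hint of 0 or 1 leaves the count to be determined by the block size alone, which is the behavior the commit message describes for small table files.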
Diffstat (limited to 'python/pyspark')
0 files changed, 0 insertions, 0 deletions