author     Parth Brahmbhatt <pbrahmbhatt@netflix.com>  2016-05-24 20:58:20 -0700
committer  Reynold Xin <rxin@databricks.com>  2016-05-24 20:58:20 -0700
commit     4acababcaba567c85f3be0d5e939d99119b82d1d
tree       2399ad66af26dcee58de8ba0c4c8ea18fefd07d2 /external
parent     14494da87bdf057d2d2f796b962a4d8bc4747d31
[SPARK-15365][SQL] When table size statistics are not available from metastore, we should fallback to HDFS
## What changes were proposed in this pull request?

Currently, if a table is used in a join operation, we rely on the size returned by the Metastore to decide whether the operation can be converted to a broadcast join. This optimization only kicks in for tables that have statistics available in the Metastore. Hive generally falls back to HDFS when the statistics are not available directly from the Metastore, and this seems like a reasonable choice to adopt given the benefit of using broadcast joins.

## How was this patch tested?

I have executed queries locally to test.

Author: Parth Brahmbhatt <pbrahmbhatt@netflix.com>

Closes #13150 from Parth-Brahmbhatt/SPARK-15365.
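The fallback described in the commit message can be illustrated with a minimal Python sketch. It is not the code from this patch. It assumes a local directory stands in for the table's HDFS location, and the function names, the `DEFAULT_SIZE_IN_BYTES` sentinel and the fixed threshold are hypothetical. `totalSize` and `rawDataSize` are the Hive table parameters where size statistics are typically stored, and 10 MB matches the default of `spark.sql.autoBroadcastJoinThreshold`.

```python
import os

# Hypothetical "unknown size" sentinel: large enough that the table is never broadcast.
DEFAULT_SIZE_IN_BYTES = 2**63 - 1
# 10 MB, the default of spark.sql.autoBroadcastJoinThreshold (used here as a plain constant).
BROADCAST_THRESHOLD = 10 * 1024 * 1024


def dir_size(path):
    """Sum the sizes of all files under path (local stand-in for an HDFS listing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def estimate_table_size(table_params, location):
    """Prefer metastore statistics; fall back to the filesystem when they are missing."""
    for key in ("totalSize", "rawDataSize"):
        value = table_params.get(key)
        # A missing or zero statistic is treated as "not available".
        if value is not None and int(value) > 0:
            return int(value)
    # Fallback: measure the files backing the table directly.
    if location is not None and os.path.isdir(location):
        return dir_size(location)
    # Nothing known about the table: assume it is too large to broadcast.
    return DEFAULT_SIZE_IN_BYTES


def can_broadcast(table_params, location, threshold=BROADCAST_THRESHOLD):
    """Hypothetical planner check: broadcast only if the estimated size fits the threshold."""
    return estimate_table_size(table_params, location) <= threshold
```

Checking the metastore first keeps the common case cheap. The filesystem walk only runs when statistics are missing, and that is the trade-off of the fallback: listing the files of a large, many-partition table can itself be expensive.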
Diffstat (limited to 'external')
0 files changed, 0 insertions, 0 deletions