author    Yin Huai <yhuai@databricks.com>  2015-05-20 11:23:40 -0700
committer Yin Huai <yhuai@databricks.com>  2015-05-20 11:23:49 -0700
commit    55bd1bb52e54f710264e6517bb42b74672dd71fb (patch)
tree      349aed0bf794cfcca003fa17e4d926183c6ac69b /mllib/src/main
parent    606ae3e10e76325c032860ad7be1da94921af44a (diff)
[SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan.
https://issues.apache.org/jira/browse/SPARK-7713
I tested the performance with the following code:
```scala
import sqlContext._
import sqlContext.implicits._

// Write 5000 partition directories, each holding 1000 rows.
(1 to 5000).foreach { i =>
  (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i")
}

// Register the partitioned directory tree as a temporary Parquet table.
sqlContext.sql("""
  CREATE TEMPORARY TABLE partitionedParquet
  USING org.apache.spark.sql.parquet
  OPTIONS (
    path '/tmp/partitioned'
  )""")

table("partitionedParquet").explain(true)
```
On master, `explain` takes 40s on my laptop; with this PR, it takes 14s.
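The speedup comes from serializing the Hadoop `Configuration` once via a broadcast variable shared by all partition scans, instead of shipping a fresh copy per partition. A minimal sketch of the pattern, assuming a Spark 1.x context (the `SerializableWritable` wrapper and variable names here are illustrative, not the exact code from this commit):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SerializableWritable, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("sketch").setMaster("local"))

// Configuration is not Serializable, so wrap it before broadcasting.
val hadoopConf = new Configuration()
val broadcastedConf = sc.broadcast(new SerializableWritable(hadoopConf))

// Each task reads the shared copy instead of deserializing its own:
sc.parallelize(1 to 4).foreach { _ =>
  val conf: Configuration = broadcastedConf.value.value
  // ... open readers against `conf` for this partition's files ...
}
```

With thousands of partitions, broadcasting once amortizes the cost of serializing the (fairly large) Hadoop configuration, which is what shrinks the planning time in the benchmark above.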
Author: Yin Huai <yhuai@databricks.com>
Closes #6252 from yhuai/broadcastHadoopConf and squashes the following commits:
6fa73df [Yin Huai] Address comments of Josh and Andrew.
807fbf9 [Yin Huai] Make the new buildScan and SqlNewHadoopRDD private sql.
e393555 [Yin Huai] Cheng's comments.
2eb53bb [Yin Huai] Use a shared broadcast Hadoop Configuration for partitioned HadoopFsRelations.
(cherry picked from commit b631bf73b9f288f37c98b806be430b22485880e5)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Diffstat (limited to 'mllib/src/main')
0 files changed, 0 insertions, 0 deletions