author: Yin Huai <yhuai@databricks.com> 2015-05-20 11:23:40 -0700
committer: Yin Huai <yhuai@databricks.com> 2015-05-20 11:23:49 -0700
commit: 55bd1bb52e54f710264e6517bb42b74672dd71fb
tree: 349aed0bf794cfcca003fa17e4d926183c6ac69b /mllib/src/main
parent: 606ae3e10e76325c032860ad7be1da94921af44a
[SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan.
https://issues.apache.org/jira/browse/SPARK-7713

I tested the performance with the following code:

```scala
import sqlContext._
import sqlContext.implicits._

(1 to 5000).foreach { i =>
  val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i")
}

sqlContext.sql("""
CREATE TEMPORARY TABLE partitionedParquet
USING org.apache.spark.sql.parquet
OPTIONS (
  path '/tmp/partitioned'
)""")

table("partitionedParquet").explain(true)
```

On master, `explain` takes 40s on my laptop. With this PR, `explain` takes 14s.

Author: Yin Huai <yhuai@databricks.com>

Closes #6252 from yhuai/broadcastHadoopConf and squashes the following commits:

6fa73df [Yin Huai] Address comments of Josh and Andrew.
807fbf9 [Yin Huai] Make the new buildScan and SqlNewHadoopRDD private sql.
e393555 [Yin Huai] Cheng's comments.
2eb53bb [Yin Huai] Use a shared broadcast Hadoop Configuration for partitioned HadoopFsRelations.

(cherry picked from commit b631bf73b9f288f37c98b806be430b22485880e5)
Signed-off-by: Yin Huai <yhuai@databricks.com>
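
The message does not say how the `explain` timings were taken. As a minimal sketch of one way to reproduce such a measurement, a wall-clock helper could look like the following. The names `Timing` and `timed` are hypothetical and are not part of this commit or of Spark.

```scala
// Hypothetical helper, not part of the commit: runs a block once and
// returns its result together with the elapsed wall-clock time in seconds.
object Timing {
  def timed[T](body: => T): (T, Double) = {
    val start = System.nanoTime()
    val result = body
    val elapsedSeconds = (System.nanoTime() - start) / 1e9
    (result, elapsedSeconds)
  }
}

// Assumed usage in a Spark shell where the table from the test code
// in the commit message is already registered:
//   val (_, secs) = Timing.timed { table("partitionedParquet").explain(true) }
//   println(f"explain took $secs%.1f s")
```

A single run like this includes JVM warm-up and file system caching effects, so repeated runs would give a fairer comparison between master and the patched build.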
Diffstat (limited to 'mllib/src/main')
0 files changed, 0 insertions, 0 deletions