author: Matei Zaharia <matei@databricks.com> 2014-08-18 10:05:52 -0700
committer: Michael Armbrust <michael@databricks.com> 2014-08-18 10:05:52 -0700
commit: 6a13dca12fac06f3af892ffcc8922cc84f91b786
tree: ba7935890d765436e3e17cf247ad21069141e499
parent: 7ae28d1247e4756219016206c51fec1656e3917b
[SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins
BroadcastHashJoin has a broadcastFuture variable that tries to collect the broadcasted table in a separate thread, but this doesn't help because it's a lazy val that only gets initialized when you attempt to build the RDD. Thus queries that broadcast multiple tables would collect and broadcast them sequentially. I changed this to a val to let it start collecting right when the operator is created.

Author: Matei Zaharia <matei@databricks.com>

Closes #1990 from mateiz/spark-3084 and squashes the following commits:

f468766 [Matei Zaharia] [SPARK-3084] Collect broadcasted tables in parallel in joins
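
The difference between the lazy val and the val described above can be reproduced outside Spark. The following is only a minimal sketch, not the Spark code: LazyJoin, EagerJoin, and slowCollect are hypothetical stand-ins for the operator and for buildPlan.executeCollect(), and it uses Future { ... } from the current Scala standard library rather than the older future { ... } helper that appears in the diff.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LazyVsEagerFuture {
  // Hypothetical stand-in for buildPlan.executeCollect(): records that it
  // started, sleeps to simulate a slow collect, then returns one row.
  def slowCollect(name: String, started: AtomicInteger): Seq[String] = {
    started.incrementAndGet()
    blocking { Thread.sleep(200) }
    Seq(s"$name-row")
  }

  // Like the old code: the Future is only created on first access.
  class LazyJoin(name: String, started: AtomicInteger) {
    lazy val broadcastFuture: Future[Seq[String]] =
      Future { slowCollect(name, started) }
  }

  // Like the new code: the Future is created when the object is constructed.
  class EagerJoin(name: String, started: AtomicInteger) {
    val broadcastFuture: Future[Seq[String]] =
      Future { slowCollect(name, started) }
  }

  def main(args: Array[String]): Unit = {
    val lazyStarted = new AtomicInteger(0)
    val lazyJoins = List("a", "b").map(new LazyJoin(_, lazyStarted))
    Thread.sleep(300)
    // No collect has started, because nothing has touched broadcastFuture yet.
    println(s"lazy, after construction: started = ${lazyStarted.get}")
    // Each access starts one collect and then waits for it, so they run in turn.
    lazyJoins.foreach(j => Await.result(j.broadcastFuture, 5.seconds))

    val eagerStarted = new AtomicInteger(0)
    val eagerJoins = List("c", "d").map(new EagerJoin(_, eagerStarted))
    // Both collects were submitted at construction and can overlap.
    eagerJoins.foreach(j => Await.result(j.broadcastFuture, 5.seconds))
    println(s"eager, after awaiting: started = ${eagerStarted.get}")
  }
}
```

In this sketch the lazy variant cannot overlap the two collects, because the second Future does not exist until the first has been awaited. How much the eager variant overlaps in practice depends on the execution context and the available threads.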
-rw-r--r-- sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
index c86811e838..481bb8c05e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
@@ -424,7 +424,7 @@ case class BroadcastHashJoin(
     UnspecifiedDistribution :: UnspecifiedDistribution :: Nil
   @transient
-  lazy val broadcastFuture = future {
+  val broadcastFuture = future {
     sparkContext.broadcast(buildPlan.executeCollect())
   }