aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorRajesh Balamohan <rbalamohan@apache.org>2016-03-25 15:09:42 -0700
committerJosh Rosen <joshrosen@databricks.com>2016-03-25 15:09:52 -0700
commitff7cc45f521c63ce40f955b8995d52a79dca17b4 (patch)
tree393f4c0341baaf4b7c8cea479498ab517616cd86
parentb554b3c46b0019a6caf0f9a975b460dc2570c3b2 (diff)
downloadspark-ff7cc45f521c63ce40f955b8995d52a79dca17b4.tar.gz
spark-ff7cc45f521c63ce40f955b8995d52a79dca17b4.tar.bz2
spark-ff7cc45f521c63ce40f955b8995d52a79dca17b4.zip
[SPARK-14091][CORE] Improve performance of SparkContext.getCallSite()
Currently SparkContext.getCallSite() makes a call to Utils.getCallSite(). ``` private[spark] def getCallSite(): CallSite = { val callSite = Utils.getCallSite() CallSite( Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm), Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm) ) } ``` However, in some places utils.withDummyCallSite(sc) is invoked to avoid expensive threaddumps within getCallSite(). But Utils.getCallSite() is evaluated earlier causing threaddumps to be computed. This can have severe impact on smaller queries (that finish in 10-20 seconds) having large number of RDDs. Creating this patch for lazy evaluation of getCallSite. No new test cases are added. Following standalone test was tried out manually. Also, built entire spark binary and tried with few SQL queries in TPC-DS and TPC-H in multi node cluster ``` def run(): Unit = { val conf = new SparkConf() val sc = new SparkContext("local[1]", "test-context", conf) val start: Long = System.currentTimeMillis(); val confBroadcast = sc.broadcast(new SerializableConfiguration(new Configuration())) Utils.withDummyCallSite(sc) { //Large tables end up creating 5500 RDDs for(i <- 1 to 5000) { //ignore nulls in RDD as its mainly for testing callSite val testRDD = new HadoopRDD(sc, confBroadcast, None, null, classOf[NullWritable], classOf[Writable], 10) } } val end: Long = System.currentTimeMillis(); println("Time taken : " + (end - start)) } def main(args: Array[String]): Unit = { run } ``` Author: Rajesh Balamohan <rbalamohan@apache.org> Closes #11911 from rajeshbalamohan/SPARK-14091.
-rw-r--r--core/src/main/scala/org/apache/spark/SparkContext.scala2
1 files changed, 1 insertions, 1 deletions
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d2cf3bfd60..dcb41f3a40 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1737,7 +1737,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
* has overridden the call site using `setCallSite()`, this will return the user's version.
*/
private[spark] def getCallSite(): CallSite = {
- val callSite = Utils.getCallSite()
+ lazy val callSite = Utils.getCallSite()
CallSite(
Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)