author     Aaron Davidson <aaron@databricks.com>    2014-08-14 01:37:38 -0700
committer  Reynold Xin <rxin@apache.org>            2014-08-14 01:37:38 -0700
commit     d069c5d9d2f6ce06389ca2ddf0b3ae4db72c5797 (patch)
tree       b6bd62749f63cd924073ed4d6e2b59324307b093 /docs/configuration.md
parent     69a57a18ee35af1cc5a00b67a80837ea317cd330 (diff)
[SPARK-3029] Disable local execution of Spark jobs by default
Currently, local execution of Spark jobs is only used by take(), and it can be problematic as it can load a significant amount of data onto the driver. The worst-case scenarios occur if the RDD is cached (guaranteed to load the whole partition), has very large elements, or the partition is simply large and we apply a filter with high selectivity or computational overhead.

Additionally, jobs that run locally in this manner do not show up in the web UI, and are thus harder to track or reason about.

This PR adds a flag to disable local execution; it is turned OFF by default, with the intention of perhaps eventually removing this functionality altogether. Removing it now is a tougher proposition since it is part of the public runJob API. An alternative solution would be to limit the flag to take()/first() to avoid impacting any external users of this API, but such usage (or, at least, reliance upon the feature) is hopefully minimal.

Author: Aaron Davidson <aaron@databricks.com>

Closes #1321 from aarondav/allowlocal and squashes the following commits:

136b253 [Aaron Davidson] Fix DAGSchedulerSuite
5599d55 [Aaron Davidson] [RFC] Disable local execution of Spark jobs by default
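For illustration, a minimal Scala sketch of how an application could opt back in to local execution once this change lands. The application name and RDD are made up for the example; only the property key comes from this commit, and the default shipped by this patch remains false:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example: re-enable local execution, which this patch turns off by default.
// With the flag on, actions like take() and first() may run on the driver without
// scheduling cluster tasks, at the cost of possibly pulling a whole partition to the driver.
val conf = new SparkConf()
  .setAppName("local-execution-example")        // made-up application name
  .set("spark.localExecution.enabled", "true")  // property added by this patch

val sc = new SparkContext(conf)

// take(5) can now be satisfied on the driver when the scheduler allows it.
val firstFew = sc.parallelize(1 to 1000000).take(5)
```

With the default (false), the same take() call always launches regular tasks on the cluster and therefore also appears in the web UI.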
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--  docs/configuration.md  9
1 file changed, 9 insertions(+), 0 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index c8336b3913..c408c468dc 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -846,6 +846,15 @@ Apart from these, the following properties are also available, and may be useful
(in milliseconds).
</td>
</tr>
+<tr>
+ <td><code>spark.localExecution.enabled</code></td>
+ <td>false</td>
+ <td>
+ Enables Spark to run certain jobs, such as first() or take(), on the driver without sending
+ tasks to the cluster. This can make certain jobs execute very quickly, but may require
+ shipping a whole partition of data to the driver.
+ </td>
+</tr>
</table>
#### Security
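As a usage sketch (not part of this patch), the property documented above could also be set cluster-wide through the standard conf/spark-defaults.conf mechanism rather than in application code; the value shown is an assumption for illustration, since the default introduced here is false:

```
# conf/spark-defaults.conf
# Hypothetical override: turn local execution back on for all applications.
spark.localExecution.enabled   true
```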