author    Josh Rosen <joshrosen@apache.org>  2014-10-19 00:31:06 -0700
committer Josh Rosen <joshrosen@databricks.com>  2014-10-19 00:35:05 -0700
commit    7e63bb49c526c3f872619ae14e4b5273f4c535e9 (patch)
tree      241f07bb2627381f75b0b3791d0dbbac35baa5ea /docs/configuration.md
parent    05db2da7dc256822cdb602c4821cbb9fb84dac98 (diff)
[SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport)
This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`.

The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops. The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks. Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket).

This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues.

Author: Josh Rosen <joshrosen@apache.org>

Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits:

f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf.
b562451 [Josh Rosen] Remove unused jobConfCacheKey field.
dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task.

(cherry picked from commit 2cd40db2b3ab5ddcb323fd05c171dbd9025f9e71)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
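Below is a minimal, hypothetical sketch of how a user application might enable the new option from Scala. Only the property name `spark.hadoop.cloneConf` comes from this patch; the application name, input path, and the rest are ordinary Spark 1.x API usage chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CloneConfExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CloneConfExample")
      // Clone a fresh Hadoop JobConf for each task to work around the
      // Configuration thread-safety issues described in SPARK-2546.
      // Disabled by default to avoid performance regressions for jobs
      // that are not affected by those issues.
      .set("spark.hadoop.cloneConf", "true")

    val sc = new SparkContext(conf)
    // Hadoop-backed inputs such as text files are where the per-task
    // JobConf is used. The path below is a placeholder.
    val lines = sc.textFile("hdfs:///path/to/input")
    println(s"line count: ${lines.count()}")
    sc.stop()
  }
}
```

The same flag can typically also be supplied at submission time (for example via `spark-submit --conf spark.hadoop.cloneConf=true`) rather than being hard-coded in the application.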
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--  docs/configuration.md  9
1 file changed, 9 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index f0204c640b..96fa1377ec 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -620,6 +620,15 @@ Apart from these, the following properties are also available, and may be useful
previous versions of Spark. Simply use Hadoop's FileSystem API to delete output directories by hand.</td>
</tr>
<tr>
+ <td><code>spark.hadoop.cloneConf</code></td>
+ <td>false</td>
+ <td>If set to true, clones a new Hadoop <code>Configuration</code> object for each task. This
+ option should be enabled to work around <code>Configuration</code> thread-safety issues (see
+ <a href="https://issues.apache.org/jira/browse/SPARK-2546">SPARK-2546</a> for more details).
+ This is disabled by default in order to avoid unexpected performance regressions for jobs that
+ are not affected by these issues.</td>
+</tr>
+<tr>
<td><code>spark.executor.heartbeatInterval</code></td>
<td>10000</td>
<td>Interval (milliseconds) between each executor's heartbeats to the driver. Heartbeats let