author: Josh Rosen <joshrosen@apache.org> 2014-10-19 00:31:06 -0700
committer: Josh Rosen <joshrosen@databricks.com> 2014-10-19 00:35:05 -0700
commit: 7e63bb49c526c3f872619ae14e4b5273f4c535e9
tree: 241f07bb2627381f75b0b3791d0dbbac35baa5ea
parent: 05db2da7dc256822cdb602c4821cbb9fb84dac98
[SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport)
This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`. The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops. The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks. Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket).

This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues.

Author: Josh Rosen <joshrosen@apache.org>

Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits:

f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf.
b562451 [Josh Rosen] Remove unused jobConfCacheKey field.
dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task.

(cherry picked from commit 2cd40db2b3ab5ddcb323fd05c171dbd9025f9e71)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
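The commit message describes an opt-in pattern: share one configuration object across tasks by default, or hand each task its own copy when `spark.hadoop.cloneConf` is enabled. The following is a minimal Java sketch of that pattern only, not the actual HadoopRDD change. It assumes a hypothetical `SimpleConf` class standing in for Hadoop's `JobConf` (so the example runs without Hadoop on the classpath) and a hypothetical `TaskConfProvider` that supplies each task's configuration; only the `spark.hadoop.cloneConf` key and its default of `false` come from the commit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's JobConf: a mutable key/value store
// backed by a HashMap, which is not safe for concurrent modification.
class SimpleConf {
    private final Map<String, String> props;

    SimpleConf() {
        this.props = new HashMap<>();
    }

    // Copy constructor: the clone gets its own map, so later writes to the
    // clone never touch the original (similar in spirit to new JobConf(conf)).
    SimpleConf(SimpleConf other) {
        this.props = new HashMap<>(other.props);
    }

    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }

    void set(String key, String value) {
        props.put(key, value);
    }
}

// Hypothetical provider that hands each task its configuration object.
class TaskConfProvider {
    private final SimpleConf shared;
    private final boolean cloneConf;

    TaskConfProvider(SimpleConf shared, SimpleConf sparkConf) {
        this.shared = shared;
        // Opt-in flag named in the commit; cloning stays off unless set to "true".
        this.cloneConf = Boolean.parseBoolean(
            sparkConf.get("spark.hadoop.cloneConf", "false"));
    }

    // Called once per task. Assumes the shared instance is not modified after
    // setup, so copying it only reads the shared map.
    SimpleConf confForTask() {
        if (!cloneConf) {
            return shared; // default: cheap, but every task sees the same mutable object
        }
        return new SimpleConf(shared); // opt-in: each task mutates only its private copy
    }
}
```

In the default mode every task receives the same object, which avoids copying but leaves tasks exposed to concurrent mutation; with the flag enabled each task pays for a full copy, which is the performance cost the commit cites as the reason for keeping cloning off by default. In a real Spark application the option would be set on the `SparkConf`, for example `conf.set("spark.hadoop.cloneConf", "true")`.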
Diffstat (limited to 'project')
0 files changed, 0 insertions, 0 deletions