aboutsummaryrefslogtreecommitdiff
path: root/python/pyspark
diff options
context:
space:
mode:
authorJosh Rosen <joshrosen@apache.org>2014-08-19 22:42:50 -0700
committerPatrick Wendell <pwendell@gmail.com>2014-08-19 22:42:50 -0700
commitebcb94f701273b56851dade677e047388a8bca09 (patch)
tree7cb860a5398134f94d80c21a4a8967213b644f7d /python/pyspark
parent0a984aa155fb7f532fe87620dcf1a2814c5b8b49 (diff)
downloadspark-ebcb94f701273b56851dade677e047388a8bca09.tar.gz
spark-ebcb94f701273b56851dade677e047388a8bca09.tar.bz2
spark-ebcb94f701273b56851dade677e047388a8bca09.zip
[SPARK-2974] [SPARK-2975] Fix two bugs related to spark.local.dirs
This PR fixes two bugs related to `spark.local.dirs` and `SPARK_LOCAL_DIRS`, one where `Utils.getLocalDir()` might return an invalid directory (SPARK-2974) and another where the `SPARK_LOCAL_DIRS` override didn't affect the driver, which could cause problems when running tasks in local mode (SPARK-2975). This patch fixes both issues: the new `Utils.getOrCreateLocalRootDirs(conf: SparkConf)` utility method manages the creation of local directories and handles the precedence among the different configuration options, so we should see the same behavior whether we're running in local mode or on a worker. It's kind of a pain to mock out environment variables in tests (no easy way to mock System.getenv), so I added a `private[spark]` method to SparkConf for accessing environment variables (by default, it just delegates to System.getenv). By subclassing SparkConf and overriding this method, we can mock out SPARK_LOCAL_DIRS in tests. I also fixed a typo in PySpark where we used `SPARK_LOCAL_DIR` instead of `SPARK_LOCAL_DIRS` (I think this was technically innocuous, but it seemed worth fixing). Author: Josh Rosen <joshrosen@apache.org> Closes #2002 from JoshRosen/local-dirs and squashes the following commits: efad8c6 [Josh Rosen] Address review comments: 1dec709 [Josh Rosen] Minor updates to Javadocs. 7f36999 [Josh Rosen] Use env vars to detect if running in YARN container. 399ac25 [Josh Rosen] Update getLocalDir() documentation. bb3ad89 [Josh Rosen] Remove duplicated YARN getLocalDirs() code. 3e92d44 [Josh Rosen] Move local dirs override logic into Utils; fix bugs: b2c4736 [Josh Rosen] Add failing tests for SPARK-2974 and SPARK-2975. 007298b [Josh Rosen] Allow environment variables to be mocked in tests. 6d9259b [Josh Rosen] Fix typo in PySpark: SPARK_LOCAL_DIR should be SPARK_LOCAL_DIRS
Diffstat (limited to 'python/pyspark')
-rw-r--r--python/pyspark/shuffle.py2
1 files changed, 1 insertions, 1 deletions
diff --git a/python/pyspark/shuffle.py b/python/pyspark/shuffle.py
index 2c68cd4921..1ebe7df418 100644
--- a/python/pyspark/shuffle.py
+++ b/python/pyspark/shuffle.py
@@ -214,7 +214,7 @@ class ExternalMerger(Merger):
def _get_dirs(self):
""" Get all the directories """
- path = os.environ.get("SPARK_LOCAL_DIR", "/tmp")
+ path = os.environ.get("SPARK_LOCAL_DIRS", "/tmp")
dirs = path.split(",")
return [os.path.join(d, "python", str(os.getpid()), str(id(self)))
for d in dirs]