author: Evan Chan <ev@ooyala.com> 2014-04-06 19:17:33 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2014-04-06 19:21:40 -0700
commit: 1440154c27ca48b5a75103eccc9057286d3f6ca8
tree: 7f4b2fb31c31ba5a457c759b48a884492fe472dd /docs
parent: 4106558435889261243d186f5f0b51c5f9e98d56
SPARK-1154: Clean up app folders in worker nodes
This is a fix for [SPARK-1154](https://issues.apache.org/jira/browse/SPARK-1154). The issue is that worker nodes fill up with a huge number of app-* folders after some time. This change adds a periodic cleanup task which asynchronously deletes app directories older than a configurable TTL.

Two new configuration parameters have been introduced:
  spark.worker.cleanup_interval
  spark.worker.app_data_ttl

This change does not include moving the downloads of application jars to a location outside of the work directory. We will address that if we have time, but that potentially involves caching so it will come either as part of this PR or a separate PR.

Author: Evan Chan <ev@ooyala.com>
Author: Kelvin Chu <kelvinkwchu@yahoo.com>

Closes #288 from velvia/SPARK-1154-cleanup-app-folders and squashes the following commits:

0689995 [Evan Chan] CR from @aarondav - move config, clarify for standalone mode
9f10d96 [Evan Chan] CR from @pwendell - rename configs and add cleanup.enabled
f2f6027 [Evan Chan] CR from @andrewor14
553d8c2 [Kelvin Chu] change the variable name to currentTimeMillis since it actually tracks in seconds
8dc9cb5 [Kelvin Chu] Fixed a bug in Utils.findOldFiles() after merge.
cb52f2b [Kelvin Chu] Change the name of findOldestFiles() to findOldFiles()
72f7d2d [Kelvin Chu] Fix a bug of Utils.findOldestFiles(). file.lastModified is returned in milliseconds.
ad99955 [Kelvin Chu] Add unit test for Utils.findOldestFiles()
dc1a311 [Evan Chan] Don't recompute current time with every new file
e3c408e [Evan Chan] Document the two new settings
b92752b [Evan Chan] SPARK-1154: Add a periodic task to clean up app directories
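
The TTL-based cleanup described in the commit message can be illustrated outside Spark. What follows is a minimal Python sketch, not the Scala code from this commit: the function names `find_old_dirs` and `cleanup_once` are hypothetical, and the periodic scheduling and asynchronous deletion mentioned above are left out. It keeps every timestamp in seconds and reads the current time once per pass, two details the squashed commits call out.

```python
import os
import shutil
import time


def find_old_dirs(work_dir, ttl_seconds, now_seconds=None):
    """Return subdirectories of work_dir last modified more than ttl_seconds ago.

    Hypothetical helper for illustration; all times are in seconds.
    """
    if now_seconds is None:
        # Read the clock once per pass rather than once per directory.
        now_seconds = time.time()
    cutoff = now_seconds - ttl_seconds
    old = []
    for entry in os.scandir(work_dir):
        if not entry.is_dir(follow_symlinks=False):
            continue
        # st_mtime is in seconds; an API that reports milliseconds
        # would need converting before this comparison.
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            old.append(entry.path)
    return sorted(old)


def cleanup_once(work_dir, ttl_seconds, now_seconds=None):
    """Delete expired directories under work_dir and return their paths."""
    removed = find_old_dirs(work_dir, ttl_seconds, now_seconds)
    for path in removed:
        shutil.rmtree(path, ignore_errors=True)
    return removed
```

A caller would run `cleanup_once(work_dir, ttl_seconds)` once per cleanup interval. With the documented defaults that would mean every 1800 seconds with a TTL of 7 * 24 * 3600 seconds.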
Diffstat (limited to 'docs')
-rw-r--r-- docs/configuration.md 26
1 files changed, 26 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index b6005acac8..57bda20edc 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -349,6 +349,32 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td>spark.worker.cleanup.enabled</td>
+ <td>true</td>
+ <td>
+ Enable periodic cleanup of worker / application directories. Note that this only affects standalone
+ mode, as YARN works differently.
+ </td>
+</tr>
+<tr>
+ <td>spark.worker.cleanup.interval</td>
+ <td>1800 (30 minutes)</td>
+ <td>
+ Controls the interval, in seconds, at which the worker cleans up old application work dirs
+ on the local machine.
+ </td>
+</tr>
+<tr>
+ <td>spark.worker.cleanup.appDataTtl</td>
+ <td>7 * 24 * 3600 (7 days)</td>
+ <td>
+ The number of seconds to retain application work directories on each worker. This is a Time To Live
+ and should depend on the amount of available disk space you have. Application logs and jars are
+ downloaded to each application work dir. Over time, the work dirs can quickly fill up disk space,
+ especially if you run jobs very frequently.
+ </td>
+</tr>
+<tr>
<td>spark.akka.frameSize</td>
<td>10</td>
<td>