author    Ilya Ganelin <ilya.ganelin@capitalone.com>    2015-04-13 16:28:07 -0700
committer Andrew Or <andrew@databricks.com>             2015-04-13 16:28:07 -0700
commit    c4ab255e94366ba9b9023d5431f9d2412e0d6dc7 (patch)
tree      cade698e2139a54ab81957383c3ef2b5c8e8e9f2 /docs/configuration.md
parent    c5602bdc310cc8f82dc304500bebe40217cba785 (diff)
[SPARK-5931][CORE] Use consistent naming for time properties
I've added new utility methods to convert times specified as e.g. 120s, 240ms, or 360us into a
consistent internal representation. I've updated usage of these constants throughout the code to
be consistent, and I believe I've captured all usages of time-based properties. I've also updated
variable names in a number of places to reflect their units for clarity, and updated documentation
where appropriate.

Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Author: Ilya Ganelin <ilganeli@gmail.com>

Closes #5236 from ilganeli/SPARK-5931 and squashes the following commits:

4526c81 [Ilya Ganelin] Update configuration.md
de3bff9 [Ilya Ganelin] Fixing style errors
f5fafcd [Ilya Ganelin] Doc updates
951ca2d [Ilya Ganelin] Made the most recent round of changes
bc04e05 [Ilya Ganelin] Minor fixes and doc updates
25d3f52 [Ilya Ganelin] Minor nit fixes
642a06d [Ilya Ganelin] Fixed logic for invalid suffixes and added matching test
8927e66 [Ilya Ganelin] Fixed handling of -1
69fedcc [Ilya Ganelin] Added test for zero
dc7bd08 [Ilya Ganelin] Fixed error in exception handling
7d19cdd [Ilya Ganelin] Added fix for possible NPE
6f651a8 [Ilya Ganelin] Now using regexes to simplify code in parseTimeString. Introduces getTimeAsSec and getTimeAsMs methods in SparkConf. Updated documentation
cbd2ca6 [Ilya Ganelin] Formatting error
1a1122c [Ilya Ganelin] Formatting fixes and added m for use as minute formatter
4e48679 [Ilya Ganelin] Fixed priority order and mixed up conversions in a couple spots
d4efd26 [Ilya Ganelin] Added time conversion for yarn.scheduler.heartbeat.interval-ms
cbf41db [Ilya Ganelin] Got rid of thrown exceptions
1465390 [Ilya Ganelin] Nit
28187bf [Ilya Ganelin] Convert straight to seconds
ff40bfe [Ilya Ganelin] Updated tests to fix small bugs
19c31af [Ilya Ganelin] Added cleaner computation of time conversions in tests
6387772 [Ilya Ganelin] Updated suffix handling to handle overlap of units more gracefully
5193d5f [Ilya Ganelin] Resolved merge conflicts
76cfa27 [Ilya Ganelin] [SPARK-5931] Minor nit fixes
bf779b0 [Ilya Ganelin] Special handling of overlapping suffixes for Java
dd0a680 [Ilya Ganelin] Updated Scala code to call into Java
b2fc965 [Ilya Ganelin] Replaced getOrDefault since it's not present in this version of Java
39164f9 [Ilya Ganelin] [SPARK-5931] Updated Java conversion to be similar to Scala conversion. Updated conversions to clean up code a little using TimeUnit.convert. Added unit tests
3b126e1 [Ilya Ganelin] Fixed conversion to US from seconds
1858197 [Ilya Ganelin] Fixed bug where all time was being converted to us instead of the appropriate units
bac9edf [Ilya Ganelin] More whitespace
8613631 [Ilya Ganelin] Whitespace
1c0c07c [Ilya Ganelin] Updated Java code to add day, minutes, and hours
647b5ac [Ilya Ganelin] Updated time conversion to use a map iterator instead of if fall-through
70ac213 [Ilya Ganelin] Fixed remaining usages to be consistent. Updated Java-side time conversion
68f4e93 [Ilya Ganelin] Updated more files to clean up usage of default time strings
3a12dd8 [Ilya Ganelin] Updated host receiver
5232a36 [Ilya Ganelin] [SPARK-5931] Changed default behavior of time string conversion.
499bdf0 [Ilya Ganelin] Merge branch 'SPARK-5931' of github.com:ilganeli/spark into SPARK-5931
9e2547c [Ilya Ganelin] Reverting doc changes
8f741e1 [Ilya Ganelin] Update JavaUtils.java
34f87c2 [Ilya Ganelin] Update Utils.scala
9a29d8d [Ilya Ganelin] Fixed misuse of time in streaming context test
42477aa [Ilya Ganelin] Updated configuration doc with note on specifying time properties
cde9bff [Ilya Ganelin] Updated spark.streaming.blockInterval
c6a0095 [Ilya Ganelin] Updated spark.core.connection.auth.wait.timeout
5181597 [Ilya Ganelin] Updated spark.dynamicAllocation.schedulerBacklogTimeout
2fcc91c [Ilya Ganelin] Updated spark.dynamicAllocation.executorIdleTimeout
6d1518e [Ilya Ganelin] Updated spark.speculation.interval
3f1cfc8 [Ilya Ganelin] Updated spark.scheduler.revive.interval
3352d34 [Ilya Ganelin] Updated spark.scheduler.maxRegisteredResourcesWaitingTime
272c215 [Ilya Ganelin] Updated spark.locality.wait
7320c87 [Ilya Ganelin] Updated spark.akka.heartbeat.interval
064ebd6 [Ilya Ganelin] Updated usage of spark.cleaner.ttl
21ef3dd [Ilya Ganelin] Updated spark.shuffle.sasl.timeout
c9f5cad [Ilya Ganelin] Updated spark.shuffle.io.retryWait
4933fda [Ilya Ganelin] Updated usage of spark.storage.blockManagerSlaveTimeout
7db6d2a [Ilya Ganelin] Updated usage of spark.akka.timeout
404f8c3 [Ilya Ganelin] Updated usage of spark.core.connection.ack.wait.timeout
59bf9e1 [Ilya Ganelin] [SPARK-5931] Updated Utils and JavaUtils classes to add helper methods to handle time strings. Updated time strings in a few places to properly parse time
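
To make the conversion concrete, here is a minimal, hypothetical sketch of the regex-and-suffix
parsing described above. The commit's actual helpers live in Utils.scala and JavaUtils.java and
may differ in detail; the object and method names below are illustrative only:

{% highlight scala %}
import java.util.concurrent.TimeUnit

// Illustrative sketch only -- not the real Utils.scala/JavaUtils.java code.
object TimeStrings {
  // Suffixes from the documentation below; "m" and "min" overlap on minutes.
  private val suffixes: Map[String, TimeUnit] = Map(
    "us"  -> TimeUnit.MICROSECONDS,
    "ms"  -> TimeUnit.MILLISECONDS,
    "s"   -> TimeUnit.SECONDS,
    "m"   -> TimeUnit.MINUTES,
    "min" -> TimeUnit.MINUTES,
    "h"   -> TimeUnit.HOURS,
    "d"   -> TimeUnit.DAYS)

  // A number followed by an optional suffix, e.g. "120s", "240ms", "-1".
  private val pattern = "(-?[0-9]+)([a-z]+)?".r

  /** Parse a time string into `unit`; a bare number is assumed to already be in `unit`. */
  def parse(str: String, unit: TimeUnit): Long = str.trim.toLowerCase match {
    case pattern(num, null) => num.toLong  // no suffix: keep the caller's unit
    case pattern(num, suffix) =>
      val source = suffixes.getOrElse(suffix,
        throw new NumberFormatException(s"Invalid suffix: '$suffix'"))
      unit.convert(num.toLong, source)
    case _ => throw new NumberFormatException(s"Invalid time string: '$str'")
  }
}
{% endhighlight %}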
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--  docs/configuration.md  86
1 file changed, 47 insertions(+), 39 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index 7fe1147521..7169ec295e 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -35,9 +35,19 @@ val conf = new SparkConf()
val sc = new SparkContext(conf)
{% endhighlight %}
-Note that we can have more than 1 thread in local mode, and in cases like spark streaming, we may actually
-require one to prevent any sort of starvation issues.
+Note that we can have more than 1 thread in local mode, and in cases like Spark Streaming, we may
+actually require more than one to prevent any sort of starvation issues.
+Properties that specify some time duration should be configured with a unit of time.
+The following format is accepted:
+
+ 25ms (milliseconds)
+ 5s (seconds)
+ 10m or 10min (minutes)
+ 3h (hours)
+ 5d (days)
+ 1y (years)
+
## Dynamically Loading Spark Properties
In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
instance, if you'd like to run the same application with different masters or different
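
The first hunk above documents the new suffixed format. As a hedged illustration of what that
means for user code (the property names are real Spark properties from this page; the app name is
made up), time-valued settings are now written with an explicit unit:

{% highlight scala %}
import org.apache.spark.SparkConf

// Time-valued properties now carry an explicit unit suffix instead of a bare
// number whose unit had to be remembered per property.
val conf = new SparkConf()
  .setAppName("TimeoutDemo") // hypothetical app name
  .set("spark.network.timeout", "120s")       // formerly "120", seconds implied
  .set("spark.locality.wait", "3s")           // formerly "3000", milliseconds implied
  .set("spark.speculation.interval", "100ms") // formerly "100", milliseconds implied
{% endhighlight %}

Per the commit message, SparkConf also gains getTimeAsSec/getTimeAsMs-style accessors for reading
such values back in a fixed unit.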
@@ -429,10 +439,10 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.shuffle.io.retryWait</code></td>
- <td>5</td>
+ <td>5s</td>
<td>
- (Netty only) Seconds to wait between retries of fetches. The maximum delay caused by retrying
- is simply <code>maxRetries * retryWait</code>, by default 15 seconds.
+ (Netty only) How long to wait between retries of fetches. The maximum delay caused by retrying
+ is 15 seconds by default, calculated as <code>maxRetries * retryWait</code>.
</td>
</tr>
<tr>
@@ -732,17 +742,17 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.executor.heartbeatInterval</code></td>
- <td>10000</td>
- <td>Interval (milliseconds) between each executor's heartbeats to the driver. Heartbeats let
+ <td>10s</td>
+ <td>Interval between each executor's heartbeats to the driver. Heartbeats let
the driver know that the executor is still alive and update it with metrics for in-progress
tasks.</td>
</tr>
<tr>
<td><code>spark.files.fetchTimeout</code></td>
- <td>60</td>
+ <td>60s</td>
<td>
Communication timeout to use when fetching files added through SparkContext.addFile() from
- the driver, in seconds.
+ the driver.
</td>
</tr>
<tr>
@@ -853,11 +863,11 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.akka.heartbeat.interval</code></td>
- <td>1000</td>
+ <td>1000s</td>
<td>
This is set to a larger value to disable the transport failure detector that comes built in to
Akka. It can be enabled again, if you plan to use this feature (Not recommended). A larger
- interval value in seconds reduces network overhead and a smaller value ( ~ 1 s) might be more
+ interval value reduces network overhead and a smaller value ( ~ 1 s) might be more
informative for Akka's failure detector. Tune this in combination with `spark.akka.heartbeat.pauses`
if you need to. A likely positive use case for using the failure detector would be: a sensitive
failure detector can help evict rogue executors quickly. However, this is usually not the case
@@ -868,11 +878,11 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.akka.heartbeat.pauses</code></td>
- <td>6000</td>
+ <td>6000s</td>
<td>
This is set to a larger value to disable the transport failure detector that comes built in to Akka.
It can be enabled again, if you plan to use this feature (Not recommended). Acceptable heart
- beat pause in seconds for Akka. This can be used to control sensitivity to GC pauses. Tune
+ beat pause for Akka. This can be used to control sensitivity to GC pauses. Tune
this along with `spark.akka.heartbeat.interval` if you need to.
</td>
</tr>
@@ -886,9 +896,9 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.akka.timeout</code></td>
- <td>100</td>
+ <td>100s</td>
<td>
- Communication timeout between Spark nodes, in seconds.
+ Communication timeout between Spark nodes.
</td>
</tr>
<tr>
@@ -938,10 +948,10 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.network.timeout</code></td>
- <td>120</td>
+ <td>120s</td>
<td>
- Default timeout for all network interactions, in seconds. This config will be used in
- place of <code>spark.core.connection.ack.wait.timeout</code>, <code>spark.akka.timeout</code>,
+ Default timeout for all network interactions. This config will be used in place of
+ <code>spark.core.connection.ack.wait.timeout</code>, <code>spark.akka.timeout</code>,
<code>spark.storage.blockManagerSlaveTimeoutMs</code> or
<code>spark.shuffle.io.connectionTimeout</code>, if they are not configured.
</td>
@@ -989,9 +999,9 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.locality.wait</code></td>
- <td>3000</td>
+ <td>3s</td>
<td>
- Number of milliseconds to wait to launch a data-local task before giving up and launching it
+ How long to wait to launch a data-local task before giving up and launching it
on a less-local node. The same wait will be used to step through multiple locality levels
(process-local, node-local, rack-local and then any). It is also possible to customize the
waiting time for each level by setting <code>spark.locality.wait.node</code>, etc.
@@ -1024,10 +1034,9 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.scheduler.maxRegisteredResourcesWaitingTime</code></td>
- <td>30000</td>
+ <td>30s</td>
<td>
- Maximum amount of time to wait for resources to register before scheduling begins
- (in milliseconds).
+ Maximum amount of time to wait for resources to register before scheduling begins.
</td>
</tr>
<tr>
@@ -1054,10 +1063,9 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.scheduler.revive.interval</code></td>
- <td>1000</td>
+ <td>1s</td>
<td>
- The interval length for the scheduler to revive the worker resource offers to run tasks
- (in milliseconds).
+ The interval at which the scheduler revives worker resource offers to run tasks.
</td>
</tr>
<tr>
@@ -1070,9 +1078,9 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.speculation.interval</code></td>
- <td>100</td>
+ <td>100ms</td>
<td>
- How often Spark will check for tasks to speculate, in milliseconds.
+ How often Spark will check for tasks to speculate.
</td>
</tr>
<tr>
@@ -1127,10 +1135,10 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.dynamicAllocation.executorIdleTimeout</code></td>
- <td>600</td>
+ <td>600s</td>
<td>
- If dynamic allocation is enabled and an executor has been idle for more than this duration
- (in seconds), the executor will be removed. For more detail, see this
+ If dynamic allocation is enabled and an executor has been idle for more than this duration,
+ the executor will be removed. For more detail, see this
<a href="job-scheduling.html#resource-allocation-policy">description</a>.
</td>
</tr>
@@ -1157,10 +1165,10 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.dynamicAllocation.schedulerBacklogTimeout</code></td>
- <td>5</td>
+ <td>5s</td>
<td>
If dynamic allocation is enabled and there have been pending tasks backlogged for more than
- this duration (in seconds), new executors will be requested. For more detail, see this
+ this duration, new executors will be requested. For more detail, see this
<a href="job-scheduling.html#resource-allocation-policy">description</a>.
</td>
</tr>
@@ -1215,18 +1223,18 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.core.connection.ack.wait.timeout</code></td>
- <td>60</td>
+ <td>60s</td>
<td>
- Number of seconds for the connection to wait for ack to occur before timing
+ How long the connection waits for an ack to occur before timing
out and giving up. To avoid timeouts caused by long pauses such as GC,
you can set a larger value.
</td>
</tr>
<tr>
<td><code>spark.core.connection.auth.wait.timeout</code></td>
- <td>30</td>
+ <td>30s</td>
<td>
- Number of seconds for the connection to wait for authentication to occur before timing
+ How long for the connection to wait for authentication to occur before timing
out and giving up.
</td>
</tr>
@@ -1347,9 +1355,9 @@ Apart from these, the following properties are also available, and may be useful
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td><code>spark.streaming.blockInterval</code></td>
- <td>200</td>
+ <td>200ms</td>
<td>
- Interval (milliseconds) at which data received by Spark Streaming receivers is chunked
+ Interval at which data received by Spark Streaming receivers is chunked
into blocks of data before storing them in Spark. The minimum recommended value is 50 ms. See the
<a href="streaming-programming-guide.html#level-of-parallelism-in-data-receiving">performance
tuning</a> section in the Spark Streaming programming guide for more details.
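
Finally, a short usage sketch of the hypothetical TimeStrings.parse helper shown after the commit
message, illustrating the conversions and the -1 sentinel handling the commit log mentions:

{% highlight scala %}
import java.util.concurrent.TimeUnit

TimeStrings.parse("120s",  TimeUnit.MILLISECONDS) // 120000
TimeStrings.parse("10min", TimeUnit.SECONDS)      // 600
TimeStrings.parse("240ms", TimeUnit.SECONDS)      // 0 -- integer conversion truncates
TimeStrings.parse("-1",    TimeUnit.SECONDS)      // -1 -- bare numbers keep the target unit
{% endhighlight %}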