diff options
author	Ilya Ganelin <ilya.ganelin@capitalone.com>	2015-04-13 16:28:07 -0700
committer	Andrew Or <andrew@databricks.com>	2015-04-13 16:28:07 -0700
commit	c4ab255e94366ba9b9023d5431f9d2412e0d6dc7 (patch)
tree	cade698e2139a54ab81957383c3ef2b5c8e8e9f2 /docs/configuration.md
parent	c5602bdc310cc8f82dc304500bebe40217cba785 (diff)
download	spark-c4ab255e94366ba9b9023d5431f9d2412e0d6dc7.tar.gz
	spark-c4ab255e94366ba9b9023d5431f9d2412e0d6dc7.tar.bz2
	spark-c4ab255e94366ba9b9023d5431f9d2412e0d6dc7.zip
[SPARK-5931][CORE] Use consistent naming for time properties
I've added new utility methods to convert times specified as e.g. 120s, 240ms, or 360us into a consistent internal representation, and updated usage of these constants throughout the code to match.
I believe I've captured all usages of time-based properties throughout the code. I've also updated variable names in a number of places to reflect their units for clarity, and updated documentation where appropriate.
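The suffix parsing described above can be sketched roughly as follows. This is an illustrative, hypothetical example only, not the actual JavaUtils/SparkConf implementation from this patch; the class and method names (TimeStringDemo, parseTimeAsMs) are invented, and it covers only the suffixes that java.util.concurrent.TimeUnit supports directly:

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeStringDemo {
    // Matches an integer followed by an optional lowercase unit suffix, e.g. "120s", "240ms".
    private static final Pattern TIME_PATTERN = Pattern.compile("(-?[0-9]+)([a-z]+)?");

    /**
     * Parse a time string like "25ms" or "3h" into milliseconds.
     * Bare numbers fall back to the supplied default unit.
     */
    public static long parseTimeAsMs(String str, TimeUnit defaultUnit) {
        Matcher m = TIME_PATTERN.matcher(str.trim().toLowerCase());
        if (!m.matches()) {
            throw new NumberFormatException("Invalid time string: " + str);
        }
        long value = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        TimeUnit unit = defaultUnit;
        if (suffix != null) {
            switch (suffix) {
                case "us":  unit = TimeUnit.MICROSECONDS; break;
                case "ms":  unit = TimeUnit.MILLISECONDS; break;
                case "s":   unit = TimeUnit.SECONDS;      break;
                case "m":
                case "min": unit = TimeUnit.MINUTES;      break;
                case "h":   unit = TimeUnit.HOURS;        break;
                case "d":   unit = TimeUnit.DAYS;         break;
                default:
                    throw new NumberFormatException("Invalid suffix: " + suffix);
            }
        }
        // Convert from the parsed unit into a consistent internal representation (ms).
        return TimeUnit.MILLISECONDS.convert(value, unit);
    }

    public static void main(String[] args) {
        System.out.println(parseTimeAsMs("120s", TimeUnit.MILLISECONDS));  // 120000
        System.out.println(parseTimeAsMs("240ms", TimeUnit.MILLISECONDS)); // 240
        System.out.println(parseTimeAsMs("10min", TimeUnit.MILLISECONDS)); // 600000
    }
}
```

Note how overlapping suffixes ("m" vs "ms" vs "min") are disambiguated by matching the whole suffix at once rather than character by character, which is the kind of overlap handling the commits below mention.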
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Author: Ilya Ganelin <ilganeli@gmail.com>
Closes #5236 from ilganeli/SPARK-5931 and squashes the following commits:
4526c81 [Ilya Ganelin] Update configuration.md
de3bff9 [Ilya Ganelin] Fixing style errors
f5fafcd [Ilya Ganelin] Doc updates
951ca2d [Ilya Ganelin] Made the most recent round of changes
bc04e05 [Ilya Ganelin] Minor fixes and doc updates
25d3f52 [Ilya Ganelin] Minor nit fixes
642a06d [Ilya Ganelin] Fixed logic for invalid suffixes and added matching test
8927e66 [Ilya Ganelin] Fixed handling of -1
69fedcc [Ilya Ganelin] Added test for zero
dc7bd08 [Ilya Ganelin] Fixed error in exception handling
7d19cdd [Ilya Ganelin] Added fix for possible NPE
6f651a8 [Ilya Ganelin] Now using regexes to simplify code in parseTimeString. Introduces getTimeAsSec and getTimeAsMs methods in SparkConf. Updated documentation
cbd2ca6 [Ilya Ganelin] Formatting error
1a1122c [Ilya Ganelin] Formatting fixes and added m for use as minute formatter
4e48679 [Ilya Ganelin] Fixed priority order and mixed up conversions in a couple spots
d4efd26 [Ilya Ganelin] Added time conversion for yarn.scheduler.heartbeat.interval-ms
cbf41db [Ilya Ganelin] Got rid of thrown exceptions
1465390 [Ilya Ganelin] Nit
28187bf [Ilya Ganelin] Convert straight to seconds
ff40bfe [Ilya Ganelin] Updated tests to fix small bugs
19c31af [Ilya Ganelin] Added cleaner computation of time conversions in tests
6387772 [Ilya Ganelin] Updated suffix handling to handle overlap of units more gracefully
5193d5f [Ilya Ganelin] Resolved merge conflicts
76cfa27 [Ilya Ganelin] [SPARK-5931] Minor nit fixes
bf779b0 [Ilya Ganelin] Special handling of overlapping suffixes for Java
dd0a680 [Ilya Ganelin] Updated scala code to call into java
b2fc965 [Ilya Ganelin] Replaced getOrDefault since it's not present in this version of Java
39164f9 [Ilya Ganelin] [SPARK-5931] Updated Java conversion to be similar to scala conversion. Updated conversions to clean up code a little using TimeUnit.convert. Added Unit tests
3b126e1 [Ilya Ganelin] Fixed conversion to US from seconds
1858197 [Ilya Ganelin] Fixed bug where all time was being converted to us instead of the appropriate units
bac9edf [Ilya Ganelin] More whitespace
8613631 [Ilya Ganelin] Whitespace
1c0c07c [Ilya Ganelin] Updated Java code to add day, minutes, and hours
647b5ac [Ilya Ganelin] Updated time conversion to use map iterator instead of if fall-through
70ac213 [Ilya Ganelin] Fixed remaining usages to be consistent. Updated Java-side time conversion
68f4e93 [Ilya Ganelin] Updated more files to clean up usage of default time strings
3a12dd8 [Ilya Ganelin] Updated host receiver
5232a36 [Ilya Ganelin] [SPARK-5931] Changed default behavior of time string conversion.
499bdf0 [Ilya Ganelin] Merge branch 'SPARK-5931' of github.com:ilganeli/spark into SPARK-5931
9e2547c [Ilya Ganelin] Reverting doc changes
8f741e1 [Ilya Ganelin] Update JavaUtils.java
34f87c2 [Ilya Ganelin] Update Utils.scala
9a29d8d [Ilya Ganelin] Fixed misuse of time in streaming context test
42477aa [Ilya Ganelin] Updated configuration doc with note on specifying time properties
cde9bff [Ilya Ganelin] Updated spark.streaming.blockInterval
c6a0095 [Ilya Ganelin] Updated spark.core.connection.auth.wait.timeout
5181597 [Ilya Ganelin] Updated spark.dynamicAllocation.schedulerBacklogTimeout
2fcc91c [Ilya Ganelin] Updated spark.dynamicAllocation.executorIdleTimeout
6d1518e [Ilya Ganelin] Updated spark.speculation.interval
3f1cfc8 [Ilya Ganelin] Updated spark.scheduler.revive.interval
3352d34 [Ilya Ganelin] Updated spark.scheduler.maxRegisteredResourcesWaitingTime
272c215 [Ilya Ganelin] Updated spark.locality.wait
7320c87 [Ilya Ganelin] updated spark.akka.heartbeat.interval
064ebd6 [Ilya Ganelin] Updated usage of spark.cleaner.ttl
21ef3dd [Ilya Ganelin] updated spark.shuffle.sasl.timeout
c9f5cad [Ilya Ganelin] Updated spark.shuffle.io.retryWait
4933fda [Ilya Ganelin] Updated usage of spark.storage.blockManagerSlaveTimeout
7db6d2a [Ilya Ganelin] Updated usage of spark.akka.timeout
404f8c3 [Ilya Ganelin] Updated usage of spark.core.connection.ack.wait.timeout
59bf9e1 [Ilya Ganelin] [SPARK-5931] Updated Utils and JavaUtils classes to add helper methods to handle time strings. Updated time strings in a few places to properly parse time
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--	docs/configuration.md	86
1 file changed, 47 insertions, 39 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 7fe1147521..7169ec295e 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -35,9 +35,19 @@ val conf = new SparkConf()
 val sc = new SparkContext(conf)
 {% endhighlight %}
 
-Note that we can have more than 1 thread in local mode, and in cases like spark streaming, we may actually
-require one to prevent any sort of starvation issues.
+Note that we can have more than 1 thread in local mode, and in cases like Spark Streaming, we may
+actually require one to prevent any sort of starvation issues.
+
+Properties that specify some time duration should be configured with a unit of time.
+The following format is accepted:
+
+    25ms (milliseconds)
+    5s (seconds)
+    10m or 10min (minutes)
+    3h (hours)
+    5d (days)
+    1y (years)
+
 ## Dynamically Loading Spark Properties
 In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
 instance, if you'd like to run the same application with different masters or different
@@ -429,10 +439,10 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.shuffle.io.retryWait</code></td>
-  <td>5</td>
+  <td>5s</td>
   <td>
-    (Netty only) Seconds to wait between retries of fetches. The maximum delay caused by retrying
-    is simply <code>maxRetries * retryWait</code>, by default 15 seconds.
+    (Netty only) How long to wait between retries of fetches. The maximum delay caused by retrying
+    is 15 seconds by default, calculated as <code>maxRetries * retryWait</code>.
   </td>
 </tr>
 <tr>
@@ -732,17 +742,17 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.executor.heartbeatInterval</code></td>
-  <td>10000</td>
-  <td>Interval (milliseconds) between each executor's heartbeats to the driver. Heartbeats let
+  <td>10s</td>
+  <td>Interval between each executor's heartbeats to the driver. Heartbeats let
   the driver know that the executor is still alive and update it with metrics for in-progress
   tasks.</td>
 </tr>
 <tr>
   <td><code>spark.files.fetchTimeout</code></td>
-  <td>60</td>
+  <td>60s</td>
   <td>
     Communication timeout to use when fetching files added through SparkContext.addFile() from
-    the driver, in seconds.
+    the driver.
   </td>
 </tr>
 <tr>
@@ -853,11 +863,11 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.akka.heartbeat.interval</code></td>
-  <td>1000</td>
+  <td>1000s</td>
   <td>
     This is set to a larger value to disable the transport failure detector that comes built in to
    Akka. It can be enabled again, if you plan to use this feature (Not recommended). A larger
-    interval value in seconds reduces network overhead and a smaller value ( ~ 1 s) might be more
+    interval value reduces network overhead and a smaller value ( ~ 1 s) might be more
    informative for Akka's failure detector. Tune this in combination of `spark.akka.heartbeat.pauses`
    if you need to. A likely positive use case for using failure detector would be: a sensistive
    failure detector can help evict rogue executors quickly. However this is usually not the case
@@ -868,11 +878,11 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.akka.heartbeat.pauses</code></td>
-  <td>6000</td>
+  <td>6000s</td>
   <td>
     This is set to a larger value to disable the transport failure detector that comes built in to
    Akka. It can be enabled again, if you plan to use this feature (Not recommended). Acceptable heart
-    beat pause in seconds for Akka. This can be used to control sensitivity to GC pauses. Tune
+    beat pause for Akka. This can be used to control sensitivity to GC pauses. Tune
    this along with `spark.akka.heartbeat.interval` if you need to.
   </td>
 </tr>
@@ -886,9 +896,9 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.akka.timeout</code></td>
-  <td>100</td>
+  <td>100s</td>
   <td>
-    Communication timeout between Spark nodes, in seconds.
+    Communication timeout between Spark nodes.
   </td>
 </tr>
 <tr>
@@ -938,10 +948,10 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.network.timeout</code></td>
-  <td>120</td>
+  <td>120s</td>
   <td>
-    Default timeout for all network interactions, in seconds. This config will be used in
-    place of <code>spark.core.connection.ack.wait.timeout</code>, <code>spark.akka.timeout</code>,
+    Default timeout for all network interactions. This config will be used in place of
+    <code>spark.core.connection.ack.wait.timeout</code>, <code>spark.akka.timeout</code>,
     <code>spark.storage.blockManagerSlaveTimeoutMs</code> or
     <code>spark.shuffle.io.connectionTimeout</code>, if they are not configured.
   </td>
@@ -989,9 +999,9 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.locality.wait</code></td>
-  <td>3000</td>
+  <td>3s</td>
   <td>
-    Number of milliseconds to wait to launch a data-local task before giving up and launching it
+    How long to wait to launch a data-local task before giving up and launching it
     on a less-local node. The same wait will be used to step through multiple locality levels
     (process-local, node-local, rack-local and then any). It is also possible to customize the
     waiting time for each level by setting <code>spark.locality.wait.node</code>, etc.
@@ -1024,10 +1034,9 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.scheduler.maxRegisteredResourcesWaitingTime</code></td>
-  <td>30000</td>
+  <td>30s</td>
   <td>
-    Maximum amount of time to wait for resources to register before scheduling begins
-    (in milliseconds).
+    Maximum amount of time to wait for resources to register before scheduling begins.
   </td>
 </tr>
 <tr>
@@ -1054,10 +1063,9 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.scheduler.revive.interval</code></td>
-  <td>1000</td>
+  <td>1s</td>
   <td>
-    The interval length for the scheduler to revive the worker resource offers to run tasks
-    (in milliseconds).
+    The interval length for the scheduler to revive the worker resource offers to run tasks.
   </td>
 </tr>
 <tr>
@@ -1070,9 +1078,9 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.speculation.interval</code></td>
-  <td>100</td>
+  <td>100ms</td>
   <td>
-    How often Spark will check for tasks to speculate, in milliseconds.
+    How often Spark will check for tasks to speculate.
   </td>
 </tr>
 <tr>
@@ -1127,10 +1135,10 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.dynamicAllocation.executorIdleTimeout</code></td>
-  <td>600</td>
+  <td>600s</td>
   <td>
-    If dynamic allocation is enabled and an executor has been idle for more than this duration
-    (in seconds), the executor will be removed. For more detail, see this
+    If dynamic allocation is enabled and an executor has been idle for more than this duration,
+    the executor will be removed. For more detail, see this
     <a href="job-scheduling.html#resource-allocation-policy">description</a>.
   </td>
 </tr>
@@ -1157,10 +1165,10 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.dynamicAllocation.schedulerBacklogTimeout</code></td>
-  <td>5</td>
+  <td>5s</td>
   <td>
     If dynamic allocation is enabled and there have been pending tasks backlogged for more than
-    this duration (in seconds), new executors will be requested. For more detail, see this
+    this duration, new executors will be requested. For more detail, see this
     <a href="job-scheduling.html#resource-allocation-policy">description</a>.
   </td>
 </tr>
@@ -1215,18 +1223,18 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.core.connection.ack.wait.timeout</code></td>
-  <td>60</td>
+  <td>60s</td>
   <td>
-    Number of seconds for the connection to wait for ack to occur before timing
+    How long for the connection to wait for ack to occur before timing
     out and giving up. To avoid unwilling timeout caused by long pause like GC,
     you can set larger value.
   </td>
 </tr>
 <tr>
   <td><code>spark.core.connection.auth.wait.timeout</code></td>
-  <td>30</td>
+  <td>30s</td>
   <td>
-    Number of seconds for the connection to wait for authentication to occur before timing
+    How long for the connection to wait for authentication to occur before timing
     out and giving up.
   </td>
 </tr>
@@ -1347,9 +1355,9 @@ Apart from these, the following properties are also available, and may be useful
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
 <tr>
   <td><code>spark.streaming.blockInterval</code></td>
-  <td>200</td>
+  <td>200ms</td>
   <td>
-    Interval (milliseconds) at which data received by Spark Streaming receivers is chunked
+    Interval at which data received by Spark Streaming receivers is chunked
     into blocks of data before storing them in Spark. Minimum recommended - 50 ms. See the
     <a href="streaming-programming-guide.html#level-of-parallelism-in-data-receiving">performance
      tuning</a> section in the Spark Streaming programing guide for more details.
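With this patch applied, the duration-valued properties in the diff above carry explicit units. A sketch of what that looks like in a spark-defaults.conf file (the values shown are simply the documented defaults, used here as examples, not recommendations):

```
spark.network.timeout           120s
spark.locality.wait             3s
spark.speculation.interval      100ms
spark.streaming.blockInterval   200ms
spark.executor.heartbeatInterval 10s
```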