From 85200c09adc6eb98fadb8505f55cb44e3d8b3390 Mon Sep 17 00:00:00 2001
From: felixcheung
Date: Thu, 21 Jan 2016 16:30:20 +0100
Subject: [SPARK-12534][DOC] update documentation to list command line
 equivalent to properties

Several Spark properties equivalent to Spark submit command line options are missing.

Author: felixcheung

Closes #10491 from felixcheung/sparksubmitdoc.
---
 docs/configuration.md   | 10 +++++-----
 docs/job-scheduling.md  |  5 ++++-
 docs/running-on-yarn.md | 27 +++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 12ac601296..acaeb83008 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -173,7 +173,7 @@ of the most common options to set are:
     stored on disk. This should be on a fast, local disk in your system. It can also be a
     comma-separated list of multiple directories on different disks.
 
-    NOTE: In Spark 1.0 and later this will be overriden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
+    NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or
     LOCAL_DIRS (YARN) environment variables set by the cluster manager.
   </td>
 </tr>
@@ -687,10 +687,10 @@ Apart from these, the following properties are also available, and may be useful
   <td><code>spark.rdd.compress</code></td>
   <td>false</td>
   <td>
-    Whether to compress serialized RDD partitions (e.g. for
-    <code>StorageLevel.MEMORY_ONLY_SER</code> in Java
-    and Scala or <code>StorageLevel.MEMORY_ONLY</code> in Python).
-    Can save substantial space at the cost of some extra CPU time.
+    Whether to compress serialized RDD partitions (e.g. for
+    <code>StorageLevel.MEMORY_ONLY_SER</code> in Java
+    and Scala or <code>StorageLevel.MEMORY_ONLY</code> in Python).
+    Can save substantial space at the cost of some extra CPU time.
   </td>
 </tr>
 <tr>
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 6c587b3f0d..95d47794ea 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -39,7 +39,10 @@ Resource allocation can be configured as follows, based on the cluster type:
   and optionally set `spark.cores.max` to limit each application's resource share as in the standalone mode.
   You should also set `spark.executor.memory` to control the executor memory.
 * **YARN:** The `--num-executors` option to the Spark YARN client controls how many executors it will allocate
-  on the cluster, while `--executor-memory` and `--executor-cores` control the resources per executor.
+  on the cluster (`spark.executor.instances` as a configuration property), while `--executor-memory`
+  (`spark.executor.memory` configuration property) and `--executor-cores` (`spark.executor.cores` configuration
+  property) control the resources per executor. For more information, see the
+  [YARN Spark Properties](running-on-yarn.html).
 
 A second option available on Mesos is _dynamic sharing_ of CPU cores. In this mode, each Spark application
 still has a fixed and independent memory allocation (set by `spark.executor.memory`), but when the
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index a148c867eb..ad66b9f64a 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -113,6 +113,19 @@ If you need a reference to the proper location to put log files in the YARN so t
     Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively.
   </td>
 </tr>
+<tr>
+  <td><code>spark.driver.memory</code></td>
+  <td>1g</td>
+  <td>
+    Amount of memory to use for the driver process, i.e. where SparkContext is initialized.
+    (e.g. <code>1g</code>, <code>2g</code>).
+
+    <br /><em>Note:</em> In client mode, this config must not be set through the <code>SparkConf</code>
+    directly in your application, because the driver JVM has already started at that point.
+    Instead, please set this through the <code>--driver-memory</code> command line option
+    or in your default properties file.
+  </td>
+</tr>
 <tr>
   <td><code>spark.driver.cores</code></td>
   <td>1</td>
@@ -202,6 +215,13 @@ If you need a reference to the proper location to put log files in the YARN so t
     Comma-separated list of files to be placed in the working directory of each executor.
   </td>
 </tr>
+<tr>
+  <td><code>spark.executor.cores</code></td>
+  <td>1 in YARN mode, all the available cores on the worker in standalone mode.</td>
+  <td>
+    The number of cores to use on each executor. For YARN and standalone mode only.
+  </td>
+</tr>
 <tr>
   <td><code>spark.executor.instances</code></td>
   <td>2</td>
@@ -209,6 +229,13 @@ If you need a reference to the proper location to put log files in the YARN so t
     The number of executors. Note that this property is incompatible with <code>spark.dynamicAllocation.enabled</code>. If both <code>spark.dynamicAllocation.enabled</code> and <code>spark.executor.instances</code> are specified, dynamic allocation is turned off and the specified number of <code>spark.executor.instances</code> is used.
   </td>
 </tr>
+<tr>
+  <td><code>spark.executor.memory</code></td>
+  <td>1g</td>
+  <td>
+    Amount of memory to use per executor process (e.g. <code>2g</code>, <code>8g</code>).
+  </td>
+</tr>
 <tr>
   <td><code>spark.yarn.executor.memoryOverhead</code></td>
   <td>executorMemory * 0.10, with minimum of 384</td>
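
Reviewer note: a minimal end-to-end sketch of the option/property equivalence this patch
documents, in spark-submit shell form. The application class and jar (com.example.App,
app.jar) are placeholders for illustration, not part of the patch; the values match the
documented YARN defaults.

    # 1) Dedicated spark-submit command line options
    spark-submit --master yarn \
      --driver-memory 1g \
      --num-executors 2 \
      --executor-memory 1g \
      --executor-cores 1 \
      --class com.example.App app.jar

    # 2) The equivalent configuration properties, passed with --conf
    spark-submit --master yarn \
      --conf spark.driver.memory=1g \
      --conf spark.executor.instances=2 \
      --conf spark.executor.memory=1g \
      --conf spark.executor.cores=1 \
      --class com.example.App app.jar

    # 3) The same properties set once in conf/spark-defaults.conf
    #    spark.driver.memory      1g
    #    spark.executor.instances 2
    #    spark.executor.memory    1g
    #    spark.executor.cores     1

As the configuration docs describe, properties set directly on SparkConf take the highest
precedence, then flags passed to spark-submit, then entries in spark-defaults.conf. The one
exception is spark.driver.memory in client mode, which (per the note added above) must come
from --driver-memory or the properties file, because the driver JVM is already running by
the time application code touches SparkConf.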