Diffstat (limited to 'docs/running-on-yarn.md')
-rw-r--r-- docs/running-on-yarn.md | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index fecd8f2cc2..43b06df99d 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -95,10 +95,19 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
The amount of off-heap memory (in megabytes) to be allocated per driver. This is memory that accounts for things like VM overheads, interned strings, and other native overheads.
</td>
</tr>
+<tr>
+ <td><code>spark.yarn.jar</code></td>
+ <td>(none)</td>
+ <td>
+ The location of the Spark jar file, in case overriding the default location is desired.
+ By default, Spark on YARN will use a Spark jar installed locally, but the Spark jar can also be
+ in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't
+ need to be distributed each time an application runs. To point to a jar on HDFS, for example,
+ set this configuration to <code>hdfs:///some/path</code>.
+ </td>
+</tr>
</table>
-By default, Spark on YARN will use a Spark jar installed locally, but the Spark JAR can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to a JAR on HDFS, `export SPARK_JAR=hdfs:///some/path`.
-
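+As a minimal sketch (the jar name and HDFS path below are illustrative), the jar can be uploaded
+once and then referenced through `spark.yarn.jar` in `conf/spark-defaults.conf`:
+
+    # Upload the locally installed Spark jar to a world-readable HDFS location.
+    hadoop fs -put lib/spark-assembly.jar hdfs:///some/path/spark-assembly.jar
+
+    # In conf/spark-defaults.conf: point Spark on YARN at the cached copy.
+    spark.yarn.jar hdfs:///some/path/spark-assembly.jar
+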
# Launching Spark on YARN
Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster.
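+For example (a minimal sketch; the configuration directory and examples jar location are
+assumptions for a typical installation):
+
+    export HADOOP_CONF_DIR=/etc/hadoop/conf
+    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
+        --master yarn-cluster \
+        lib/spark-examples.jar \
+        10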
@@ -156,7 +165,20 @@ all environment variables used for launching each container. This process is use
classpath problems in particular. (Note that enabling this requires admin privileges on cluster
settings and a restart of all node managers. Thus, this is not applicable to hosted clusters).
-# Important Notes
+To use a custom log4j configuration for the application master or executors, there are two options
+(a sketch follows the note below):
+
+- upload a custom `log4j.properties` using `spark-submit`, by adding it to the `--files` list of
+ files to be uploaded with the application.
+- add `-Dlog4j.configuration=<location of configuration file>` to `spark.driver.extraJavaOptions`
+ (for the driver) or `spark.executor.extraJavaOptions` (for executors). Note that if using a file,
+ the `file:` protocol should be explicitly provided, and the file needs to exist locally on all
+ the nodes.
+
+Note that for the first option, both executors and the application master will share the same
+log4j configuration, which may cause issues when they run on the same node (e.g. trying to write
+to the same log file).
+
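+As a rough sketch (file names and paths below are illustrative), the first option passes the file
+with the application:
+
+    ./bin/spark-submit --files /path/to/custom/log4j.properties ...
+
+and the second option can be set in `conf/spark-defaults.conf`, pointing at a copy that exists on
+every node:
+
+    spark.executor.extraJavaOptions -Dlog4j.configuration=file:/path/to/log4j.properties
+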
+# Important notes
- Before Hadoop 2.2, YARN does not support cores in container resource requests. Thus, when running against an earlier version, the number of cores given via command line arguments cannot be passed to YARN. Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.
- The local directories used by Spark executors will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored.
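+As an illustrative check (the configuration path is an assumption for a typical installation), the
+directories an executor will actually use can be read from the node's YARN configuration:
+
+    grep -A 1 'yarn.nodemanager.local-dirs' /etc/hadoop/conf/yarn-site.xml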