author Jim Lim <jim@quixey.com> 2014-12-03 11:16:02 -0800
committer Andrew Or <andrew@databricks.com> 2014-12-03 11:16:29 -0800
commit a975dc32799bb8a14f9e1c76defaaa7cfbaf8b53
tree 4d360d83bf07ae47d9b12962c47431d7611568c9 /docs/running-on-yarn.md
parent d00542987ed80635782dcc826fc0bdbf434fff10
SPARK-2624 add datanucleus jars to the container in yarn-cluster

If `spark-submit` finds the datanucleus jars, it adds them to the driver's classpath, but does not add it to the container. This patch modifies the yarn deployment class to copy all `datanucleus-*` jars found in `[spark-home]/libs` to the container.

Author: Jim Lim <jim@quixey.com>

Closes #3238 from jimjh/SPARK-2624 and squashes the following commits:

3633071 [Jim Lim] SPARK-2624 update documentation and comments
fe95125 [Jim Lim] SPARK-2624 keep java imports together
6c31fe0 [Jim Lim] SPARK-2624 update documentation
6690fbf [Jim Lim] SPARK-2624 add tests
d28d8e9 [Jim Lim] SPARK-2624 add spark.yarn.datanucleus.dir option
84e6cba [Jim Lim] SPARK-2624 add datanucleus jars to the container in yarn-cluster

Diffstat (limited to 'docs/running-on-yarn.md')
-rw-r--r-- docs/running-on-yarn.md | 15
1 files changed, 15 insertions, 0 deletions
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index dfe2db4b3f..45e219e0c1 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -132,6 +132,21 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
The maximum number of threads to use in the application master for launching executor containers.
</td>
</tr>
+<tr>
+ <td><code>spark.yarn.datanucleus.dir</code></td>
+ <td>$SPARK_HOME/lib</td>
+ <td>
+ The location of the DataNucleus jars, in case overriding the default location is desired.
+ By default, Spark on YARN will use the DataNucleus jars installed at
+ <code>$SPARK_HOME/lib</code>, but the jars can also be in a world-readable location on HDFS.
+ This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an
+ application runs. To point to a directory on HDFS, for example, set this configuration to
+ "hdfs:///some/path".
+
+ This is required because the datanucleus jars cannot be packaged into the
+ assembly jar due to metadata conflicts (involving <code>plugin.xml</code>.)
+ </td>
+</tr>
</table>
# Launching Spark on YARN
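
For illustration only, here is one way the new option might be passed when submitting in yarn-cluster mode. This is a minimal sketch, not part of the patch: the helper name `yarn_cluster_submit_args` and the jar name `my-app.jar` are hypothetical, and `hdfs:///some/path` is the placeholder path from the documentation above. It relies only on the generic `--conf PROP=VALUE` flag of `spark-submit`.

```python
import shlex


def yarn_cluster_submit_args(app_jar, datanucleus_dir):
    """Build a spark-submit argument list that overrides spark.yarn.datanucleus.dir.

    Hypothetical helper for illustration; it only assembles the command line
    and does not launch anything.
    """
    return [
        "spark-submit",
        "--master", "yarn-cluster",
        # Point Spark on YARN at the DataNucleus jars instead of $SPARK_HOME/lib.
        "--conf", "spark.yarn.datanucleus.dir={}".format(datanucleus_dir),
        app_jar,
    ]


# "my-app.jar" is a placeholder application jar.
args = yarn_cluster_submit_args("my-app.jar", "hdfs:///some/path")

# Print the command with shell-safe quoting so it can be inspected or pasted.
print(" ".join(shlex.quote(a) for a in args))
```

The same property could presumably also be set once in `spark-defaults.conf` rather than per submission; which is preferable depends on whether every application on the cluster should share the same jar location.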