SPARK-1492. Update Spark YARN docs to use spark-submit

Author: Sandy Ryza <sandy@cloudera.com>

Closes #601 from sryza/sandy-spark-1492 and squashes the following commits:

5df1634 [Sandy Ryza] Address additional comments from Patrick.
be46d1f [Sandy Ryza] Address feedback from Marcelo and Patrick
867a3ea [Sandy Ryza] SPARK-1492. Update Spark YARN docs to use spark-submit
author: Sandy Ryza <sandy@cloudera.com> 2014-05-02 21:42:31 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2014-05-02 21:42:58 -0700
commit: 2b961d88079d7a3f9da63d5175d7b61f6dec762b (patch)
tree: c5da0cf1415096b062acad4a52cdfc6e8bdac923 /docs/cluster-overview.md
parent: 4bf24f7897e1c67ca5f96dec05480e571f05ee1d (diff)
1 files changed, 8 insertions, 7 deletions
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index b011679fed..79b0061e2c 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -86,7 +86,7 @@ the `--help` flag. Here are a few examples of common options:
   --master local[8] \
   my-app.jar
 
-# Run on a Spark cluster
+# Run on a Spark standalone cluster
 ./bin/spark-submit \
   --class my.main.ClassName
   --master spark://mycluster:7077 \
@@ -118,21 +118,22 @@ If you are ever unclear where configuration options are coming from. fine-graine
 information can be printed by adding the `--verbose` option to `./spark-submit`.
 
 ### Advanced Dependency Management
-When using `./bin/spark-submit` jars will be automatically transferred to the cluster. For many
-users this is sufficient. However, advanced users can add jars by calling `addFile` or `addJar`
-on an existing SparkContext. This can be used to distribute JAR files (Java/Scala) or .egg and
-.zip libraries (Python) to executors. Spark uses the following URL scheme to allow different
+When using `./bin/spark-submit` the app jar along with any jars included with the `--jars` option
+will be automatically transferred to the cluster. `--jars` can also be used to distribute .egg and .zip
+libraries for Python to executors. Spark uses the following URL scheme to allow different
 strategies for disseminating jars:
 
 - **file:** - Absolute paths and `file:/` URIs are served by the driver's HTTP file server, and
-  every executor pulls the file from the driver HTTP server
+  every executor pulls the file from the driver HTTP server.
 - **hdfs:**, **http:**, **https:**, **ftp:** - these pull down files and JARs from the URI as expected
 - **local:** - a URI starting with local:/ is expected to exist as a local file on each worker node.  This
   means that no network IO will be incurred, and works well for large files/JARs that are pushed to each worker,
   or shared via NFS, GlusterFS, etc.
 
 Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes.
-Over time this can use up a significant amount of space and will need to be cleaned up.
+This can use up a significant amount of space over time and will need to be cleaned up. With YARN, cleanup
+is handled automatically, and with Spark standalone, automatic cleanup can be configured with the
+`spark.worker.cleanup.appDataTtl` property.
 
 # Monitoring
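
The URL schemes listed under "Advanced Dependency Management" in the hunk above can be summarized with a small sketch. This is an illustration only, not code from Spark or from this commit: the helper name `distribution_strategy`, the returned labels, and the example paths are all hypothetical, and the mapping simply restates the three bullets in the patched documentation.

```python
from urllib.parse import urlparse

# Schemes the patched docs group together as "pull down files and JARs
# from the URI as expected".
REMOTE_SCHEMES = {"hdfs", "http", "https", "ftp"}


def distribution_strategy(uri: str) -> str:
    """Return a label for how a dependency URI would be distributed.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    scheme = urlparse(uri).scheme.lower()
    if scheme == "local":
        # Expected to already exist on every worker node; no network IO.
        return "already-on-worker"
    if scheme in REMOTE_SCHEMES:
        # Each executor fetches the file from the given URI.
        return "fetched-from-uri"
    if scheme == "file" or (scheme == "" and uri.startswith("/")):
        # Absolute paths and file:/ URIs are served by the driver's HTTP file server.
        return "served-by-driver"
    # Anything else is not covered by the documented schemes.
    return "unknown"


if __name__ == "__main__":
    for example in (
        "/opt/app/dep.jar",
        "file:/opt/app/dep.jar",
        "hdfs://namenode:8020/libs/dep.jar",
        "local:/opt/libs/big.jar",
    ):
        print(example, "->", distribution_strategy(example))
```

A sketch like this is mainly useful as a reading aid when choosing values for `--jars`: large, rarely changing libraries fit the `local:` case, while jars that change per job fit the driver-served or remote-URI cases.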