author Sandy Ryza <sandy@cloudera.com> 2014-05-02 21:42:31 -0700
committer Patrick Wendell <pwendell@gmail.com> 2014-05-02 21:42:58 -0700
commit 2b961d88079d7a3f9da63d5175d7b61f6dec762b
tree c5da0cf1415096b062acad4a52cdfc6e8bdac923 /docs/cluster-overview.md
parent 4bf24f7897e1c67ca5f96dec05480e571f05ee1d
SPARK-1492. Update Spark YARN docs to use spark-submit
Author: Sandy Ryza <sandy@cloudera.com>

Closes #601 from sryza/sandy-spark-1492 and squashes the following commits:

5df1634 [Sandy Ryza] Address additional comments from Patrick.
be46d1f [Sandy Ryza] Address feedback from Marcelo and Patrick
867a3ea [Sandy Ryza] SPARK-1492. Update Spark YARN docs to use spark-submit
Diffstat (limited to 'docs/cluster-overview.md')
-rw-r--r-- docs/cluster-overview.md | 15
1 files changed, 8 insertions, 7 deletions
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index b011679fed..79b0061e2c 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -86,7 +86,7 @@ the `--help` flag. Here are a few examples of common options:
--master local[8] \
my-app.jar
-# Run on a Spark cluster
+# Run on a Spark standalone cluster
./bin/spark-submit \
--class my.main.ClassName
--master spark://mycluster:7077 \
@@ -118,21 +118,22 @@ If you are ever unclear where configuration options are coming from. fine-graine
information can be printed by adding the `--verbose` option to `./spark-submit`.
### Advanced Dependency Management
-When using `./bin/spark-submit` jars will be automatically transferred to the cluster. For many
-users this is sufficient. However, advanced users can add jars by calling `addFile` or `addJar`
-on an existing SparkContext. This can be used to distribute JAR files (Java/Scala) or .egg and
-.zip libraries (Python) to executors. Spark uses the following URL scheme to allow different
+When using `./bin/spark-submit` the app jar along with any jars included with the `--jars` option
+will be automatically transferred to the cluster. `--jars` can also be used to distribute .egg and .zip
+libraries for Python to executors. Spark uses the following URL scheme to allow different
strategies for disseminating jars:
- **file:** - Absolute paths and `file:/` URIs are served by the driver's HTTP file server, and
- every executor pulls the file from the driver HTTP server
+ every executor pulls the file from the driver HTTP server.
- **hdfs:**, **http:**, **https:**, **ftp:** - these pull down files and JARs from the URI as expected
- **local:** - a URI starting with local:/ is expected to exist as a local file on each worker node. This
means that no network IO will be incurred, and works well for large files/JARs that are pushed to each worker,
or shared via NFS, GlusterFS, etc.
Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes.
-Over time this can use up a significant amount of space and will need to be cleaned up.
+This can use up a significant amount of space over time and will need to be cleaned up. With YARN, cleanup
+is handled automatically, and with Spark standalone, automatic cleanup can be configured with the
+`spark.worker.cleanup.appDataTtl` property.
# Monitoring
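The updated *Advanced Dependency Management* text says that the app jar and anything passed with `--jars` are shipped to the cluster, and that the URL scheme of each entry decides how executors obtain it. The shell sketch below restates those scheme rules as a lookup, purely for illustration: `dissemination_strategy` is a hypothetical helper that is not part of Spark, and the jar paths and the `namenode:8020` host are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): report which dissemination
# strategy from docs/cluster-overview.md applies to a jar/file URI.
#   driver  - absolute path or file: URI, served by the driver's HTTP file server
#   remote  - hdfs:, http:, https:, ftp: URI, pulled by executors from that URI
#   local   - local: URI, expected to already exist on every worker node
#   unknown - any other scheme (not covered by the list in the docs)
dissemination_strategy() {
  case "$1" in
    /* | file:*)                        echo driver ;;
    hdfs:* | http:* | https:* | ftp:*) echo remote ;;
    local:*)                           echo local ;;
    *)                                 echo unknown ;;
  esac
}

# Placeholder --jars value: a comma-separated list mixing the three schemes.
JARS="/opt/libs/dep1.jar,hdfs://namenode:8020/libs/dep2.jar,local:/opt/shared/dep3.jar"

# Split on commas (POSIX sh has no arrays) and classify each entry.
old_ifs=$IFS
IFS=','
for jar in $JARS; do
  printf '%s -> %s\n' "$jar" "$(dissemination_strategy "$jar")"
done
IFS=$old_ifs

# The same list would be handed to spark-submit (requires a Spark install):
# ./bin/spark-submit --class my.main.ClassName \
#   --master spark://mycluster:7077 --jars "$JARS" my-app.jar
```

Running it prints `driver`, `remote`, and `local` for the three entries, in that order. Only the `local:` entry avoids network transfer, which is why the docs recommend it for large jars already pushed to every worker or shared over NFS.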
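The new cleanup note names `spark.worker.cleanup.appDataTtl` for standalone mode. As a minimal sketch, assuming a standalone deployment where worker-only properties are passed as `-D` flags through `SPARK_WORKER_OPTS` in `conf/spark-env.sh`, the fragment below turns on automatic cleanup. `spark.worker.cleanup.enabled` and `spark.worker.cleanup.interval` are companion standalone worker properties that this diff does not mention, and the values shown (a 30-minute sweep, a 7-day TTL) are illustrative; confirm the property names and defaults against the standalone-mode documentation for the Spark version in use. On YARN no such setting is needed, since cleanup is handled automatically.

```shell
# conf/spark-env.sh on each standalone worker (values are illustrative).
# enabled     : turn on periodic cleanup of application work directories
# interval    : seconds between cleanup sweeps (1800 = 30 minutes)
# appDataTtl  : seconds to keep each application's directory (604800 = 7 days)
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
-Dspark.worker.cleanup.interval=1800 \
-Dspark.worker.cleanup.appDataTtl=604800"
export SPARK_WORKER_OPTS
```

Workers only read `spark-env.sh` at startup, so they need a restart to pick up the change.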