author:    Andrew Or <andrewor14@gmail.com>    2014-06-27 16:11:31 -0700
committer: Patrick Wendell <pwendell@gmail.com>    2014-06-27 16:11:31 -0700
commit:    f17510e371dfbeaada3c72b884d70c36503ea30a (patch)
tree:      2a134954b34cdb3a1bf9b3e8dd7d251e9ccef28f /docs/submitting-applications.md
parent:    21e0f77b6321590ed86223a60cdb8ae08ea4057f (diff)
download:  spark-f17510e371dfbeaada3c72b884d70c36503ea30a.tar.gz
           spark-f17510e371dfbeaada3c72b884d70c36503ea30a.tar.bz2
           spark-f17510e371dfbeaada3c72b884d70c36503ea30a.zip
[SPARK-2259] Fix highly misleading docs on cluster / client deploy modes
The existing docs are highly misleading. For standalone mode, for example, they encourage the user to use standalone-cluster mode, which is not officially supported. Safeguards have been added in Spark submit itself to prevent bad documentation from leading users down the wrong path in the future. This PR is prompted by countless headaches Spark users have run into on the mailing list.

Author: Andrew Or <andrewor14@gmail.com>

Closes #1200 from andrewor14/submit-docs and squashes the following commits:

5ea2460 [Andrew Or] Rephrase cluster vs client explanation
c827f32 [Andrew Or] Clarify spark submit messages
9f7ed8f [Andrew Or] Clarify client vs cluster deploy mode + add safeguards
Diffstat (limited to 'docs/submitting-applications.md')
-rw-r--r--  docs/submitting-applications.md | 14
1 file changed, 13 insertions, 1 deletion
diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index d2864fe4c2..e05883072b 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -42,10 +42,22 @@ Some of the commonly used options are:
* `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
* `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
-* `--deploy-mode`: Whether to deploy your driver program within the cluster or run it locally as an external client (either `cluster` or `client`)
+* `--deploy-mode`: Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`) (default: `client`)*
* `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
* `application-arguments`: Arguments passed to the main method of your main class, if any
+*A common deployment strategy is to submit your application from a gateway machine that is
+physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster).
+In this setup, `client` mode is appropriate. In `client` mode, the driver is launched directly
+within the client `spark-submit` process, with the input and output of the application attached
+to the console. Thus, this mode is especially suitable for applications that involve the REPL
+(e.g. Spark shell).
+
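For concreteness, a client-mode submission from such a gateway machine could look roughly like the sketch below. The class name and master URL echo the examples above; the jar path and the trailing application argument are placeholders:

```bash
# Driver runs locally inside this spark-submit process (client mode),
# so the application's input and output are attached to this console.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://23.195.26.187:7077 \
  --deploy-mode client \
  /path/to/examples.jar \
  1000
```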
+Alternatively, if your application is submitted from a machine far from the worker machines (e.g.
+locally on your laptop), it is common to use `cluster` mode to minimize network latency between
+the driver and the executors. Note that `cluster` mode is currently not supported for standalone
+clusters, Mesos clusters, or Python applications.
+
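By contrast, a cluster-mode submission from a remote machine might look like the following sketch. This assumes a cluster manager that supports cluster mode (e.g. YARN); the master value, the HDFS jar location, and the argument are placeholders:

```bash
# Driver is launched inside the cluster, close to the executors,
# so the laptop-to-cluster link only carries the submission itself.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  hdfs://namenode:9000/user/me/examples.jar \
  1000
```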
For Python applications, simply pass a `.py` file in the place of `<application-jar>` instead of a JAR,
and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
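As a rough illustration of the Python path, a submission might look like the sketch below; the script and dependency file names are hypothetical:

```bash
# Submit a Python application in client mode (cluster mode is not supported for Python here);
# extra .zip/.egg/.py dependencies are shipped to executors via --py-files.
./bin/spark-submit \
  --master spark://23.195.26.187:7077 \
  --py-files deps.zip,extra_helpers.py \
  my_app.py \
  arg1 arg2
```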