Diffstat (limited to 'docs')
-rw-r--r--  docs/running-on-mesos.md        |  3 ++-
-rw-r--r--  docs/spark-standalone.md        |  9 ++++-----
-rw-r--r--  docs/submitting-applications.md | 14 +++++++++++++-
3 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index e3c8922404..bd046cfc18 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -127,7 +127,8 @@ val sc = new SparkContext(conf)
{% endhighlight %}
(You can also use [`spark-submit`](submitting-applications.html) and configure `spark.executor.uri`
-in the [conf/spark-defaults.conf](configuration.html#loading-default-configurations) file.)
+in the [conf/spark-defaults.conf](configuration.html#loading-default-configurations) file. Note
+that `spark-submit` currently only supports deploying the Spark driver in `client` mode for Mesos.)
When running a shell, the `spark.executor.uri` parameter is inherited from `SPARK_EXECUTOR_URI`, so
it does not need to be redundantly passed in as a system property.
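As a rough sketch of the setup this hunk describes (the Mesos master address, the HDFS path, and the tarball name below are hypothetical, not taken from the patch):

{% highlight bash %}
# conf/spark-defaults.conf -- hypothetical host names and paths:
#   spark.master         mesos://207.184.161.138:5050
#   spark.executor.uri   hdfs://namenode:8020/spark/spark-dist.tar.gz

# Submit in client mode; per the note above, on Mesos the driver
# always runs inside this spark-submit process.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:5050 \
  lib/spark-examples.jar
{% endhighlight %}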
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 3c1ce06083..f5c0f7cef8 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -235,11 +235,10 @@ You can also pass an option `--cores <numCores>` to control the number of cores
# Launching Compiled Spark Applications
-Spark supports two deploy modes: applications may run with the driver inside the client process or
-entirely inside the cluster. The
-[`spark-submit` script](submitting-applications.html) provides the
-most straightforward way to submit a compiled Spark application to the cluster in either deploy
-mode.
+The [`spark-submit` script](submitting-applications.html) provides the most straightforward way to
+submit a compiled Spark application to the cluster. For standalone clusters, Spark currently
+only supports deploying the driver inside the client process that is submitting the application
+(`client` deploy mode).
If your application is launched through Spark submit, then the application jar is automatically
distributed to all worker nodes. For any additional jars that your application depends on, you
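To make the standalone client-mode flow above concrete, a minimal sketch (the application class and jar path are hypothetical; the master URL reuses the example from submitting-applications.md):

{% highlight bash %}
# Client-mode submission to a standalone cluster: the driver runs
# inside this spark-submit process, and the application jar is
# automatically distributed to the worker nodes.
./bin/spark-submit \
  --class org.example.MyApp \
  --master spark://23.195.26.187:7077 \
  --deploy-mode client \
  path/to/my-app.jar
{% endhighlight %}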
diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index d2864fe4c2..e05883072b 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -42,10 +42,22 @@ Some of the commonly used options are:
* `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
* `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
-* `--deploy-mode`: Whether to deploy your driver program within the cluster or run it locally as an external client (either `cluster` or `client`)
+* `--deploy-mode`: Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`) (default: `client`)*
* `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
* `application-arguments`: Arguments passed to the main method of your main class, if any
+*A common deployment strategy is to submit your application from a gateway machine that is
+physically co-located with your worker machines (e.g. the master node in a standalone EC2 cluster).
+In this setup, `client` mode is appropriate. In `client` mode, the driver is launched directly
+within the client `spark-submit` process, with the input and output of the application attached
+to the console. Thus, this mode is especially suitable for applications that involve the REPL
+(e.g. Spark shell).
+
+Alternatively, if your application is submitted from a machine far from the worker machines (e.g.
+locally on your laptop), it is common to use `cluster` mode to minimize network latency between
+the driver and the executors. Note that `cluster` mode is currently not supported for standalone
+clusters, Mesos clusters, or Python applications.
+
For Python applications, simply pass a `.py` file in place of `<application-jar>`,
and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
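For instance, a hypothetical Python submission might look like the following (the script and dependency names are made up for illustration):

{% highlight bash %}
# Pass the main .py script where the application jar would go;
# extra modules travel with the job via --py-files.
./bin/spark-submit \
  --master spark://23.195.26.187:7077 \
  --py-files deps.zip,helpers.py \
  my_script.py arg1 arg2
{% endhighlight %}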