author    Sandy Ryza <sandy@cloudera.com>    2014-03-29 14:41:36 -0700
committer Patrick Wendell <pwendell@gmail.com>    2014-03-29 14:41:36 -0700
commit    1617816090e7b20124a512a43860a21232ebf511 (patch)
tree      cb6e45d21cb59edd81ab3bc29b9e00ab034bb90d /docs/cluster-overview.md
parent    3738f24421d6f3bd10e5ef9ebfc10f702a5cb7ac (diff)
SPARK-1126. spark-app preliminary
This is a starting version of the spark-app script for running compiled binaries against Spark. It still needs tests and some polish. The only testing I've done so far has been using it to launch jobs in yarn-standalone mode against a pseudo-distributed cluster.

This leaves out the changes required for launching python scripts. I think it might be best to save those for another JIRA/PR (while keeping to the design so that they won't require backwards-incompatible changes).

Author: Sandy Ryza <sandy@cloudera.com>

Closes #86 from sryza/sandy-spark-1126 and squashes the following commits:

d428d85 [Sandy Ryza] Commenting, doc, and import fixes from Patrick's comments
e7315c6 [Sandy Ryza] Fix failing tests
34de899 [Sandy Ryza] Change --more-jars to --jars and fix docs
299ddca [Sandy Ryza] Fix scalastyle
a94c627 [Sandy Ryza] Add newline at end of SparkSubmit
04bc4e2 [Sandy Ryza] SPARK-1126. spark-submit script
Diffstat (limited to 'docs/cluster-overview.md')
-rw-r--r--  docs/cluster-overview.md | 50
1 file changed, 50 insertions(+), 0 deletions(-)
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index a555a7b502..b69e3416fb 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -50,6 +50,50 @@ The system currently supports three cluster managers:
In addition, Spark's [EC2 launch scripts](ec2-scripts.html) make it easy to launch a standalone
cluster on Amazon EC2.
+# Launching Applications
+
+The recommended way to launch a compiled Spark application is through the spark-submit script (located in
+the bin directory), which takes care of setting up the classpath with Spark and its dependencies, and
+provides a uniform layer over the different cluster managers and deploy modes that Spark supports. Its usage is
+
+    spark-submit <jar> <options>
+
+where the options include:
+
+- **\--class** - The main class to run.
+- **\--master** - The URL of the cluster manager master, e.g. spark://host:port, mesos://host:port, yarn,
+ or local.
+- **\--deploy-mode** - "client" to run the driver in the client process or "cluster" to run the driver in
+ a process on the cluster. For Mesos, only "client" is supported.
+- **\--executor-memory** - Memory per executor (e.g. 1000M, 2G).
+- **\--executor-cores** - Number of cores per executor (Default: 2).
+- **\--driver-memory** - Memory for the driver (e.g. 1000M, 2G).
+- **\--name** - Name of the application.
+- **\--arg** - Argument to be passed to the application's main class. This option can be specified
+ multiple times to pass multiple arguments.
+- **\--jars** - A comma-separated list of local jars to include on the driver classpath and to make
+  available to SparkContext.addJar. This doesn't work in standalone mode with the 'cluster' deploy mode.
+
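As an illustration of the usage above, a submission combining several of the common options might be built like this (the jar name, class, master URL, and argument values are hypothetical placeholders, not from the patch itself):

```shell
# Hypothetical spark-submit invocation using the common options listed above.
# Every concrete value here (jar, class, host, files) is a placeholder.
CMD="spark-submit my-app.jar \
  --class org.example.MyApp \
  --master spark://host:7077 \
  --deploy-mode client \
  --executor-memory 2G \
  --name my-app \
  --arg input.txt --arg output.txt"
echo "$CMD"
```

Note that `--arg` is repeated once per application argument, as the option list describes.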
+The following options currently work only for Spark standalone with the cluster deploy mode:
+
+- **\--driver-cores** - Cores for driver (Default: 1).
+- **\--supervise** - If given, restarts the driver on failure.
+
+The following works for Spark standalone and Mesos only:
+
+- **\--total-executor-cores** - Total cores for all executors.
+
+The following options currently work only for YARN:
+
+- **\--queue** - The YARN queue to place the application in.
+- **\--files** - Comma-separated list of files to be placed in the working directory of each executor.
+- **\--archives** - Comma-separated list of archives to be extracted into the working directory of each
+  executor.
+- **\--num-executors** - Number of executors (Default: 2).
+
+The master and deploy mode can also be set with the MASTER and DEPLOY_MODE environment variables.
+Values passed on the command line take precedence over these environment variables.
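The precedence described above can be sketched in shell terms as follows (the variable `cli_master` is illustrative only, standing in for a `--master` value parsed from the command line; it is not part of the script):

```shell
# MASTER from the environment is used only when no command-line value
# is given; a value from the command line wins when present.
export MASTER=local

cli_master=""                               # no --master flag supplied
echo "effective: ${cli_master:-$MASTER}"    # falls back to the env var

cli_master="yarn"                           # --master yarn on the command line
echo "effective: ${cli_master:-$MASTER}"    # command-line value wins
```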
+
# Shipping Code to the Cluster
The recommended way to ship your code to the cluster is to pass it through SparkContext's constructor,
@@ -103,6 +147,12 @@ The following table summarizes terms you'll see used to refer to cluster concept
<td>An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)</td>
</tr>
<tr>
+ <td>Deploy mode</td>
+ <td>Distinguishes where the driver process runs. In "cluster" mode, the framework launches
+ the driver inside of the cluster. In "client" mode, the submitter launches the driver
+ outside of the cluster.</td>
+  </tr>
+ <tr>
<td>Worker node</td>
<td>Any node that can run application code in the cluster</td>
</tr>