author: Patrick Wendell <pwendell@gmail.com> (2013-12-29 11:26:56 -0800)
committer: Patrick Wendell <pwendell@gmail.com> (2013-12-29 11:26:56 -0800)
commit: 6ffa9bb226ac9ceec4a34f0011c35d2d9710f8f8 (patch)
tree: dfb8e28a46701b6b1af03437dcc2b3ef2ecb83d3 /docs
parent: 35f6dc252a8961189837e79914f305d0745a8792 (diff)
download: spark-6ffa9bb226ac9ceec4a34f0011c35d2d9710f8f8.tar.gz, spark-6ffa9bb226ac9ceec4a34f0011c35d2d9710f8f8.tar.bz2, spark-6ffa9bb226ac9ceec4a34f0011c35d2d9710f8f8.zip
Documentation and adding supervise option
Diffstat (limited to 'docs')
-rw-r--r-- | docs/spark-standalone.md | 38
1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index b822265b5a..59adbce156 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -10,11 +10,7 @@ In addition to running on the Mesos or YARN cluster managers, Spark also provide
 # Installing Spark Standalone to a Cluster
 
-The easiest way to deploy Spark is by running the `./make-distribution.sh` script to create a binary distribution.
-This distribution can be deployed to any machine with the Java runtime installed; there is no need to install Scala.
-
-The recommended procedure is to deploy and start the master on one node first, get the master spark URL,
-then modify `conf/spark-env.sh` in the `dist/` directory before deploying to all the other nodes.
+To install Spark Standalone mode, you simply place a compiled version of Spark on each node of the cluster. You can obtain pre-built versions of Spark with each release or [build it yourself](index.html#building).
 
 # Starting a Cluster Manually
@@ -150,6 +146,38 @@ automatically set MASTER from the `SPARK_MASTER_IP` and `SPARK_MASTER_PORT` vari
 You can also pass an option `-c <numCores>` to control the number of cores that spark-shell uses on the cluster.
 
+# Launching Applications Inside the Cluster
+
+You may also run your application entirely inside the cluster by submitting your application driver using the submission client. The syntax for submitting applications is as follows:
+
+    ./spark-class org.apache.spark.deploy.client.DriverClient launch \
+      [client-options] \
+      <cluster-url> <application-jar-url> <main-class> \
+      [application-options]
+
+    cluster-url: The URL of the master node.
+    application-jar-url: Path to a bundled jar including your application and all dependencies.
+                         Accepts hdfs://, file://, and http:// paths.
+    main-class: The entry point for your application.
+
+    Client Options:
+      --memory <count> (amount of memory, in MB, allocated for your driver program)
+      --cores <count> (number of cores allocated for your driver program)
+      --supervise (whether to automatically restart your driver on application or node failure)
+
+Keep in mind that your driver program will be executed on a remote worker machine. You can control the execution environment in the following ways:
+
+ * _Environment variables_: These will be captured from the environment in which you launch the client and applied when launching the driver program.
+ * _Java options_: You can add Java options by setting `SPARK_JAVA_OPTS` in the environment in which you launch the submission client.
+ * _Dependencies_: You'll still need to call `sc.addJar` inside your driver program to add your application jar and any dependencies. If you submit a local application jar to the client (e.g. one with a `file://` URL), it will be uploaded into the working directory of your driver program. You can then add it using `sc.addJar("jar-name.jar")`.
+
+Once you submit a driver program, it will appear in the cluster management UI at port 8080 and
+be assigned an identifier. If you'd like to prematurely terminate the program, you can do so using
+the same client:
+
+    ./spark-class org.apache.spark.deploy.client.DriverClient kill <driverId>
+
 # Resource Scheduling
 
 The standalone cluster mode currently only supports a simple FIFO scheduler across applications.
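To make the new submission-client syntax concrete, here is a minimal sketch of what a full `launch` invocation might look like. The master URL, jar path, and main class below are placeholder assumptions for illustration, not values from this commit; the script only assembles and echoes the command line rather than contacting a real cluster.

```shell
#!/bin/sh
# Placeholder values -- substitute your own cluster URL, jar, and class.
MASTER_URL="spark://master.example.com:7077"
APP_JAR="hdfs://namenode:9000/jars/my-app.jar"
MAIN_CLASS="com.example.MyApp"

# Assemble the DriverClient launch command documented in this commit.
# --supervise asks the cluster to restart the driver on failure.
CMD="./spark-class org.apache.spark.deploy.client.DriverClient launch --memory 512 --cores 1 --supervise $MASTER_URL $APP_JAR $MAIN_CLASS"

# On a real standalone cluster you would execute $CMD directly; here we
# just print it so the full argument order is visible.
echo "$CMD"
```

The driver ID printed by a real launch can later be passed to the same client's `kill` subcommand to terminate the driver.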