From 6ffa9bb226ac9ceec4a34f0011c35d2d9710f8f8 Mon Sep 17 00:00:00 2001
From: Patrick Wendell
Date: Sun, 29 Dec 2013 11:26:56 -0800
Subject: Documentation and adding supervise option

---
 docs/spark-standalone.md | 38 +++++++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index b822265b5a..59adbce156 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -10,11 +10,7 @@ In addition to running on the Mesos or YARN cluster managers, Spark also provide
 
 # Installing Spark Standalone to a Cluster
 
-The easiest way to deploy Spark is by running the `./make-distribution.sh` script to create a binary distribution.
-This distribution can be deployed to any machine with the Java runtime installed; there is no need to install Scala.
-
-The recommended procedure is to deploy and start the master on one node first, get the master spark URL,
-then modify `conf/spark-env.sh` in the `dist/` directory before deploying to all the other nodes.
+To install Spark Standalone mode, you simply place a compiled version of Spark on each node in the cluster. You can obtain pre-built versions of Spark with each release or [build it yourself](index.html#building).
 
 # Starting a Cluster Manually
 
@@ -150,6 +146,38 @@ automatically set MASTER from the `SPARK_MASTER_IP` and `SPARK_MASTER_PORT` vari
 
 You can also pass an option `-c <numCores>` to control the number of cores that spark-shell uses on the cluster.
 
+# Launching Applications Inside the Cluster
+
+You may also run your application entirely inside of the cluster by submitting your application driver using the submission client. The syntax for submitting applications is as follows:
+
+    ./spark-class org.apache.spark.deploy.client.DriverClient launch
+      [client-options] \
+      <cluster-url> <application-jar-url> <main-class> \
+      [application-options]
+
+    cluster-url: The URL of the master node.
+    application-jar-url: Path to a bundled jar including your application and all dependencies.
+                         Accepts hdfs://, file://, and http:// paths.
+    main-class: The entry point for your application.
+
+    Client Options:
+      --memory <count> (amount of memory, in MB, allocated for your driver program)
+      --cores <count> (number of cores allocated for your driver program)
+      --supervise (whether to automatically restart your driver on application or node failure)
+
+Keep in mind that your driver program will be executed on a remote worker machine. You can control the execution environment in the following ways:
+
+ * _Environment variables_: These will be captured from the environment in which you launch the client and applied when launching the driver program.
+ * _Java options_: You can add Java options by setting `SPARK_JAVA_OPTS` in the environment in which you launch the submission client.
+ * _Dependencies_: You'll still need to call `sc.addJar` inside of your driver program to add your application jar and any dependencies. If you submit a local application jar to the client (e.g. one with a `file://` URL), it will be uploaded into the working directory of your driver program. Then, you can add it using `sc.addJar("jar-name.jar")`.
+
+Once you submit a driver program, it will appear in the cluster management UI at port 8080 and
+be assigned an identifier. If you'd like to prematurely terminate the program, you can do so using
+the same client:
+
+    ./spark-class org.apache.spark.deploy.client.DriverClient kill <driverId>
+
 # Resource Scheduling
 
 The standalone cluster mode currently only supports a simple FIFO scheduler across applications.
-- cgit v1.2.3
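
For illustration only, here is a sketch of how the launch and kill syntax documented in this patch might be used, following the synopsis reconstructed above. Every concrete value below (the master URL `spark://master-host:7077`, the HDFS jar path, the main class `com.example.MyApp`, the application arguments, and the driver identifier) is a hypothetical placeholder, not something taken from this patch:

    # Launch a driver inside the cluster; --supervise asks the master to restart it
    # on application or node failure. All hosts, paths, and class names are placeholders.
    ./spark-class org.apache.spark.deploy.client.DriverClient launch \
      --memory 512 \
      --cores 1 \
      --supervise \
      spark://master-host:7077 \
      hdfs://namenode:8020/user/someuser/my-app-assembly.jar \
      com.example.MyApp \
      arg1 arg2

    # Terminate the driver later using the identifier assigned by the master
    # (visible in the web UI on port 8080); the ID shown here is made up.
    ./spark-class org.apache.spark.deploy.client.DriverClient kill driver-20131229114500-0000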