author      Matei Zaharia <matei.zaharia@gmail.com>    2013-09-07 13:53:08 -0700
committer   Matei Zaharia <matei.zaharia@gmail.com>    2013-09-07 13:53:08 -0700
commit      cfde85e395727a62f971a8c02c978dc9cd16a339 (patch)
tree        c42596966a354292741c0c7bef8c25c2f5917d83 /docs
parent      4a7813a2479e2413275c23050afd242af1b7a1ba (diff)
parent      be1ee28ca630e663f54f0ca043e7f1877ccb3da8 (diff)
Merge pull request #901 from ooyala/2013-09/0.8-doc-changes
0.8 Doc changes for make-distribution.sh
Diffstat (limited to 'docs')
 docs/index.md            |  5
 docs/spark-standalone.md | 20
 2 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/docs/index.md b/docs/index.md
index 7d73929940..d3aacc629f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -46,6 +46,11 @@ Spark supports several options for deployment:
* [Apache Mesos](running-on-mesos.html)
* [Hadoop YARN](running-on-yarn.html)
+There is a script, `./make-distribution.sh`, which will create a binary distribution of Spark for deployment
+to any machine with only the Java runtime as a necessary dependency.
+Running the script creates a distribution directory in `dist/`; pass the `--tgz` option to create a .tgz file instead.
+Check the script for additional options.
+
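A minimal sketch of how this might look on the command line; the `--tgz` form is the option referred to above, and any other flags should be checked against the script itself:

    # Build a binary distribution under dist/ (run from the top of a Spark checkout)
    ./make-distribution.sh

    # Assuming the --tgz flag mentioned above, produce a .tgz archive instead
    ./make-distribution.sh --tgz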
# A Note About Hadoop Versions
Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 994a96f2c9..69e1291580 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -3,13 +3,21 @@ layout: global
title: Spark Standalone Mode
---
-In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or use our provided [deploy scripts](#cluster-launch-scripts). It is also possible to run these daemons on a single machine for testing.
+In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or use our provided [launch scripts](#cluster-launch-scripts). It is also possible to run these daemons on a single machine for testing.
+
+# Installing Spark Standalone to a Cluster
+
+The easiest way to deploy Spark is by running the `./make-distribution.sh` script to create a binary distribution.
+This distribution can be deployed to any machine with the Java runtime installed; there is no need to install Scala.
+
+The recommended procedure is to deploy and start the master on one node first, get the master's Spark URL,
+then modify `conf/spark-env.sh` in the `dist/` directory before deploying to all the other nodes.
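A hedged sketch of that procedure, with placeholder hostnames and install paths, and using `rsync`/`ssh` only as one possible way to copy files and run commands:

    # 1. Build the binary distribution locally
    ./make-distribution.sh

    # 2. Copy it to the master node and start the master there
    rsync -az dist/ master-node:/opt/spark/
    ssh master-node '/opt/spark/bin/start-master.sh'

    # 3. Note the spark://HOST:PORT URL the master prints, record it (e.g. via
    #    SPARK_MASTER_IP / SPARK_MASTER_PORT) in dist/conf/spark-env.sh, then
    #    copy the same dist/ directory to every worker node
    rsync -az dist/ worker-node-1:/opt/spark/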
# Starting a Cluster Manually
You can start a standalone master server by executing:
- ./spark-class org.apache.spark.deploy.master.Master
+ ./bin/start-master.sh
Once started, the master will print out a `spark://HOST:PORT` URL for itself, which you can use to connect workers to it,
or pass as the "master" argument to `SparkContext`. You can also find this URL on
@@ -22,7 +30,7 @@ Similarly, you can start one or more workers and connect them to the master via:
Once you have started a worker, look at the master's web UI ([http://localhost:8080](http://localhost:8080) by default).
You should see the new node listed there, along with its number of CPUs and memory (minus one gigabyte left for the OS).
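For example, a worker could be pointed at the master roughly like this (a sketch only; the exact worker command lives in the unchanged part of this page, and `spark://HOST:PORT` is a placeholder):

    # Start a worker process and register it with the master (placeholder URL)
    ./spark-class org.apache.spark.deploy.worker.Worker spark://HOST:PORT

    # Then confirm the worker shows up on the master's web UI at http://<master-host>:8080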
-Finally, the following configuration options can be passed to the master and worker:
+Finally, the following configuration options can be passed to the master and worker:
<table class="table">
<tr><th style="width:21%">Argument</th><th>Meaning</th></tr>
@@ -55,7 +63,7 @@ Finally, the following configuration options can be passed to the master and wor
# Cluster Launch Scripts
-To launch a Spark standalone cluster with the deploy scripts, you need to create a file called `conf/slaves` in your Spark directory, which should contain the hostnames of all the machines where you would like to start Spark workers, one per line. The master machine must be able to access each of the slave machines via password-less `ssh` (using a private key). For testing, you can just put `localhost` in this file.
+To launch a Spark standalone cluster with the launch scripts, you need to create a file called `conf/slaves` in your Spark directory, which should contain the hostnames of all the machines where you would like to start Spark workers, one per line. The master machine must be able to access each of the slave machines via password-less `ssh` (using a private key). For testing, you can just put `localhost` in this file.
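An illustrative `conf/slaves` file, with placeholder hostnames:

    # conf/slaves -- one worker hostname per line
    worker-node-1
    worker-node-2
    localhost      # fine for single-machine testing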
Once you've set up this file, you can launch or stop your cluster with the following shell scripts, based on Hadoop's deploy scripts, and available in `SPARK_HOME/bin`:
@@ -134,6 +142,10 @@ To run an interactive Spark shell against the cluster, run the following command
MASTER=spark://IP:PORT ./spark-shell
+Note that if you are running spark-shell from one of the Spark cluster machines, the `spark-shell` script will
+automatically set `MASTER` from the `SPARK_MASTER_IP` and `SPARK_MASTER_PORT` variables in `conf/spark-env.sh`.
+
+You can also pass the `-c <numCores>` option to control the number of cores that spark-shell uses on the cluster.
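Putting the two together, an invocation might look like the following (the master host, port, and core count are placeholders):

    # Connect spark-shell to the standalone master and limit it to 4 cores
    MASTER=spark://master-node:7077 ./spark-shell -c 4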
# Job Scheduling