author    Matei Zaharia <matei@eecs.berkeley.edu>  2012-09-25 19:31:07 -0700
committer Matei Zaharia <matei@eecs.berkeley.edu>  2012-09-25 19:31:07 -0700
commit    56c90485fd947d75bbe7aac81593ba42cfe56821 (patch)
tree      99c5bc617a350a408abfa62f612fc14c807ad1e7 /docs/scala-programming-guide.md
parent    1821bf1f1f50e5eb1c7adf9d010ecc392b1adad5 (diff)
More updates to documentation
Diffstat (limited to 'docs/scala-programming-guide.md')
 docs/scala-programming-guide.md | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/docs/scala-programming-guide.md b/docs/scala-programming-guide.md
index 94d304e23a..ad06c30dbf 100644
--- a/docs/scala-programming-guide.md
+++ b/docs/scala-programming-guide.md
@@ -28,27 +28,33 @@ This is done through the following constructor:
new SparkContext(master, jobName, [sparkHome], [jars])
{% endhighlight %}
-The `master` parameter is a string specifying a [Mesos]({{HOME_PATH}}running-on-mesos.html) cluster to connect to, or a special "local" string to run in local mode, as described below. `jobName` is a name for your job, which will be shown in the Mesos web UI when running on a cluster. Finally, the last two parameters are needed to deploy your code to a cluster if running on Mesos, as described later.
+The `master` parameter is a string specifying a [Mesos]({{HOME_PATH}}running-on-mesos.html) cluster to connect to, or a special "local" string to run in local mode, as described below. `jobName` is a name for your job, which will be shown in the Mesos web UI when running on a cluster. Finally, the last two parameters are needed to deploy your code to a cluster if running in distributed mode, as described later.
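+
+For example, a minimal local-mode `SparkContext` might be created as follows (the job name `"My Job"` here is just an illustration):
+
+{% highlight scala %}
+import spark.SparkContext
+
+// Run locally with four worker threads; "My Job" is an arbitrary job name
+val sc = new SparkContext("local[4]", "My Job")
+{% endhighlight %}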
In the Spark interpreter, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable. For example, run `MASTER=local[4] ./spark-shell` to run locally with four cores.
-### Master Names
+### Master URLs
-The master name can be in one of three formats:
+The master URL passed to Spark can be in one of the following formats:
<table class="table">
-<tr><th>Master Name</th><th>Meaning</th></tr>
+<tr><th>Master URL</th><th>Meaning</th></tr>
<tr><td> local </td><td> Run Spark locally with one worker thread (i.e. no parallelism at all). </td></tr>
-<tr><td> local[K] </td><td> Run Spark locally with K worker threads (which should be set to the number of cores on your machine). </td></tr>
-<tr><td> HOST:PORT </td><td> Connect Spark to the given (Mesos)({{HOME_PATH}}running-on-mesos.html) master to run on a cluster. The host parameter is the hostname of the Mesos master. The port must be whichever one the master is configured to use, which is 5050 by default.
-<br /><br />
-<strong>NOTE:</strong> In earlier versions of Mesos (the <code>old-mesos</code> branch of Spark), you need to use master@HOST:PORT.
+<tr><td> local[K] </td><td> Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
+</td></tr>
+<tr><td> spark://HOST:PORT </td><td> Connect to the given <a href="{{HOME_PATH}}spark-standalone.html">Spark standalone
+ cluster</a> master. The port must be whichever one your master is configured to use, which is 7077 by default.
+</td></tr>
+<tr><td> mesos://HOST:PORT </td><td> Connect Spark to the given <a href="{{HOME_PATH}}running-on-mesos.html">Mesos</a> cluster.
+ The host parameter is the hostname of the Mesos master. The port must be whichever one the master is configured to use,
+ which is 5050 by default.
</td></tr>
</table>
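+
+As an illustration, switching between these modes only requires changing the master string (the host name `master.example.com` below is a placeholder):
+
+{% highlight scala %}
+// Local mode with 8 worker threads
+val localSc = new SparkContext("local[8]", "My Job")
+
+// Spark standalone cluster, on the default port 7077
+val standaloneSc = new SparkContext("spark://master.example.com:7077", "My Job")
+
+// Mesos cluster, on the default port 5050
+val mesosSc = new SparkContext("mesos://master.example.com:5050", "My Job")
+{% endhighlight %}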
-### Deploying to a Cluster
+For running on YARN, Spark launches an instance of the standalone deploy cluster within YARN; see [running on YARN]({{HOME_PATH}}running-on-yarn.html) for details.
+
+### Running on a Cluster
-If you want to run your job on a cluster, you will need to specify the two optional parameters:
+If you want to run your job on a cluster, you will need to pass two optional parameters to `SparkContext` to let it find your code, as shown in the example after this list:
* `sparkHome`: The path at which Spark is installed on your worker machines (it should be the same on all of them).
* `jars`: A list of JAR files on the local machine containing your job's code and any dependencies, which Spark will deploy to all the worker nodes. You'll need to package your job into a set of JARs using your build system. For example, if you're using SBT, the [sbt-assembly](https://github.com/sbt/sbt-assembly) plugin is a good way to make a single JAR with your code and dependencies.
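+
+For instance, a sketch of a cluster-mode `SparkContext` (the master URL, Spark path, and JAR name are placeholders for your own values):
+
+{% highlight scala %}
+// sparkHome is where Spark is installed on the workers;
+// the JAR contains your compiled job code and its dependencies
+val sc = new SparkContext(
+  "spark://master.example.com:7077",
+  "My Job",
+  "/home/username/spark",
+  List("target/scala-2.9.2/my-job-assembly.jar"))
+{% endhighlight %}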