 conf/spark-env.sh.template | 29
 docs/configuration.md      | 26
 run                        |  7
 3 files changed, 35 insertions(+), 27 deletions(-)
diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template
index 64eacce8a2..6d71ec5691 100755
--- a/conf/spark-env.sh.template
+++ b/conf/spark-env.sh.template
@@ -1,21 +1,24 @@
#!/usr/bin/env bash
-# Set Spark environment variables for your site in this file. Some useful
-# variables to set are:
+# This file contains environment variables required to run Spark. Copy it as
+# spark-env.sh and edit that to configure Spark for your site. At a minimum,
+# the following two variables should be set:
# - MESOS_NATIVE_LIBRARY, to point to your Mesos native library (libmesos.so)
# - SCALA_HOME, to point to your Scala installation
+#
+# If using the standalone deploy mode, you can also set variables for it:
+# - SPARK_MASTER_IP, to bind the master to a different IP address
+# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
+# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
+# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
+#
+# Finally, Spark also relies on the following variables, but these can be set
+# on just the *master* (i.e. in your driver program), and will automatically
+# be propagated to workers:
+# - SPARK_MEM, to change the amount of memory used per node (this should
+# be in the same format as the JVM's -Xmx option, e.g. 300m or 1g)
# - SPARK_CLASSPATH, to add elements to Spark's classpath
# - SPARK_JAVA_OPTS, to add JVM options
-# - SPARK_MEM, to change the amount of memory used per node (this should
-# be in the same format as the JVM's -Xmx option, e.g. 300m or 1g).
# - SPARK_LIBRARY_PATH, to add extra search paths for native libraries.
-# Settings used by the scripts in the bin/ directory, apply to standalone mode only.
-# Note that the same worker settings apply to all of the workers.
-# - SPARK_MASTER_IP, to bind the master to a different ip address, for example a public one (Default: local ip address)
-# - SPARK_MASTER_PORT, to start the spark master on a different port (Default: 7077)
-# - SPARK_MASTER_WEBUI_PORT, to specify a different port for the Master WebUI (Default: 8080)
-# - SPARK_WORKER_PORT, to start the spark worker on a specific port (Default: random)
-# - SPARK_WORKER_CORES, to specify the number of cores to use (Default: all available cores)
-# - SPARK_WORKER_MEMORY, to specify how much memory to use, e.g. 1000M, 2G (Default: MAX(Available - 1024MB, 512MB))
-# - SPARK_WORKER_WEBUI_PORT, to specify a different port for the Worker WebUI (Default: 8081)
\ No newline at end of file
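As an illustration, a copied `conf/spark-env.sh` for a small standalone cluster might end up looking like the sketch below; the address, core count, and memory size are hypothetical values, not defaults.

{% highlight bash %}
#!/usr/bin/env bash
# Hypothetical spark-env.sh for a small standalone cluster.
export SCALA_HOME=/usr/local/scala        # required: your Scala installation
export SPARK_MASTER_IP=192.168.0.10       # bind the master to this address
export SPARK_WORKER_CORES=4               # cores to use on each worker machine
export SPARK_WORKER_MEMORY=4g             # memory to use on each worker machine
{% endhighlight %}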
diff --git a/docs/configuration.md b/docs/configuration.md
index 4270e50f47..08174878f2 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -5,33 +5,45 @@ title: Spark Configuration
Spark provides three main locations to configure the system:
-* The [`conf/spark-env.sh` script](#environment-variables-in-spark-envsh), in which you can set environment variables
- that affect how the JVM is launched, such as, most notably, the amount of memory per JVM.
+* [Environment variables](#environment-variables) for launching Spark workers, which can
+ be set either in your driver program or in the `conf/spark-env.sh` script.
* [Java system properties](#system-properties), which control internal configuration parameters and can be set either
programmatically (by calling `System.setProperty` *before* creating a `SparkContext`) or through the
`SPARK_JAVA_OPTS` environment variable in `spark-env.sh`.
* [Logging configuration](#configuring-logging), which is done through `log4j.properties`.
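For example, a system property can be passed through `SPARK_JAVA_OPTS` instead of programmatically; a minimal sketch in `spark-env.sh` (the property `spark.local.dir` and its value are just an illustration):

{% highlight bash %}
# Pass a system property to every Spark JVM via a -D flag.
export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/spark-tmp"
{% endhighlight %}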
-# Environment Variables in spark-env.sh
+# Environment Variables
Spark determines how to initialize the JVM on worker nodes, or even on the local node when you run `spark-shell`,
by running the `conf/spark-env.sh` script in the directory where it is installed. This script does not exist by default
in the Git repository, but you can create it by copying `conf/spark-env.sh.template`. Make sure that you make
the copy executable.
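Concretely, creating the file might look like this, run from the root of the Spark checkout:

{% highlight bash %}
cp conf/spark-env.sh.template conf/spark-env.sh
chmod +x conf/spark-env.sh   # make sure the copy is executable
{% endhighlight %}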
-Inside `spark-env.sh`, you can set the following environment variables:
+Inside `spark-env.sh`, you *must* set at least the following two environment variables:
* `SCALA_HOME` to point to your Scala installation.
* `MESOS_NATIVE_LIBRARY` if you are [running on a Mesos cluster](running-on-mesos.html).
-* `SPARK_MEM` to set the amount of memory used per node (this should be in the same format as the JVM's -Xmx option, e.g. `300m` or `1g`)
+
+In addition, there are four other variables that control execution. These can be set *either in `spark-env.sh`
+or in each job's driver program*, because they will automatically be propagated to workers from the driver.
+For a multi-user environment, we recommend setting them in the driver program instead of `spark-env.sh`, so
+that different user jobs can use different amounts of memory, JVM options, etc.
+
+* `SPARK_MEM` to set the amount of memory used per node (this should be in the same format as the
+ JVM's -Xmx option, e.g. `300m` or `1g`)
* `SPARK_JAVA_OPTS` to add JVM options. This includes any system properties that you'd like to pass with `-D`.
* `SPARK_CLASSPATH` to add elements to Spark's classpath.
* `SPARK_LIBRARY_PATH` to add search directories for native libraries.
-The most important things to set first will be `SCALA_HOME`, without which `spark-shell` cannot run, and `MESOS_NATIVE_LIBRARY`
-if running on Mesos. The next setting will probably be the memory (`SPARK_MEM`). Make sure you set it high enough to be able to run your job but lower than the total memory on the machines (leave at least 1 GB for the operating system).
+Note that if you do set these in `spark-env.sh`, they will override the values set by user programs, which
+is undesirable; you can choose to have `spark-env.sh` set them only if the user program hasn't, as follows:
+{% highlight bash %}
+if [ -z "$SPARK_MEM" ] ; then
+ SPARK_MEM="1g"
+fi
+{% endhighlight %}
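With such a guard in place, a job can override the defaults simply by exporting the variables in the environment that launches its driver; a sketch, where the main class and master URL are hypothetical:

{% highlight bash %}
# Give this job 2g per node and an extra JVM option; other jobs are unaffected.
SPARK_MEM=2g SPARK_JAVA_OPTS="-verbose:gc" ./run my.app.Main spark://master:7077
{% endhighlight %}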
# System Properties
diff --git a/run b/run
index cb1499c6f9..15db23bbe0 100755
--- a/run
+++ b/run
@@ -19,12 +19,6 @@ if [ -z "$SCALA_HOME" ]; then
exit 1
fi
-# If the user specifies a Mesos JAR, put it before our included one on the classpath
-MESOS_CLASSPATH=""
-if [ -n "$MESOS_JAR" ] ; then
- MESOS_CLASSPATH="$MESOS_JAR"
-fi
-
# Figure out how much memory to use per executor and set it as an environment
# variable so that our process sees it and can report it to Mesos
if [ -z "$SPARK_MEM" ] ; then
@@ -49,7 +43,6 @@ BAGEL_DIR="$FWDIR/bagel"
# Build up classpath
CLASSPATH="$SPARK_CLASSPATH"
-CLASSPATH+=":$MESOS_CLASSPATH"
CLASSPATH+=":$FWDIR/conf"
CLASSPATH+=":$CORE_DIR/target/scala-$SCALA_VERSION/classes"
if [ -n "$SPARK_TESTING" ] ; then