Diffstat (limited to 'docs/configuration.md'):
 docs/configuration.md | 85
 1 file changed, 38 insertions(+), 47 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index b125eeb03c..1c0492efb3 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -5,50 +5,14 @@ title: Spark Configuration
Spark provides three main locations to configure the system:
-* [Environment variables](#environment-variables) for launching Spark workers, which can
- be set either in your driver program or in the `conf/spark-env.sh` script.
-* [Java system properties](#system-properties), which control internal configuration parameters and can be set either
- programmatically (by calling `System.setProperty` *before* creating a `SparkContext`) or through the
- `SPARK_JAVA_OPTS` environment variable in `spark-env.sh`.
+* [Java system properties](#system-properties), which control internal configuration parameters and can be set
+ either programmatically (by calling `System.setProperty` *before* creating a `SparkContext`) or through
+ JVM arguments.
+* [Environment variables](#environment-variables) for configuring per-machine settings such as the IP address,
+ which can be set in the `conf/spark-env.sh` script.
* [Logging configuration](#configuring-logging), which is done through `log4j.properties`.
-# Environment Variables
-
-Spark determines how to initialize the JVM on worker nodes, or even on the local node when you run `spark-shell`,
-by running the `conf/spark-env.sh` script in the directory where it is installed. This script does not exist by default
-in the Git repository, but you can create it by copying `conf/spark-env.sh.template`. Make sure that you make
-the copy executable.
-
-Inside `spark-env.sh`, you *must* set at least the following two variables:
-
-* `SCALA_HOME`, to point to your Scala installation, or `SCALA_LIBRARY_PATH` to point to the directory for Scala
- library JARs (if you install Scala as a Debian or RPM package, there is no `SCALA_HOME`, but these libraries
- are in a separate path, typically /usr/share/java; look for `scala-library.jar`).
-* `MESOS_NATIVE_LIBRARY`, if you are [running on a Mesos cluster](running-on-mesos.html).
-
-In addition, there are four other variables that control execution. These should be set *in the environment that
-launches the job's driver program* instead of `spark-env.sh`, because they will be automatically propagated to
-workers. Setting these per-job instead of in `spark-env.sh` ensures that different jobs can have different settings
-for these variables.
-
-* `SPARK_JAVA_OPTS`, to add JVM options. This includes any system properties that you'd like to pass with `-D`.
-* `SPARK_CLASSPATH`, to add elements to Spark's classpath.
-* `SPARK_LIBRARY_PATH`, to add search directories for native libraries.
-* `SPARK_MEM`, to set the amount of memory used per node. This should be in the same format as the
- JVM's -Xmx option, e.g. `300m` or `1g`. Note that this option will soon be deprecated in favor of
- the `spark.executor.memory` system property, so we recommend using that in new code.
-
-Beware that if you do set these variables in `spark-env.sh`, they will override the values set by user programs,
-which is undesirable; if you prefer, you can choose to have `spark-env.sh` set them only if the user program
-hasn't, as follows:
-
-{% highlight bash %}
-if [ -z "$SPARK_JAVA_OPTS" ] ; then
- SPARK_JAVA_OPTS="-verbose:gc"
-fi
-{% endhighlight %}
-
# System Properties
To set a system property for configuring Spark, you need to either pass it with a `-D` flag to the JVM (for example `java -Dspark.cores.max=5 MyProgram`) or call `System.setProperty` in your code *before* creating your Spark context, as follows:
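
A minimal sketch of that pattern; the property value, master URL, and application name below are illustrative, not defaults:

{% highlight scala %}
import spark.SparkContext

// Set the property *before* the SparkContext is constructed
System.setProperty("spark.cores.max", "5")
val sc = new SparkContext("local", "MyApp")
{% endhighlight %}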
@@ -67,7 +31,7 @@ there are at least five properties that you will commonly want to control:
<td>spark.executor.memory</td>
<td>512m</td>
<td>
- Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. `512m`, `2g`).
+ Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. <code>512m</code>, <code>2g</code>).
</td>
</tr>
<tr>
@@ -78,7 +42,7 @@ there are at least five properties that you will commonly want to control:
in serialized form. The default of Java serialization works with any Serializable Java object but is
quite slow, so we recommend <a href="tuning.html">using <code>spark.KryoSerializer</code>
and configuring Kryo serialization</a> when speed is necessary. Can be any subclass of
- <a href="api/core/index.html#spark.Serializer"><code>spark.Serializer</code></a>).
+ <a href="api/core/index.html#spark.Serializer"><code>spark.Serializer</code></a>.
</td>
</tr>
<tr>
@@ -87,7 +51,7 @@ there are at least five properties that you will commonly want to control:
<td>
If you use Kryo serialization, set this class to register your custom classes with Kryo.
You need to set it to a class that extends
- <a href="api/core/index.html#spark.KryoRegistrator"><code>spark.KryoRegistrator</code></a>).
+ <a href="api/core/index.html#spark.KryoRegistrator"><code>spark.KryoRegistrator</code></a>.
See the <a href="tuning.html#data-serialization">tuning guide</a> for more details.
</td>
</tr>
@@ -97,7 +61,7 @@ there are at least five properties that you will commonly want to control:
<td>
Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored
on disk. This should be on a fast, local disk in your system. It can also be a comma-separated
- list of multiple directories.
+ list of multiple directories on different disks.
</td>
</tr>
<tr>
@@ -106,7 +70,8 @@ there are at least five properties that you will commonly want to control:
<td>
When running on a <a href="spark-standalone.html">standalone deploy cluster</a> or a
<a href="running-on-mesos.html#mesos-run-modes">Mesos cluster in "coarse-grained"
- sharing mode</a>, how many CPU cores to request at most. The default will use all available cores.
+ sharing mode</a>, how many CPU cores to request at most. The default will use all available cores
+ offered by the cluster manager.
</td>
</tr>
</table>
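
The serialization settings above are ordinary system properties, so they can be set the same way. A sketch of enabling Kryo with a custom registrator; the registrator and the registered class are hypothetical placeholders:

{% highlight scala %}
import com.esotericsoftware.kryo.Kryo

// Placeholder application class, for illustration only
case class MyClass(x: Int, y: Int)

// Hypothetical registrator that registers the application's classes with Kryo
class MyRegistrator extends spark.KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass])
  }
}

// Set both properties before the SparkContext is created
System.setProperty("spark.serializer", "spark.KryoSerializer")
System.setProperty("spark.kryo.registrator", "MyRegistrator") // use the fully-qualified class name in a real application
{% endhighlight %}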
@@ -321,7 +286,7 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td>spark.cleaner.ttl</td>
- <td>(disable)</td>
+ <td>(infinite)</td>
<td>
Duration (seconds) of how long Spark will remember any metadata (stages generated, tasks generated, etc.).
Periodic cleanups will ensure that metadata older than this duration will be forgotten. This is
@@ -347,6 +312,32 @@ Apart from these, the following properties are also available, and may be useful
</table>
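
These properties are set the same way as the common ones above. For example, a long-running application might cap how long metadata is kept; the one-hour value here is purely illustrative:

{% highlight scala %}
// Forget stage and task metadata older than one hour (value is in seconds)
System.setProperty("spark.cleaner.ttl", "3600")
{% endhighlight %}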
+# Environment Variables
+
+Certain Spark settings can also be configured through environment variables, which are read from the `conf/spark-env.sh`
+script in the directory where Spark is installed. These variables are meant for machine-specific settings, such as
+library search paths. While Java system properties can also be set here, we recommend setting application-level
+properties within the application itself rather than in `spark-env.sh`, so that different applications can use
+different settings.
+
+Note that `conf/spark-env.sh` does not exist by default when Spark is installed, but you can create it by copying
+`conf/spark-env.sh.template`. Make sure the copy is executable.
+
+The following variables can be set in `spark-env.sh`:
+
+* `SPARK_LOCAL_IP`, to configure which IP address of the machine to bind to.
+* `SPARK_LIBRARY_PATH`, to add search directories for native libraries.
+* `SPARK_CLASSPATH`, to add elements to Spark's classpath that you want to be present for _all_ applications.
+ Note that applications can also add dependencies for themselves through `SparkContext.addJar` -- we recommend
+ doing that when possible.
+* `SPARK_JAVA_OPTS`, to add JVM options. This includes Java options like garbage collector settings and any system
+ properties that you'd like to pass with `-D` (e.g., `-Dspark.local.dir=/disk1,/disk2`).
+* Options for the Spark [standalone cluster scripts](spark-standalone.html#cluster-launch-scripts), such as the number
+  of cores to use on each machine and the maximum memory.
+
+Since `spark-env.sh` is a shell script, some of these can be set programmatically -- for example, you might
+compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface.
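+
+A sketch of what such a `spark-env.sh` might contain; the network interface name (`eth0`) and the paths below are
+illustrative assumptions, not defaults:
+
+{% highlight bash %}
+#!/usr/bin/env bash
+
+# Bind Spark to the IPv4 address of a specific interface (eth0 is an assumption; adjust for your machine)
+SPARK_LOCAL_IP=$(/sbin/ifconfig eth0 | grep 'inet addr' | awk '{print $2}' | cut -d: -f2)
+
+# Search path for native libraries (illustrative location)
+SPARK_LIBRARY_PATH=/usr/local/lib/native
+
+# JVM options: GC logging plus a system property passed with -D
+SPARK_JAVA_OPTS="-verbose:gc -Dspark.local.dir=/disk1,/disk2"
+{% endhighlight %}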
+
# Configuring Logging
Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can configure it by adding a `log4j.properties`