Diffstat (limited to 'docs')
-rwxr-xr-x  docs/_layouts/global.html |  1
-rw-r--r--  docs/monitoring.md        | 49
-rw-r--r--  docs/running-on-yarn.md   | 31
3 files changed, 72 insertions, 9 deletions
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 84749fda4e..6138a8f628 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -97,6 +97,7 @@
                             <a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
                             <ul class="dropdown-menu">
                                 <li><a href="configuration.html">Configuration</a></li>
+                                <li><a href="monitoring.html">Monitoring</a></li>
                                 <li><a href="tuning.html">Tuning Guide</a></li>
                                 <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
                                 <li><a href="building-with-maven.html">Building Spark with Maven</a></li>
diff --git a/docs/monitoring.md b/docs/monitoring.md
new file mode 100644
index 0000000000..0ec987107c
--- /dev/null
+++ b/docs/monitoring.md
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Monitoring and Instrumentation
+---
+
+There are several ways to monitor the progress of Spark jobs.
+
+# Web Interfaces
+When a SparkContext is initialized, it launches a web server (by default at port 3030) which
+displays useful information. This includes a list of active and completed scheduler stages,
+a summary of RDD blocks and partitions, and environmental information. If multiple SparkContexts
+are running on the same host, they will bind to successive ports beginning with 3030 (3031, 3032,
+etc).
+
+Spark's Standalone Mode scheduler also has its own
+[web interface](spark-standalone.html#monitoring-and-logging).
+
+# Spark Metrics
+Spark has a configurable metrics system based on the
+[Coda Hale Metrics Library](http://metrics.codahale.com/).
+This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV
+files. The metrics system is configured via a configuration file that Spark expects to be present
+at `$SPARK_HOME/conf/metrics.conf`. A custom file location can be specified via the
+`spark.metrics.conf` Java system property. Spark's metrics are decoupled into different
+_instances_ corresponding to Spark components.
+Within each instance, you can configure a
+set of sinks to which metrics are reported. The following instances are currently supported:
+
+* `master`: The Spark standalone master process.
+* `applications`: A component within the master which reports on various applications.
+* `worker`: A Spark standalone worker process.
+* `executor`: A Spark executor.
+* `driver`: The Spark driver process (the process in which your SparkContext is created).
+
+The syntax of the metrics configuration file is defined in an example configuration file,
+`$SPARK_HOME/conf/metrics.conf.template`.
+
+# Advanced Instrumentation
+Several external tools can be used to help profile the performance of Spark jobs:
+
+* Cluster-wide monitoring tools, such as [Ganglia](http://ganglia.sourceforge.net/), can provide
+insight into overall cluster utilization and resource bottlenecks. For instance, a Ganglia
+dashboard can quickly reveal whether a particular workload is disk bound, network bound, or
+CPU bound.
+* OS profiling tools such as [dstat](http://dag.wieers.com/home-made/dstat/),
+[iostat](http://linux.die.net/man/1/iostat), and [iotop](http://linux.die.net/man/1/iotop)
+can provide fine-grained profiling on individual nodes.
+* JVM utilities such as `jstack` for providing stack traces, `jmap` for creating heap-dumps,
+`jstat` for reporting time-series statistics and `jconsole` for visually exploring various JVM
+properties are useful for those comfortable with JVM internals.
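The `[instance].sink|source.[name].[option]` configuration described in the added doc can be sketched as a minimal `metrics.conf`. This is an illustrative fragment, not taken from this commit: the sink and source class names below (`ConsoleSink`, `CsvSink`, `JvmSource`) and the paths are assumptions modeled on the shipped `metrics.conf.template`.

```properties
# Hypothetical metrics.conf sketch; adapt class names and paths to your build.
# A leading "*" applies the setting to all instances
# (master, applications, worker, executor, driver).

# Report all metrics to the console every 10 seconds.
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10

# Additionally dump the master's metrics to CSV files.
master.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
master.sink.csv.directory=/tmp/spark-metrics

# Attach a JVM source to workers to report GC and memory statistics.
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```

If the file lives somewhere other than `$SPARK_HOME/conf/metrics.conf`, point Spark at it with the `spark.metrics.conf` system property (e.g. `-Dspark.metrics.conf=/path/to/metrics.conf`).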
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 93421efcbc..c611db0af4 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -42,7 +42,7 @@ This would be used to connect to the cluster, write to the dfs and submit jobs t
 
 The command to launch the YARN Client is as follows:
 
-    SPARK_JAR=<SPARK_YARN_JAR_FILE> ./spark-class org.apache.spark.deploy.yarn.Client \
+    SPARK_JAR=<SPARK_ASSEMBLY_JAR_FILE> ./spark-class org.apache.spark.deploy.yarn.Client \
       --jar <YOUR_APP_JAR_FILE> \
       --class <APP_MAIN_CLASS> \
       --args <APP_MAIN_ARGUMENTS> \
@@ -54,14 +54,27 @@ The command to launch the YARN Client is as follows:
 
 For example:
 
-    SPARK_JAR=./yarn/target/spark-yarn-assembly-{{site.SPARK_VERSION}}.jar ./spark-class org.apache.spark.deploy.yarn.Client \
-      --jar examples/target/scala-{{site.SCALA_VERSION}}/spark-examples_{{site.SCALA_VERSION}}-{{site.SPARK_VERSION}}.jar \
-      --class org.apache.spark.examples.SparkPi \
-      --args yarn-standalone \
-      --num-workers 3 \
-      --master-memory 4g \
-      --worker-memory 2g \
-      --worker-cores 1
+    # Build the Spark assembly JAR and the Spark examples JAR
+    $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true ./sbt/sbt assembly
+
+    # Configure logging
+    $ cp conf/log4j.properties.template conf/log4j.properties
+
+    # Submit Spark's ApplicationMaster to YARN's ResourceManager, and instruct Spark to run the SparkPi example
+    $ SPARK_JAR=./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly-{{site.SPARK_VERSION}}-hadoop2.0.5-alpha.jar \
+        ./spark-class org.apache.spark.deploy.yarn.Client \
+          --jar examples/target/scala-{{site.SCALA_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
+          --class org.apache.spark.examples.SparkPi \
+          --args yarn-standalone \
+          --num-workers 3 \
+          --master-memory 4g \
+          --worker-memory 2g \
+          --worker-cores 1
+
+    # Examine the output (replace $YARN_APP_ID in the following with the "application identifier" output by the previous command)
+    # (Note: YARN_APP_LOGS_DIR is usually
+    # /tmp/logs or $HADOOP_HOME/logs/userlogs depending on the Hadoop version.)
+    $ cat $YARN_APP_LOGS_DIR/$YARN_APP_ID/container*_000001/stdout
+    Pi is roughly 3.13794
 
 The above starts a YARN Client program which periodically polls the Application Master for
 status updates and displays them in the console. The client will exit once your application has
 finished running.
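As a quick sanity check on the "Examine the output" step above, the Pi estimate can be pulled out of the container's stdout with a small shell helper. This is a sketch: the `Pi is roughly ...` line format comes from the example output above, while the `extract_pi` helper name and the temporary file path are illustrative.

```shell
# extract_pi: print the numeric estimate from a captured SparkPi stdout log.
# The SparkPi example finishes with a line like "Pi is roughly 3.13794".
extract_pi() {
  sed -n 's/^Pi is roughly \([0-9.]*\)$/\1/p' "$1"
}

# Usage sketch against a captured container log:
printf 'Pi is roughly 3.13794\n' > /tmp/sparkpi_stdout.txt
extract_pi /tmp/sparkpi_stdout.txt   # prints 3.13794
```

In a real run you would point the helper at `$YARN_APP_LOGS_DIR/$YARN_APP_ID/container*_000001/stdout` instead of the sample file.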