author     Michael Gummelt <mgummelt@mesosphere.io>   2016-03-31 12:06:16 -0700
committer  Andrew Or <andrew@databricks.com>          2016-03-31 12:06:21 -0700
commit     4d93b653f7294698526674950d3dc303691260f8 (patch)
tree       5d34ad36a03ecaa6410864b6abadb80c25a5c34e /docs
parent     8a333d2da859fd593bda183413630bc3757529c9 (diff)
[Docs] Update monitoring.md to accurately describe the history server
It looks like the docs were recently updated to reflect the History Server's support for incomplete applications, but they still had wording that suggested only completed applications were viewable. This fixes that.

My editor also introduced several whitespace removal changes, which I hope are OK, since text files shouldn't have trailing whitespace. To verify they're purely whitespace changes, add `&w=1` to your browser address. If this isn't acceptable, let me know and I'll update the PR.

I also didn't think this required a JIRA. Let me know if I should create one.

Not tested

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #12045 from mgummelt/update-history-docs.
Diffstat (limited to 'docs')
-rw-r--r--  docs/monitoring.md  58
1 file changed, 29 insertions(+), 29 deletions(-)
diff --git a/docs/monitoring.md b/docs/monitoring.md
index c139e1cb5a..32d2e02e93 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -8,7 +8,7 @@ There are several ways to monitor Spark applications: web UIs, metrics, and exte
# Web Interfaces
-Every SparkContext launches a web UI, by default on port 4040, that
+Every SparkContext launches a web UI, by default on port 4040, that
displays useful information about the application. This includes:
* A list of scheduler stages and tasks
@@ -32,19 +32,19 @@ Spark's Standalone Mode cluster manager also has its own
the course of its lifetime, then the Standalone master's web UI will automatically re-render the
application's UI after the application has finished.
-If Spark is run on Mesos or YARN, it is still possible to reconstruct the UI of a finished
+If Spark is run on Mesos or YARN, it is still possible to construct the UI of an
application through Spark's history server, provided that the application's event logs exist.
You can start the history server by executing:
./sbin/start-history-server.sh
This creates a web interface at `http://<server-url>:18080` by default, listing incomplete
-and completed applications and attempts, and allowing them to be viewed
+and completed applications and attempts.
When using the file-system provider class (see `spark.history.provider` below), the base logging
directory must be supplied in the `spark.history.fs.logDirectory` configuration option,
and should contain sub-directories that each represents an application's event logs.
-
+
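For orientation, a minimal sketch of pointing the history server at such a base logging directory. The `SPARK_HISTORY_OPTS` variable used here is an assumption (one of the environment variables referred to below), and the HDFS path is only an example:

```
# Assumed approach: pass spark.history.* settings through SPARK_HISTORY_OPTS;
# the log directory below is a placeholder.
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://namenode/shared/spark-logs"
./sbin/start-history-server.sh
```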
The spark jobs themselves must be configured to log events, and to log them to the same shared,
writeable directory. For example, if the server was configured with a log directory of
`hdfs://namenode/shared/spark-logs`, then the client-side options would be:
@@ -53,7 +53,7 @@ writeable directory. For example, if the server was configured with a log direct
spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode/shared/spark-logs
```
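As a usage sketch, the same two client-side options can also be passed at submit time with `--conf`; the class name and application jar below are placeholders:

```
# Hypothetical submission: only the two event-log settings come from the block above;
# the class name and jar path are placeholders.
./bin/spark-submit \
  --class com.example.MyApp \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://namenode/shared/spark-logs \
  /path/to/my-app.jar
```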
-
+
The history server can be configured as follows:
### Environment Variables
@@ -135,9 +135,9 @@ The history server can be configured as follows:
<td>false</td>
<td>
Indicates whether the history server should use kerberos to login. This is required
- if the history server is accessing HDFS files on a secure Hadoop cluster. If this is
+ if the history server is accessing HDFS files on a secure Hadoop cluster. If this is
true, it uses the configs <code>spark.history.kerberos.principal</code> and
- <code>spark.history.kerberos.keytab</code>.
+ <code>spark.history.kerberos.keytab</code>.
</td>
</tr>
<tr>
@@ -159,12 +159,12 @@ The history server can be configured as follows:
<td>false</td>
<td>
Specifies whether acls should be checked to authorize users viewing the applications.
- If enabled, access control checks are made regardless of what the individual application had
+ If enabled, access control checks are made regardless of what the individual application had
set for <code>spark.ui.acls.enable</code> when the application was run. The application owner
- will always have authorization to view their own application and any users specified via
+ will always have authorization to view their own application and any users specified via
<code>spark.ui.view.acls</code> when the application was run will also have authorization
- to view that application.
- If disabled, no access control checks are made.
+ to view that application.
+ If disabled, no access control checks are made.
</td>
</tr>
<tr>
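To make the security options above concrete, here is a hedged sketch of supplying them when starting the history server. The two boolean keys (`spark.history.kerberos.enabled`, `spark.history.ui.acls.enable`) are assumed names, since the table excerpts above do not show them, and the principal and keytab values are placeholders:

```
# Assumed keys for the two switches; principal and keytab paths are placeholders.
export SPARK_HISTORY_OPTS="\
  -Dspark.history.kerberos.enabled=true \
  -Dspark.history.kerberos.principal=spark/history-host@EXAMPLE.COM \
  -Dspark.history.kerberos.keytab=/etc/security/keytabs/spark.keytab \
  -Dspark.history.ui.acls.enable=true"
./sbin/start-history-server.sh
```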
@@ -298,14 +298,14 @@ keep the paths consistent in both modes.
# Metrics
-Spark has a configurable metrics system based on the
-[Coda Hale Metrics Library](http://metrics.codahale.com/).
-This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV
-files. The metrics system is configured via a configuration file that Spark expects to be present
-at `$SPARK_HOME/conf/metrics.properties`. A custom file location can be specified via the
+Spark has a configurable metrics system based on the
+[Coda Hale Metrics Library](http://metrics.codahale.com/).
+This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV
+files. The metrics system is configured via a configuration file that Spark expects to be present
+at `$SPARK_HOME/conf/metrics.properties`. A custom file location can be specified via the
`spark.metrics.conf` [configuration property](configuration.html#spark-properties).
-Spark's metrics are decoupled into different
-_instances_ corresponding to Spark components. Within each instance, you can configure a
+Spark's metrics are decoupled into different
+_instances_ corresponding to Spark components. Within each instance, you can configure a
set of sinks to which metrics are reported. The following instances are currently supported:
* `master`: The Spark standalone master process.
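As a rough sketch of the file format, the snippet below wires the `master` instance to a CSV sink. The sink class and key names follow the conventions of the bundled template mentioned further down and should be checked against `$SPARK_HOME/conf/metrics.properties.template`; the output directory is a placeholder:

```
# Assumed sink class and key names; verify against metrics.properties.template.
cat > "$SPARK_HOME/conf/metrics.properties" <<'EOF'
# Report the standalone master's metrics to CSV files every 10 seconds.
master.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
master.sink.csv.period=10
master.sink.csv.unit=seconds
master.sink.csv.directory=/tmp/spark-metrics
EOF
```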
@@ -330,26 +330,26 @@ licensing restrictions:
* `GangliaSink`: Sends metrics to a Ganglia node or multicast group.
To install the `GangliaSink` you'll need to perform a custom build of Spark. _**Note that
-by embedding this library you will include [LGPL](http://www.gnu.org/copyleft/lesser.html)-licensed
-code in your Spark package**_. For sbt users, set the
-`SPARK_GANGLIA_LGPL` environment variable before building. For Maven users, enable
+by embedding this library you will include [LGPL](http://www.gnu.org/copyleft/lesser.html)-licensed
+code in your Spark package**_. For sbt users, set the
+`SPARK_GANGLIA_LGPL` environment variable before building. For Maven users, enable
the `-Pspark-ganglia-lgpl` profile. In addition to modifying the cluster's Spark build
user applications will need to link to the `spark-ganglia-lgpl` artifact.
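A hedged sketch of the two build paths described above, using the `build/sbt` and `build/mvn` wrappers shipped in the Spark source tree (their presence in this version is an assumption):

```
# sbt build: set the environment variable named in the docs (the value is assumed).
SPARK_GANGLIA_LGPL=true ./build/sbt package

# Maven build: enable the profile named in the docs; -DskipTests is optional.
./build/mvn -Pspark-ganglia-lgpl -DskipTests package
```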
-The syntax of the metrics configuration file is defined in an example configuration file,
+The syntax of the metrics configuration file is defined in an example configuration file,
`$SPARK_HOME/conf/metrics.properties.template`.
# Advanced Instrumentation
Several external tools can be used to help profile the performance of Spark jobs:
-* Cluster-wide monitoring tools, such as [Ganglia](http://ganglia.sourceforge.net/), can provide
-insight into overall cluster utilization and resource bottlenecks. For instance, a Ganglia
-dashboard can quickly reveal whether a particular workload is disk bound, network bound, or
+* Cluster-wide monitoring tools, such as [Ganglia](http://ganglia.sourceforge.net/), can provide
+insight into overall cluster utilization and resource bottlenecks. For instance, a Ganglia
+dashboard can quickly reveal whether a particular workload is disk bound, network bound, or
CPU bound.
-* OS profiling tools such as [dstat](http://dag.wieers.com/home-made/dstat/),
-[iostat](http://linux.die.net/man/1/iostat), and [iotop](http://linux.die.net/man/1/iotop)
+* OS profiling tools such as [dstat](http://dag.wieers.com/home-made/dstat/),
+[iostat](http://linux.die.net/man/1/iostat), and [iotop](http://linux.die.net/man/1/iotop)
can provide fine-grained profiling on individual nodes.
-* JVM utilities such as `jstack` for providing stack traces, `jmap` for creating heap-dumps,
-`jstat` for reporting time-series statistics and `jconsole` for visually exploring various JVM
+* JVM utilities such as `jstack` for providing stack traces, `jmap` for creating heap-dumps,
+`jstat` for reporting time-series statistics and `jconsole` for visually exploring various JVM
properties are useful for those comfortable with JVM internals.
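For example, the JVM utilities in the last bullet are typically pointed at the process ID of a running driver or executor; the PID below is a placeholder:

```
# 12345 is a placeholder PID for a Spark driver or executor JVM.
jstack 12345                                  # dump thread stack traces
jmap -dump:format=b,file=heap.hprof 12345     # write a heap dump to heap.hprof
jstat -gcutil 12345 1000                      # print GC statistics every 1000 ms
```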