author    Steve Loughran <stevel@hortonworks.com>    2016-02-11 21:37:53 -0600
committer Imran Rashid <irashid@cloudera.com>        2016-02-11 21:37:53 -0600
commit    a2c7dcf61f33fa1897c950d2d905651103c170ea (patch)
tree      90268ba2e3c02be159411ed15d31408cd99e505a /docs/monitoring.md
parent    d3e2e202994e063856c192e9fdd0541777b88e0e (diff)
[SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps
When the HistoryServer is showing an incomplete app, it needs to check if there is a
newer version of the app available. It does this by checking if a version of the app
has been loaded with a larger *filesize*. If so, it detaches the current UI, attaches
the new one, and redirects back to the same URL to show the new UI.

https://issues.apache.org/jira/browse/SPARK-7889

Author: Steve Loughran <stevel@hortonworks.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #11118 from squito/SPARK-7889-alternate.
Diffstat (limited to 'docs/monitoring.md')
-rw-r--r--  docs/monitoring.md  70
1 file changed, 55 insertions(+), 15 deletions(-)
diff --git a/docs/monitoring.md b/docs/monitoring.md
index cedceb2958..c37f6fb20d 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -38,11 +38,25 @@ You can start the history server by executing:
./sbin/start-history-server.sh
-When using the file-system provider class (see spark.history.provider below), the base logging
-directory must be supplied in the <code>spark.history.fs.logDirectory</code> configuration option,
-and should contain sub-directories that each represents an application's event logs. This creates a
-web interface at `http://<server-url>:18080` by default. The history server can be configured as
-follows:
+This creates a web interface at `http://<server-url>:18080` by default, listing incomplete
+and completed applications and attempts, and allowing them to be viewed.
+
+When using the file-system provider class (see `spark.history.provider` below), the base logging
+directory must be supplied in the `spark.history.fs.logDirectory` configuration option,
+and should contain sub-directories, each of which represents an application's event logs.
+
+The Spark jobs themselves must be configured to log events, and to log them to the same shared,
+writable directory. For example, if the server was configured with a log directory of
+`hdfs://namenode/shared/spark-logs`, then the client-side options would be:
+
+```
+spark.eventLog.enabled true
+spark.eventLog.dir hdfs://namenode/shared/spark-logs
+```
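+
+These options can also be set programmatically when the context is built. A minimal
+PySpark sketch, assuming the same shared log directory as above (the application name
+is an arbitrary placeholder):
+
+```python
+from pyspark import SparkConf, SparkContext
+
+# Enable event logging and point it at the directory the history server reads.
+conf = (SparkConf()
+        .setAppName("event-logging-example")  # hypothetical app name
+        .set("spark.eventLog.enabled", "true")
+        .set("spark.eventLog.dir", "hdfs://namenode/shared/spark-logs"))
+
+sc = SparkContext(conf=conf)
+```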
+
+The history server can be configured as follows:
+
+### Environment Variables
<table class="table">
<tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr>
@@ -69,11 +83,13 @@ follows:
</tr>
</table>
+### Spark Configuration Options
+
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
<td>spark.history.provider</td>
- <td>org.apache.spark.deploy.history.FsHistoryProvider</td>
+ <td><code>org.apache.spark.deploy.history.FsHistoryProvider</code></td>
<td>Name of the class implementing the application history backend. Currently there is only
one implementation, provided by Spark, which looks for application logs stored in the
file system.</td>
@@ -82,15 +98,21 @@ follows:
<td>spark.history.fs.logDirectory</td>
<td>file:/tmp/spark-events</td>
<td>
- Directory that contains application event logs to be loaded by the history server
+ For the filesystem history provider, the URL to the directory containing application event
+ logs to load. This can be a local <code>file://</code> path,
+ an HDFS path such as <code>hdfs://namenode/shared/spark-logs</code>,
+ or a path on any alternative filesystem supported by the Hadoop APIs.
</td>
</tr>
<tr>
<td>spark.history.fs.update.interval</td>
<td>10s</td>
<td>
- The period at which information displayed by this history server is updated.
- Each update checks for any changes made to the event logs in persisted storage.
+ The period at which the filesystem history provider checks for new or
+ updated logs in the log directory. A shorter interval detects new applications faster,
+ at the expense of more server load spent re-reading updated applications.
+ As soon as an update has completed, listings of the completed and incomplete applications
+ will reflect the changes.
</td>
</tr>
<tr>
@@ -112,7 +134,7 @@ follows:
<td>spark.history.kerberos.enabled</td>
<td>false</td>
<td>
- Indicates whether the history server should use kerberos to login. This is useful
+ Indicates whether the history server should use Kerberos to log in. This is required
if the history server is accessing HDFS files on a secure Hadoop cluster. If this is
true, it uses the configs <code>spark.history.kerberos.principal</code> and
<code>spark.history.kerberos.keytab</code>.
@@ -156,15 +178,15 @@ follows:
<td>spark.history.fs.cleaner.interval</td>
<td>1d</td>
<td>
- How often the job history cleaner checks for files to delete.
- Files are only deleted if they are older than spark.history.fs.cleaner.maxAge.
+ How often the filesystem job history cleaner checks for files to delete.
+ Files are only deleted if they are older than <code>spark.history.fs.cleaner.maxAge</code>.
</td>
</tr>
<tr>
<td>spark.history.fs.cleaner.maxAge</td>
<td>7d</td>
<td>
- Job history files older than this will be deleted when the history cleaner runs.
+ Job history files older than this will be deleted when the filesystem history cleaner runs.
</td>
</tr>
</table>
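
For example, a deployment might collect the filesystem-provider settings above in the
history server's `spark-defaults.conf`. A minimal sketch, reusing the shared log directory
from the earlier example; the remaining values are the defaults from the table, shown
purely for illustration:

```
spark.history.provider             org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory      hdfs://namenode/shared/spark-logs
spark.history.fs.update.interval   10s
spark.history.fs.cleaner.interval  1d
spark.history.fs.cleaner.maxAge    7d
```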
@@ -172,7 +194,25 @@ follows:
Note that in all of these UIs, the tables are sortable by clicking their headers,
making it easy to identify slow tasks, data skew, etc.
-Note that the history server only displays completed Spark jobs. One way to signal the completion of a Spark job is to stop the Spark Context explicitly (`sc.stop()`), or in Python using the `with SparkContext() as sc:` to handle the Spark Context setup and tear down, and still show the job history on the UI.
+Note:
+
+1. The history server displays both completed and incomplete Spark jobs. If an application makes
+multiple attempts after failures, the failed attempts will be displayed, as well as any ongoing
+incomplete attempt or the final successful attempt.
+
+2. Incomplete applications are only updated intermittently. The time between updates is defined
+by the interval between checks for changed files (`spark.history.fs.update.interval`).
+On larger clusters, the update interval may be set to large values.
+The way to view a running application is actually to view its own web UI.
+
+3. Applications which exited without registering themselves as completed will be listed
+as incomplete, even though they are no longer running. This can happen if an application
+crashes.
+
+4. One way to signal the completion of a Spark job is to stop the Spark Context
+explicitly (`sc.stop()`), or in Python, using the `with SparkContext() as sc:` construct
+to handle the Spark Context setup and teardown, as in the sketch below.
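+
+A minimal sketch of the Python form, assuming a standard PySpark installation (the job
+itself is an arbitrary placeholder):
+
+```python
+from pyspark import SparkConf, SparkContext
+
+conf = SparkConf().setAppName("completion-example")  # hypothetical app name
+
+# The context manager calls sc.stop() on exit, so the history server will
+# list this application as completed rather than incomplete.
+with SparkContext(conf=conf) as sc:
+    print(sc.parallelize(range(100)).sum())
+```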
+
## REST API
@@ -249,7 +289,7 @@ These endpoints have been strongly versioned to make it easier to develop applic
* New endpoints may be added
* New fields may be added to existing endpoints
* New versions of the api may be added in the future at a separate endpoint (e.g., `api/v2`). New versions are *not* required to be backwards compatible.
-* Api versions may be dropped, but only after at least one minor release of co-existing with a new api version
+* API versions may be dropped, but only after at least one minor release of co-existing with a new API version.
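+
+As an illustration of the endpoint shape, a sketch in Python that lists the applications
+known to a history server; the host and port are assumptions based on the defaults above,
+the `requests` library is assumed to be installed, and the `id`/`name` response fields are
+taken from the application-list endpoint:
+
+```python
+import requests
+
+# Query the history server's versioned REST API for all known applications.
+resp = requests.get("http://localhost:18080/api/v1/applications")
+resp.raise_for_status()
+for app in resp.json():
+    print(app["id"], app["name"])
+```
+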
Note that even when examining the UI of a running application, the `applications/[app-id]` portion is
still required, though there is only one application available. E.g., to see the list of jobs for the