author Imran Rashid <irashid@cloudera.com> 2015-05-05 07:25:40 -0500
committer Imran Rashid <irashid@cloudera.com> 2015-05-05 07:25:40 -0500
commit d49735800db27239c11478aac4b0f2ec9df91a3f
tree b70111993f4c8fb8913987b5b1d7dae080d26190 /docs/monitoring.md
parent 51f462003b416eac92feb5a6725f6c2994389010
[SPARK-3454] separate json endpoints for data in the UI
Exposes data available in the UI as json over http. Key points:

* new endpoints, handled independently of existing XyzPage classes. Root entrypoint is `JsonRootResource`
* Uses jersey + jackson for routing & converting POJOs into json
* tests against known results in `HistoryServerSuite`
* also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages.

Author: Imran Rashid <irashid@cloudera.com>

Closes #4435 from squito/SPARK-3454 and squashes the following commits:

da1e35f [Imran Rashid] typos etc.
5e78b4f [Imran Rashid] fix rendering problems
5ae02ad [Imran Rashid] Merge branch 'master' into SPARK-3454
f016182 [Imran Rashid] change all constructors json-pojo class constructors to be private[spark] to protect us from mima-false-positives if we add fields
3347b72 [Imran Rashid] mark EnumUtil as @Private
ec140a2 [Imran Rashid] create @Private
cc1febf [Imran Rashid] add docs on the metrics-as-json api
cbaf287 [Imran Rashid] Merge branch 'master' into SPARK-3454
56db31e [Imran Rashid] update tests for mulit-attempt
7f3bc4e [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt"
67008b4 [Imran Rashid] rats
9e51400 [Imran Rashid] style
c9bae1c [Imran Rashid] handle multiple attempts per app
b87cd63 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
188762c [Imran Rashid] multi-attempt
2af11e5 [Imran Rashid] Merge branch 'master' into SPARK-3454
befff0c [Imran Rashid] review feedback
14ac3ed [Imran Rashid] jersey-core needs to be explicit; move version & scope to parent pom.xml
f90680e [Imran Rashid] Merge branch 'master' into SPARK-3454
dc8a7fe [Imran Rashid] style, fix errant comments
acb7ef6 [Imran Rashid] fix indentation
7bf1811 [Imran Rashid] move MetricHelper so mima doesnt think its exposed; comments
9d889d6 [Imran Rashid] undo some unnecessary changes
f48a7b0 [Imran Rashid] docs
52bbae8 [Imran Rashid] StorageListener & StorageStatusListener needs to synchronize internally to be thread-safe
31c79ce [Imran Rashid] asm no longer needed for SPARK_PREPEND_CLASSES
b2f8b91 [Imran Rashid] @DeveloperApi
2e19be2 [Imran Rashid] lazily convert ApplicationInfo to avoid memory overhead
ba3d9d2 [Imran Rashid] upper case enums
39ac29c [Imran Rashid] move EnumUtil
d2bde77 [Imran Rashid] update error handling & scoping
4a234d3 [Imran Rashid] avoid jersey-media-json-jackson b/c of potential version conflicts
a157a2f [Imran Rashid] style
7bd4d15 [Imran Rashid] delete security test, since it doesnt do anything
a325563 [Imran Rashid] style
a9c5cf1 [Imran Rashid] undo changes superceeded by master
0c6f968 [Imran Rashid] update deps
1ed0d07 [Imran Rashid] Merge branch 'master' into SPARK-3454
4c92af6 [Imran Rashid] style
f2e63ad [Imran Rashid] Merge branch 'master' into SPARK-3454
c22b11f [Imran Rashid] fix compile error
9ea682c [Imran Rashid] go back to good ol' java enums
cf86175 [Imran Rashid] style
d493b38 [Imran Rashid] Merge branch 'master' into SPARK-3454
f05ae89 [Imran Rashid] add in ExecutorSummaryInfo for MiMa :(
101a698 [Imran Rashid] style
d2ef58d [Imran Rashid] revert changes that had HistoryServer refresh the application listing more often
b136e39b [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt"
e031719 [Imran Rashid] fixes from review
1f53a66 [Imran Rashid] style
b4a7863 [Imran Rashid] fix compile error
2c8b7ee [Imran Rashid] rats
1578a4a [Imran Rashid] doc
674f8dc [Imran Rashid] more explicit about total numbers of jobs & stages vs. number retained
9922be0 [Imran Rashid] Merge branch 'master' into stage_distributions
f5a5196 [Imran Rashid] undo removal of renderJson from MasterPage, since there is no substitute yet
db61211 [Imran Rashid] get JobProgressListener directly from UI
fdfc181 [Imran Rashid] stage/taskList
63eb4a6 [Imran Rashid] tests for taskSummary
ad27de8 [Imran Rashid] error handling on quantile values
b2efcaf [Imran Rashid] cleanup, combine stage-related paths into one resource
aaba896 [Imran Rashid] wire up task summary
a4b1397 [Imran Rashid] stage metric distributions
e48ba32 [Imran Rashid] rename
eaf3bbb [Imran Rashid] style
25cd894 [Imran Rashid] if only given day, assume GMT
51eaedb [Imran Rashid] more visibility fixes
9f28b7e [Imran Rashid] ack, more cleanup
99764e1 [Imran Rashid] Merge branch 'SPARK-3454_w_jersey' into SPARK-3454
a61a43c [Imran Rashid] oops, remove accidental checkin
a066055 [Imran Rashid] set visibility on a lot of classes
1f361c8 [Imran Rashid] update rat-excludes
0be5120 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
2382bef [Imran Rashid] switch to using new "enum"
fef6605 [Imran Rashid] some utils for working w/ new "enum" format
dbfc7bf [Imran Rashid] style
b86bcb0 [Imran Rashid] update test to look at one stage attempt
5f9df24 [Imran Rashid] style
7fd156a [Imran Rashid] refactor jsonDiff to avoid code duplication
73f1378 [Imran Rashid] test json; also add test cases for cleaned stages & jobs
97d411f [Imran Rashid] json endpoint for one job
0c96147 [Imran Rashid] better error msgs for bad stageId vs bad attemptId
dddbd29 [Imran Rashid] stages have attempt; jobs are sorted; resource for all attempts for one stage
190c17a [Imran Rashid] StagePage should distinguish no task data, from unknown stage
84cd497 [Imran Rashid] AllJobsPage should still report correct completed & failed job count, even if some have been cleaned, to make it consistent w/ AllStagesPage
36e4062 [Imran Rashid] SparkUI needs to know about startTime, so it can list its own applicationInfo
b4c75ed [Imran Rashid] fix merge conflicts; need to widen visibility in a few cases
e91750a [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
56d2fc7 [Imran Rashid] jersey needs asm for SPARK_PREPEND_CLASSES to work
f7df095 [Imran Rashid] add test for accumulables, and discover that I need update after all
9c0c125 [Imran Rashid] add accumulableInfo
00e9cc5 [Imran Rashid] more style
3377e61 [Imran Rashid] scaladoc
d05f7a9 [Imran Rashid] dont use case classes for status api POJOs, since they have binary compatibility issues
654cecf [Imran Rashid] move all the status api POJOs to one file
b86e2b0 [Imran Rashid] style
18a8c45 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
5598f19 [Imran Rashid] delete some unnecessary code, more to go
56edce0 [Imran Rashid] style
017c755 [Imran Rashid] add in metrics now available
1b78cb7 [Imran Rashid] fix some import ordering
0dc3ea7 [Imran Rashid] if app isnt found, reload apps from FS before giving up
c7d884f [Imran Rashid] fix merge conflicts
0c12b50 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey
b6a96a8 [Imran Rashid] compare json by AST, not string
cd37845 [Imran Rashid] switch to using java.util.Dates for times
a4ab5aa [Imran Rashid] add in explicit dependency on jersey 1.9 -- maven wasn't happy before this
4fdc39f [Imran Rashid] refactor case insensitive enum parsing
cba1ef6 [Imran Rashid] add security (maybe?) for metrics json
f0264a7 [Imran Rashid] switch to using jersey for metrics json
bceb3a9 [Imran Rashid] set http response code on error, some testing
e0356b6 [Imran Rashid] put new test expectation files in rat excludes (is this OK?)
b252e7a [Imran Rashid] small cleanup of accidental changes
d1a8c92 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt
4b398d0 [Imran Rashid] expose UI data as json in new endpoints
Diffstat (limited to 'docs/monitoring.md')
-rw-r--r-- docs/monitoring.md 74
1 files changed, 74 insertions, 0 deletions
diff --git a/docs/monitoring.md b/docs/monitoring.md
index 8a85928d6d..1e0fc15086 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -174,6 +174,80 @@ making it easy to identify slow tasks, data skew, etc.
Note that the history server only displays completed Spark jobs. One way to signal the completion of a Spark job is to stop the Spark Context explicitly (`sc.stop()`), or, in Python, to use the `with SparkContext() as sc:` construct to handle Spark Context setup and teardown, and still show the job history on the UI.
+## REST API
+
+In addition to being viewable in the UI, the metrics are also available as JSON. This gives developers
+an easy way to create new visualizations and monitoring tools for Spark. The JSON is available for
+both running applications and the history server. The endpoints are mounted at `/json/v1`. For example,
+for the history server, they would typically be accessible at `http://<server-url>:18080/json/v1`, and
+for a running application, at `http://localhost:4040/json/v1`.
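A minimal client sketch using only the Python standard library. It assumes nothing beyond the `/json/v1` mount point and the default ports given above; `base_url` and `get_json` are hypothetical helper names, not part of Spark.

```python
import json
import urllib.request

# Default ports from the text above: 18080 for the history server,
# 4040 for the UI of a running application.
HISTORY_SERVER_PORT = 18080
RUNNING_APP_PORT = 4040
API_ROOT = "/json/v1"


def base_url(host: str, port: int) -> str:
    """Build the root URL under which the JSON endpoints are mounted."""
    return f"http://{host}:{port}{API_ROOT}"


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL over HTTP and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_json(base_url("localhost", RUNNING_APP_PORT) + "/applications")` would return the decoded application list of a running application, assuming its UI is reachable on that host.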
+
+<table class="table">
+ <tr><th>Endpoint</th><th>Meaning</th></tr>
+ <tr>
+ <td><code>/applications</code></td>
+ <td>A list of all applications</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/jobs</code></td>
+ <td>A list of all jobs for a given application</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/jobs/[job-id]</code></td>
+ <td>Details for the given job</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/stages</code></td>
+ <td>A list of all stages for a given application</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/stages/[stage-id]</code></td>
+ <td>A list of all attempts for the given stage</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]</code></td>
+ <td>Details for the given stage attempt</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskSummary</code></td>
+ <td>Summary metrics of all tasks in the given stage attempt</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskList</code></td>
+ <td>A list of all tasks for the given stage attempt</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/executors</code></td>
+ <td>A list of all executors for the given application</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/storage/rdd</code></td>
+ <td>A list of stored RDDs for the given application</td>
+ </tr>
+ <tr>
+ <td><code>/applications/[app-id]/storage/rdd/[rdd-id]</code></td>
+ <td>Details for the storage status of a given RDD</td>
+ </tr>
+</table>
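The table can be captured as a set of path templates. This is a hypothetical sketch (the `ENDPOINTS` dict and `endpoint` helper are illustrative names), and every path is relative to the `/json/v1` root.

```python
# Endpoint templates copied from the table, relative to /json/v1.
ENDPOINTS = {
    "applications": "/applications",
    "jobs": "/applications/{app_id}/jobs",
    "job": "/applications/{app_id}/jobs/{job_id}",
    "stages": "/applications/{app_id}/stages",
    "stage_attempts": "/applications/{app_id}/stages/{stage_id}",
    "stage_attempt": "/applications/{app_id}/stages/{stage_id}/{stage_attempt_id}",
    "task_summary": "/applications/{app_id}/stages/{stage_id}/{stage_attempt_id}/taskSummary",
    "task_list": "/applications/{app_id}/stages/{stage_id}/{stage_attempt_id}/taskList",
    "executors": "/applications/{app_id}/executors",
    "rdds": "/applications/{app_id}/storage/rdd",
    "rdd": "/applications/{app_id}/storage/rdd/{rdd_id}",
}


def endpoint(name: str, **params) -> str:
    """Fill in one endpoint template.

    str.format raises KeyError if a required placeholder is not supplied.
    """
    return ENDPOINTS[name].format(**params)
```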
+
+When running on YARN, an application may have multiple attempts, so `[app-id]` is actually
+`[app-id]/[attempt-id]` in all of the paths above.
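A small sketch of the YARN rule, assuming application and attempt ids are treated as opaque strings; `app_segment` and `jobs_path` are hypothetical helpers, and the id values in the checks are made up.

```python
from typing import Optional


def app_segment(app_id: str, attempt_id: Optional[str] = None) -> str:
    """Return the path segment that stands in for [app-id].

    On YARN it becomes [app-id]/[attempt-id]; without an attempt id
    the plain application id is used.
    """
    if attempt_id is None:
        return app_id
    return f"{app_id}/{attempt_id}"


def jobs_path(app_id: str, attempt_id: Optional[str] = None) -> str:
    """Jobs endpoint for an application, relative to /json/v1."""
    return f"/applications/{app_segment(app_id, attempt_id)}/jobs"
```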
+
+These endpoints have been strongly versioned to make it easier to develop applications on top of them.
+In particular, Spark guarantees:
+
+* Endpoints will never be removed from a given version
+* Individual fields will never be removed for any given endpoint
+* New endpoints may be added
+* New fields may be added to existing endpoints
+* New versions of the API may be added in the future at a separate endpoint (e.g., `json/v2`). New versions are *not* required to be backwards compatible.
+* API versions may be dropped, but only after coexisting with a new API version for at least one minor release
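Because fields may be added to an endpoint but never removed, a client can safely ignore keys it does not recognize while still requiring the ones it relies on. A sketch of that tolerant-reader pattern follows; the `ApplicationSummary` class and its `id`/`name` fields are an assumed subset of the `/applications` response, so check the actual output of your Spark version.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ApplicationSummary:
    # Assumed subset of the fields returned by /applications.
    id: str
    name: str


def parse_known_fields(cls, record: dict):
    """Build `cls` from a JSON object, dropping keys the class does not declare.

    New fields may appear within an API version, so unknown keys are ignored;
    a missing declared field still raises TypeError from the constructor.
    """
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in record.items() if k in known})


# A payload carrying an extra field this client has never seen.
payload = '[{"id": "app-1", "name": "demo", "someFutureField": 42}]'
apps = [parse_known_fields(ApplicationSummary, r) for r in json.loads(payload)]
```

Failing loudly on a missing declared field is deliberate: under the guarantees above, that should only happen when talking to a different API version.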
+
+Note that even when examining the UI of a running application, the `applications/[app-id]` portion is
+still required, even though only one application is available. For example, to see the list of jobs for the
+running app, you would go to `http://localhost:4040/json/v1/applications/[app-id]/jobs`. This keeps
+the paths consistent in both modes.
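On a running application, the `[app-id]` can first be discovered from `/applications`. A sketch assuming each entry of that response carries an `id` field; `running_app_jobs_url` and `fetch_running_app_jobs` are hypothetical helpers built on the Python standard library.

```python
import json
import urllib.request

UI_ROOT = "http://localhost:4040/json/v1"


def running_app_jobs_url(applications: list, root: str = UI_ROOT) -> str:
    """Given the decoded /applications response of a running app's UI,
    return the jobs URL for its single application."""
    if len(applications) != 1:
        raise ValueError(f"expected exactly one application, got {len(applications)}")
    app_id = applications[0]["id"]  # assumed field name
    return f"{root}/applications/{app_id}/jobs"


def fetch_running_app_jobs(root: str = UI_ROOT, timeout: float = 10.0):
    """Two requests: list applications, then fetch the jobs of the only one."""
    with urllib.request.urlopen(root + "/applications", timeout=timeout) as resp:
        apps = json.loads(resp.read().decode("utf-8"))
    with urllib.request.urlopen(running_app_jobs_url(apps, root), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```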
+
# Metrics
Spark has a configurable metrics system based on the