author Josh Rosen <joshrosen@databricks.com> 2014-10-25 00:06:57 -0700
committer Josh Rosen <joshrosen@databricks.com> 2014-10-25 00:06:57 -0700
commit 9530316887612dca060a128fca34dd5a6ab2a9a9
tree c88a0a75186a5f72676f28e131368f62de8f99b6 /docs/configuration.md
parent 3a845d3c048eebb0bddb3937128746fde3e8e4d8
[SPARK-2321] Stable pull-based progress / status API
This pull request is a first step towards the implementation of a stable, pull-based progress / status API for Spark (see [SPARK-2321](https://issues.apache.org/jira/browse/SPARK-2321)). For now, I'd like to discuss the basic implementation, API names, and overall interface design. Once we arrive at a good design, I'll go back and add additional methods to expose more information via these APIs.

#### Design goals:

- Pull-based API
- Usable from Java / Scala / Python (eventually, likely with a wrapper)
- Can be extended to expose more information without introducing binary incompatibilities.
- Returns immutable objects.
- Doesn't leak any implementation details, preserving our freedom to change the implementation.

#### Implementation:

- Add public methods (`getJobInfo`, `getStageInfo`) to SparkContext to allow status / progress information to be retrieved.
- Add public interfaces (`SparkJobInfo`, `SparkStageInfo`) for our API return values. These interfaces consist entirely of Java-style getter methods. The interfaces are currently implemented in Java. I decided to explicitly separate the interface from its implementation (`SparkJobInfoImpl`, `SparkStageInfoImpl`) in order to prevent users from constructing these responses themselves.
- Allow an existing JobProgressListener to be used when constructing a live SparkUI. This allows us to reuse this listener in the implementation of this status API. There are a few reasons why this listener reuse makes sense:
  - The status API and web UI are guaranteed to show consistent information.
  - These listeners are already well-tested.
  - The same garbage-collection / information retention configurations can apply to both this API and the web UI.
- Extend JobProgressListener to maintain `jobId -> Job` and `stageId -> Stage` mappings.

The progress API methods are implemented in a separate trait that's mixed into SparkContext. This helps keep SparkContext.scala from becoming larger and more difficult to read.
Author: Josh Rosen <joshrosen@databricks.com>
Author: Josh Rosen <joshrosen@apache.org>

Closes #2696 from JoshRosen/progress-reporting-api and squashes the following commits:

e6aa78d [Josh Rosen] Add tests.
b585c16 [Josh Rosen] Accept SparkListenerBus instead of more specific subclasses.
c96402d [Josh Rosen] Address review comments.
2707f98 [Josh Rosen] Expose current stage attempt id
c28ba76 [Josh Rosen] Update demo code:
646ff1d [Josh Rosen] Document spark.ui.retainedJobs.
7f47d6d [Josh Rosen] Clean up SparkUI constructors, per Andrew's feedback.
b77b3d8 [Josh Rosen] Merge remote-tracking branch 'origin/master' into progress-reporting-api
787444c [Josh Rosen] Move status API methods into trait that can be mixed into SparkContext.
f9a9a00 [Josh Rosen] More review comments:
3dc79af [Josh Rosen] Remove creation of unused listeners in SparkContext.
249ca16 [Josh Rosen] Address several review comments:
da5648e [Josh Rosen] Add example of basic progress reporting in Java.
7319ffd [Josh Rosen] Add getJobIdsForGroup() and num*Tasks() methods.
cc568e5 [Josh Rosen] Add note explaining that interfaces should not be implemented outside of Spark.
6e840d4 [Josh Rosen] Remove getter-style names and "consistent snapshot" semantics:
08cbec9 [Josh Rosen] Begin to sketch the interfaces for a stable, public status API.
ac2d13a [Josh Rosen] Add jobId->stage, stageId->stage mappings in JobProgressListener
24de263 [Josh Rosen] Create UI listeners in SparkContext instead of in Tabs:
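
To make the design goals in the description concrete, here is a minimal Java sketch of the pattern it describes: a getter-only public interface, a separate immutable implementation, and a pull-based lookup. It does not use Spark. `JobInfo`, `JobInfoImpl`, and `StatusTracker` are hypothetical stand-ins invented for illustration, and their method names and signatures are assumptions, not the actual `SparkJobInfo` / `getJobInfo` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a public status interface: accessors only, no setters.
interface JobInfo {
    int jobId();
    List<Integer> stageIds();
    String status();
}

// Implementation kept separate from the interface, so callers depend only on JobInfo.
final class JobInfoImpl implements JobInfo {
    private final int jobId;
    private final List<Integer> stageIds;
    private final String status;

    JobInfoImpl(int jobId, List<Integer> stageIds, String status) {
        this.jobId = jobId;
        // Defensive copy plus unmodifiable view: the returned object never changes.
        this.stageIds = Collections.unmodifiableList(new ArrayList<>(stageIds));
        this.status = status;
    }

    @Override public int jobId() { return jobId; }
    @Override public List<Integer> stageIds() { return stageIds; }
    @Override public String status() { return status; }
}

// Hypothetical tracker: a listener-like side records state as events arrive,
// and callers pull immutable snapshots whenever they want one.
final class StatusTracker {
    private final Map<Integer, JobInfo> jobs = new HashMap<>();

    // Called by the event-producing side.
    synchronized void record(int jobId, List<Integer> stageIds, String status) {
        jobs.put(jobId, new JobInfoImpl(jobId, stageIds, status));
    }

    // Pull-based lookup; empty when the job is unknown or no longer retained.
    synchronized Optional<JobInfo> getJobInfo(int jobId) {
        return Optional.ofNullable(jobs.get(jobId));
    }
}
```

Because callers only ever hold a `JobInfo`, fields can be added to the interface later without changing how existing callers construct or read results, which is the extensibility property the design goals ask for.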
Diffstat (limited to 'docs/configuration.md')
-rw-r--r-- docs/configuration.md 11
1 files changed, 10 insertions, 1 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 66738d3ca7..3007706a25 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -375,7 +375,16 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.ui.retainedStages</code></td>
<td>1000</td>
<td>
- How many stages the Spark UI remembers before garbage collecting.
+ How many stages the Spark UI and status APIs remember before garbage
+ collecting.
+ </td>
+</tr>
+<tr>
+ <td><code>spark.ui.retainedJobs</code></td>
+ <td>1000</td>
+ <td>
+ How many jobs the Spark UI and status APIs remember before garbage
+ collecting.
</td>
</tr>
<tr>
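
As a rough illustration of what a retention limit such as `spark.ui.retainedJobs` implies for callers of the status API, the following Java sketch keeps at most a fixed number of entries and drops the oldest first. `RetainedStore` is a hypothetical class written for this example. Oldest-first eviction of one entry at a time is an assumption and may not match how Spark's listener actually trims its data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded store: holds at most `retained` entries, oldest evicted first.
final class RetainedStore<V> {
    private final LinkedHashMap<Integer, V> entries;

    RetainedStore(final int retained) {
        // Insertion-ordered map; removeEldestEntry is consulted after every put.
        this.entries = new LinkedHashMap<Integer, V>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, V> eldest) {
                return size() > retained;
            }
        };
    }

    void put(int id, V value) { entries.put(id, value); }

    // Returns null once the entry has been evicted (or was never stored).
    V get(int id) { return entries.get(id); }

    int size() { return entries.size(); }
}
```

The practical consequence is that a lookup for an old job or stage ID can come back empty even though the job ran, so polling code should treat a missing result as "no longer retained" and not as an error.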