author     Matei Zaharia <matei@eecs.berkeley.edu>  2013-09-08 13:36:50 -0700
committer  Matei Zaharia <matei@eecs.berkeley.edu>  2013-09-08 13:36:50 -0700
commit     af8ffdb73c28012c9f5cf232ca7d4b4c6763628d
tree       78f704de2adaf12c823ad743b4c3bc1303b0d034
parent     c0d375107f414822d65eaff0e3a76dd3fe9e1570
Review comments
 docs/cluster-overview.md | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 docs/job-scheduling.md   |  2 +-
 2 files changed, 48 insertions(+), 1 deletion(-)
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index 143f93171f..cf6b48c05e 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -68,3 +68,50 @@ access this UI. The [monitoring guide](monitoring.html) also describes other mon
Spark gives control over resource allocation both _across_ applications (at the level of the cluster
manager) and _within_ applications (if multiple computations are happening on the same SparkContext).
The [job scheduling overview](job-scheduling.html) describes this in more detail.
+
+# Glossary
+
+The following table summarizes terms you'll see used to refer to cluster concepts:
+
+<table class="table">
+ <thead>
+ <tr><th style="width: 130px;">Term</th><th>Meaning</th></tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Application</td>
+ <td>Any user program invoking Spark</td>
+ </tr>
+ <tr>
+ <td>Driver program</td>
+ <td>The process running the main() function of the application and creating the SparkContext</td>
+ </tr>
+ <tr>
+ <td>Cluster manager</td>
+ <td>An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)</td>
+ </tr>
+ <tr>
+ <td>Worker node</td>
+ <td>Any node that can run application code in the cluster</td>
+ </tr>
+ <tr>
+ <td>Executor</td>
+ <td>A process launched for an application on a worker node that runs tasks and keeps data in memory
+ or disk storage across them. Each application has its own executors.</td>
+ </tr>
+ <tr>
+ <td>Task</td>
+ <td>A unit of work that will be sent to one executor</td>
+ </tr>
+ <tr>
+ <td>Job</td>
+ <td>A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action
+ (e.g. <code>save</code>, <code>collect</code>); you'll see this term used in the driver's logs.</td>
+ </tr>
+ <tr>
+ <td>Stage</td>
+ <td>Each job gets divided into smaller sets of tasks called <em>stages</em> that depend on each other
+ (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs.</td>
+ </tr>
+ </tbody>
+</table>
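
As a quick illustration of how the glossary terms fit together, here is a minimal, hypothetical driver program (not part of this patch; the object and application names are made up, and it assumes a recent Spark API where configuration goes through `SparkConf`). Creating the SparkContext makes the process the driver; `collect()` is an action that spawns a job, whose stages of tasks run in the application's executors on the worker nodes.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example, not part of this commit: main() below is the
// "driver program"; collect() is an action that spawns a "job", which the
// scheduler breaks into "stages" of "tasks" run by the "executors".
object GlossaryExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GlossaryExample")
    val sc = new SparkContext(conf)   // this process is now the driver

    // Transformations are lazy; nothing runs on the cluster yet.
    val squares = sc.parallelize(1 to 1000).map(x => x * x)

    // collect() is an action: it triggers a job whose tasks are sent to
    // the executors, and the results come back to the driver.
    val result = squares.collect()
    println(s"Collected ${result.length} squares")

    sc.stop()
  }
}
```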
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 11b733137d..d304c5497b 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -25,7 +25,7 @@ different options to manage allocation, depending on the cluster manager.
The simplest option, available on all cluster managers, is _static partitioning_ of resources. With
this approach, each application is given a maximum amount of resources it can use, and holds onto them
-for its whole duration. This is the only approach available in Spark's [standalone](spark-standalone.html)
+for its whole duration. This is the approach used in Spark's [standalone](spark-standalone.html)
and [YARN](running-on-yarn.html) modes, as well as the
[coarse-grained Mesos mode](running-on-mesos.html#mesos-run-modes).
Resource allocation can be configured as follows, based on the cluster type:
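
As a hedged illustration of static partitioning in the standalone and coarse-grained Mesos modes, an application can cap its resource share up front through the `spark.cores.max` and `spark.executor.memory` properties; the values below are purely illustrative, and the snippet (not part of this patch) assumes a recent Spark release where configuration is set on a `SparkConf`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only: ask for at most 16 cores across the cluster and
// 4 GB per executor, and hold those resources for the application's lifetime.
val conf = new SparkConf()
  .setAppName("StaticPartitioningExample")
  .set("spark.cores.max", "16")
  .set("spark.executor.memory", "4g")

val sc = new SparkContext(conf)
// Every job submitted through this SparkContext shares the same fixed pool.
```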