From af8ffdb73c28012c9f5cf232ca7d4b4c6763628d Mon Sep 17 00:00:00 2001
From: Matei Zaharia
Date: Sun, 8 Sep 2013 13:36:50 -0700
Subject: Review comments

---
 docs/cluster-overview.md | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 docs/job-scheduling.md   |  2 +-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index 143f93171f..cf6b48c05e 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -68,3 +68,50 @@ access this UI. The [monitoring guide](monitoring.html) also describes other mon
 Spark gives control over resource allocation both _across_ applications (at the level of the
 cluster manager) and _within_ applications (if multiple computations are happening on the same
 SparkContext). The [job scheduling overview](job-scheduling.html) describes this in more detail.
+
+# Glossary
+
+The following table summarizes terms you'll see used to refer to cluster concepts:
+
+<table class="table">
+  <thead>
+    <tr><th>Term</th><th>Meaning</th></tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Application</td>
+      <td>Any user program invoking Spark</td>
+    </tr>
+    <tr>
+      <td>Driver program</td>
+      <td>The process running the main() function of the application and creating the SparkContext</td>
+    </tr>
+    <tr>
+      <td>Cluster manager</td>
+      <td>An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)</td>
+    </tr>
+    <tr>
+      <td>Worker node</td>
+      <td>Any node that can run application code in the cluster</td>
+    </tr>
+    <tr>
+      <td>Executor</td>
+      <td>A process launched for an application on a worker node, which runs tasks and keeps data in memory
+        or disk storage across them. Each application has its own executors.</td>
+    </tr>
+    <tr>
+      <td>Task</td>
+      <td>A unit of work that will be sent to one executor</td>
+    </tr>
+    <tr>
+      <td>Job</td>
+      <td>A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action
+        (e.g. save, collect); you'll see this term used in the driver's logs.</td>
+    </tr>
+    <tr>
+      <td>Stage</td>
+      <td>Each job gets divided into smaller sets of tasks called stages that depend on each other
+        (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs.</td>
+    </tr>
+  </tbody>
+</table>
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 11b733137d..d304c5497b 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -25,7 +25,7 @@ different options to manage allocation, depending on the cluster manager.
 
 The simplest option, available on all cluster managers, is _static partitioning_ of resources. With
 this approach, each application is given a maximum amount of resources it can use, and holds onto them
-for its whole duration. This is the only approach available in Spark's [standalone](spark-standalone.html)
+for its whole duration. This is the approach used in Spark's [standalone](spark-standalone.html)
 and [YARN](running-on-yarn.html) modes, as well as the
 [coarse-grained Mesos mode](running-on-mesos.html#mesos-run-modes). Resource allocation can be
 configured as follows, based on the cluster type:
--
cgit v1.2.3
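
To make the glossary above concrete, here is a minimal driver-program sketch (not part of the patch; it assumes the Scala API of the Spark 0.8 era, and the object name, input path, and master URL are illustrative placeholders):

```scala
import org.apache.spark.SparkContext

// The *driver program*: it runs main() and creates the SparkContext,
// so the whole thing constitutes one *application*.
object WordCount {
  def main(args: Array[String]) {
    // "local[2]" runs in-process; on a real cluster this would be a
    // cluster manager URL such as spark://host:7077 (standalone mode).
    val sc = new SparkContext("local[2]", "WordCount")

    val counts = sc.textFile("input.txt")   // transformations only record lineage;
      .flatMap(line => line.split(" "))     // nothing runs on the cluster yet
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // collect() is an action, so it spawns a *job*. The driver splits the job
    // into *stages* (one on each side of the shuffle that reduceByKey needs),
    // and each stage into *tasks* that run in this application's *executors*.
    counts.collect().foreach(println)

    sc.stop()
  }
}
```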
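And for the static-partitioning paragraph touched in docs/job-scheduling.md, a hedged sketch of how an application of this era could bound the resources it holds on a standalone cluster; the property names spark.cores.max and spark.executor.memory existed at the time, but the values and master URL below are examples only:

```scala
import org.apache.spark.SparkContext

object StaticPartitioningExample {
  def main(args: Array[String]) {
    // In this era, configuration was passed via Java system properties
    // set before the SparkContext is created.
    System.setProperty("spark.cores.max", "10")       // hold at most 10 cores cluster-wide
    System.setProperty("spark.executor.memory", "4g") // memory per executor

    // Under static partitioning, the application keeps these resources
    // for its whole duration.
    val sc = new SparkContext("spark://master:7077", "StaticPartitioningExample")
    try {
      println(sc.parallelize(1 to 1000).sum())
    } finally {
      sc.stop()
    }
  }
}
```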