author    Jacek Laskowski <jacek@japila.pl>  2015-09-08 14:38:10 +0100
committer Sean Owen <sowen@cloudera.com>     2015-09-08 14:38:10 +0100
commit  6ceed852ab716d8acc46ce90cba9cfcff6d3616f (patch)
tree    d893483e0fbb3601d4bde3aaf30a849b641ac24f /docs/cluster-overview.md
parent  9d8e838d883ed21f9ef562e7e3ac074c7e4adb88 (diff)
Docs small fixes
Author: Jacek Laskowski <jacek@japila.pl>

Closes #8629 from jaceklaskowski/docs-fixes.
Diffstat (limited to 'docs/cluster-overview.md')
-rw-r--r--  docs/cluster-overview.md  15
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index 7079de546e..faaf154d24 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -5,18 +5,19 @@ title: Cluster Mode Overview
This document gives a short overview of how Spark runs on clusters, to make it easier to understand
the components involved. Read through the [application submission guide](submitting-applications.html)
-to submit applications to a cluster.
+to learn about launching applications on a cluster.
# Components
-Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext
+Spark applications run as independent sets of processes on a cluster, coordinated by the `SparkContext`
object in your main program (called the _driver program_).
+
Specifically, to run on a cluster, the SparkContext can connect to several types of _cluster managers_
-(either Spark's own standalone cluster manager or Mesos/YARN), which allocate resources across
+(either Spark's own standalone cluster manager, Mesos or YARN), which allocate resources across
applications. Once connected, Spark acquires *executors* on nodes in the cluster, which are
processes that run computations and store data for your application.
Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to
-the executors. Finally, SparkContext sends *tasks* for the executors to run.
+the executors. Finally, SparkContext sends *tasks* to the executors to run.
<p style="text-align: center;">
<img src="img/cluster-overview.png" title="Spark cluster components" alt="Spark cluster components" />
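As a minimal illustration of the flow this hunk describes, here is a short Scala sketch of a driver program; the app name, master URL, and job are assumptions for the example, not taken from the docs:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClusterApp {
  def main(args: Array[String]): Unit = {
    // The driver program creates the SparkContext, which connects to a
    // cluster manager (here an assumed standalone master URL) to acquire
    // executors for the application.
    val conf = new SparkConf()
      .setAppName("ClusterApp")
      .setMaster("spark://master-host:7077") // assumption: standalone master
    val sc = new SparkContext(conf)

    // SparkContext then sends tasks to the executors to run, e.g. the
    // stages of this simple job.
    val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
    println(s"even numbers: $evens")

    sc.stop()
  }
}
```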
@@ -33,9 +34,9 @@ There are several useful things to note about this architecture:
2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor
processes, and these communicate with each other, it is relatively easy to run it even on a
cluster manager that also supports other applications (e.g. Mesos/YARN).
-3. The driver program must listen for and accept incoming connections from its executors throughout
- its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
- section](configuration.html#networking)). As such, the driver program must be network
+3. The driver program must listen for and accept incoming connections from its executors throughout
+ its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
+ section](configuration.html#networking)). As such, the driver program must be network
addressable from the worker nodes.
4. Because the driver schedules tasks on the cluster, it should be run close to the worker
nodes, preferably on the same local area network. If you'd like to send requests to the
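For the note on network addressability, a hedged sketch of pinning the driver's host and port through `SparkConf`, so that executors behind a firewall can connect back to it; the hostname and port are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Fix the driver's listening endpoint instead of letting Spark pick a
// random port; worker nodes must be able to resolve and reach this address.
val conf = new SparkConf()
  .setAppName("AddressableDriver")
  .set("spark.driver.host", "driver-host.example.com") // assumed hostname
  .set("spark.driver.port", "51000")                   // assumed open port
val sc = new SparkContext(conf)
```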