author    Peter Parente <pparent@us.ibm.com>  2015-04-09 06:37:20 -0400
committer Sean Owen <sowen@cloudera.com>      2015-04-09 06:38:04 -0400
commit    ec3e76f1e47bfe1bdd3a010c004124d3bb17c3d1 (patch)
tree      860d91b437b940a6dac6b988a705c6c984384e05
parent    4453c591a2eaa9381766f6155bfd3e7749f721e0 (diff)
download  spark-ec3e76f1e47bfe1bdd3a010c004124d3bb17c3d1.tar.gz
          spark-ec3e76f1e47bfe1bdd3a010c004124d3bb17c3d1.tar.bz2
          spark-ec3e76f1e47bfe1bdd3a010c004124d3bb17c3d1.zip
[SPARK-6343] Doc driver-worker network reqs
Attempt at making the driver-worker networking requirement more explicit and up-front in the documentation (see https://issues.apache.org/jira/browse/SPARK-6343).

Update cluster overview diagram to show connections from workers to driver. Add a bullet below about how driver listens / accepts connections from workers.

Author: Peter Parente <pparent@us.ibm.com>

Closes #5382 from parente/SPARK-6343 and squashes the following commits:

0b2fb9d [Peter Parente] [SPARK-6343] Doc driver-worker network reqs

(cherry picked from commit b9c51c04932efeeda790752276078314db440634)
Signed-off-by: Sean Owen <sowen@cloudera.com>
-rw-r--r--  docs/cluster-overview.md        |   6
-rw-r--r--  docs/img/cluster-overview.png   | bin 28011 -> 33565 bytes
-rw-r--r--  docs/img/cluster-overview.pptx  | bin 51771 -> 28133 bytes
3 files changed, 5 insertions, 1 deletion
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index 6a75d5c457..7079de546e 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -33,7 +33,11 @@ There are several useful things to note about this architecture:
2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor
processes, and these communicate with each other, it is relatively easy to run it even on a
cluster manager that also supports other applications (e.g. Mesos/YARN).
-3. Because the driver schedules tasks on the cluster, it should be run close to the worker
+3. The driver program must listen for and accept incoming connections from its executors throughout
+ its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
+ section](configuration.html#networking)). As such, the driver program must be network
+ addressable from the worker nodes.
+4. Because the driver schedules tasks on the cluster, it should be run close to the worker
nodes, preferably on the same local area network. If you'd like to send requests to the
cluster remotely, it's better to open an RPC to the driver and have it submit operations
from nearby than to run a driver far away from the worker nodes.
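
Note: the configuration keys referenced in the new bullet (spark.driver.port, spark.fileserver.port) can be pinned explicitly so that firewall rules on the worker side can allow inbound connections back to the driver. The following is a minimal sketch in Scala, not part of this patch, using Spark 1.x property names; the hostname and port numbers are illustrative only.

    import org.apache.spark.{SparkConf, SparkContext}

    // Pin the addresses and ports the driver listens on so that worker nodes
    // can open connections back to it through a firewall.
    val conf = new SparkConf()
      .setAppName("DriverNetworkExample")
      .set("spark.driver.host", "driver.example.internal") // hypothetical address reachable from the workers
      .set("spark.driver.port", "51000")                   // driver RPC endpoint executors connect to
      .set("spark.fileserver.port", "51100")               // file server that ships jars/files to executors

    val sc = new SparkContext(conf)

By default these ports are chosen randomly at startup, which is why the documentation stresses that the driver must be network addressable from the worker nodes, not merely able to reach them.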
diff --git a/docs/img/cluster-overview.png b/docs/img/cluster-overview.png
index 368274068e..317554c5f2 100644
--- a/docs/img/cluster-overview.png
+++ b/docs/img/cluster-overview.png
Binary files differ
diff --git a/docs/img/cluster-overview.pptx b/docs/img/cluster-overview.pptx
index af3c462cd9..1b90d7ec5a 100644
--- a/docs/img/cluster-overview.pptx
+++ b/docs/img/cluster-overview.pptx
Binary files differ