SPARK-3642. Document the nuances of shared variables.

Author: Sandy Ryza <sandy@cloudera.com> Closes #2490 from sryza/sandy-spark-3642 and squashes the following commits: aae3340 [Sandy Ryza] SPARK-3642. Document the nuances of broadcast variables
author: Sandy Ryza <sandy@cloudera.com> 2015-03-11 13:22:05 +0000
committer: Sean Owen <sowen@cloudera.com> 2015-03-11 13:22:05 +0000
commit: 2d87a415f20c85487537d6791a73827ff537f2c0 (patch)
tree: 6870246a6e3296a23e96f4c9effcb1951a650aa5
parent: 548643a9e4690b69e2a496cdcd0a426b6de8d8b5 (diff)
download: spark-2d87a415f20c85487537d6791a73827ff537f2c0.tar.gz
spark-2d87a415f20c85487537d6791a73827ff537f2c0.tar.bz2
spark-2d87a415f20c85487537d6791a73827ff537f2c0.zip
1 files changed, 6 insertions, 0 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index c011a8404f..eda3a95426 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1207,6 +1207,12 @@ than shipping a copy of it with tasks. They can be used, for example, to give ev
 large input dataset in an efficient manner. Spark also attempts to distribute broadcast variables
 using efficient broadcast algorithms to reduce communication cost.
 
+Spark actions are executed through a set of stages, separated by distributed "shuffle" operations.
+Spark automatically broadcasts the common data needed by tasks within each stage. The data
+broadcasted this way is cached in serialized form and deserialized before running each task. This
+means that explicitly creating broadcast variables is only useful when tasks across multiple stages
+need the same data or when caching the data in deserialized form is important.
+
 Broadcast variables are created from a variable `v` by calling `SparkContext.broadcast(v)`. The
 broadcast variable is a wrapper around `v`, and its value can be accessed by calling the `value`
 method. The code below shows this:
author	Sandy Ryza <sandy@cloudera.com>	2015-03-11 13:22:05 +0000
committer	Sean Owen <sowen@cloudera.com>	2015-03-11 13:22:05 +0000
commit	2d87a415f20c85487537d6791a73827ff537f2c0 (patch)
tree	6870246a6e3296a23e96f4c9effcb1951a650aa5
parent	548643a9e4690b69e2a496cdcd0a426b6de8d8b5 (diff)
download	spark-2d87a415f20c85487537d6791a73827ff537f2c0.tar.gz spark-2d87a415f20c85487537d6791a73827ff537f2c0.tar.bz2 spark-2d87a415f20c85487537d6791a73827ff537f2c0.zip