author Michael Vogiatzis <michaelvogiatzis@gmail.com> 2015-07-09 19:53:23 -0700
committer Tathagata Das <tathagata.das1565@gmail.com> 2015-07-09 19:54:21 -0700
commit d538919cc4fd3ab940d478c62dce1bae0270cfeb
tree 7e564d8a65d29ef0cacf3fcf362c5cc79c19096b /docs
parent 1903641e68ce7e7e657584bf45e91db6df357e41
[DOCS] Added important updateStateByKey details
Runs for *all* existing keys and returning "None" will remove the key-value pair.

Author: Michael Vogiatzis <michaelvogiatzis@gmail.com>

Closes #7229 from mvogiatzis/patch-1 and squashes the following commits:

e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
00283ed [Michael Vogiatzis] Removed space
c2656f9 [Michael Vogiatzis] Moved description farther up
0a42551 [Michael Vogiatzis] Added important updateStateByKey details
Diffstat (limited to 'docs')
-rw-r--r-- docs/streaming-programming-guide.md 2
1 files changed, 2 insertions, 0 deletions
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index e72d5580da..2f3013b533 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -854,6 +854,8 @@ it with new information. To use this, you will have to do two steps.
 1. Define the state update function - Specify with a function how to update the state using the
 previous state and the new values from an input stream.
 
+In every batch, Spark will apply the state update function for all existing keys, regardless of whether they have new data in a batch or not. If the update function returns `None` then the key-value pair will be eliminated.
+
 Let's illustrate this with an example. Say you want to maintain a running count of each word
 seen in a text data stream. Here, the running count is the state and it is an integer. We
 define the update function as:
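The behavior documented by the added paragraph (the update function runs for every key that has state, and returning `None` drops the pair) can be illustrated without a cluster. The sketch below is a plain-Python simulation, not Spark code: `apply_batch` and `count_words_seen_recently` are hypothetical names, and the rule "forget a word that gets no new occurrences in a batch" is an assumed policy chosen only to exercise the `None` case. As in the guide's Python update function, each call receives the list of new values for a key and that key's previous state, which is `None` for a key with no state yet.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Dict[str, int]
UpdateFn = Callable[[List[int], Optional[int]], Optional[int]]


def apply_batch(state: State, batch: Iterable[Tuple[str, int]], update: UpdateFn) -> State:
    """Simulate one batch of updateStateByKey-style semantics in plain Python (not Spark code)."""
    # Group this batch's values by key.
    grouped: Dict[str, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state: State = {}
    # Call the update function for every key that already has state, even when it
    # received no new data (it then sees an empty list), and for every key that is
    # new in this batch (it then sees None as its previous state).
    for key in set(state) | set(grouped):
        result = update(grouped.get(key, []), state.get(key))
        if result is not None:  # returning None eliminates the key-value pair
            new_state[key] = result
    return new_state


def count_words_seen_recently(new_values: List[int], running_count: Optional[int]) -> Optional[int]:
    """Running word count that forgets words absent from a batch (hypothetical policy)."""
    if not new_values:
        return None  # no new occurrences in this batch: drop this word's state
    return sum(new_values, running_count or 0)


state: State = {}
state = apply_batch(state, [("spark", 1), ("streaming", 1), ("spark", 1)], count_words_seen_recently)
print(sorted(state.items()))  # [('spark', 2), ('streaming', 1)]
state = apply_batch(state, [("spark", 1)], count_words_seen_recently)
print(sorted(state.items()))  # [('spark', 3)]
```

In the second batch, `streaming` has no new data, yet the update function is still called for it and its `None` result removes it from the state. In an actual PySpark job, a function with this shape would be passed to `DStream.updateStateByKey` on a DStream of `(word, 1)` pairs, with a checkpoint directory set through `StreamingContext.checkpoint`. Because the function runs for every tracked key in every batch, the per-batch work grows with the number of keys held in state, not only with the size of the batch.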