author: Tathagata Das <tathagata.das1565@gmail.com> 2016-06-30 14:01:34 -0700
committer: Tathagata Das <tathagata.das1565@gmail.com> 2016-06-30 14:01:34 -0700
commit: 5d00a7bc19ddeb1b5247733b55095a03ee7b1a30
tree: a573ac5c1596b45788defeeb33615e4f4a6e20dd
parent: c62263340edb6976a10f274e716fde6cd2c5bf34
[SPARK-16256][DOCS] Fix window operation diagram
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #14001 from tdas/SPARK-16256-2.
-rw-r--r-- docs/img/structured-streaming-late-data.png bin 138931 -> 138226 bytes
-rw-r--r-- docs/img/structured-streaming-window.png bin 128930 -> 132875 bytes
-rw-r--r-- docs/img/structured-streaming.pptx bin 1105315 -> 1105413 bytes
-rw-r--r-- docs/structured-streaming-programming-guide.md 2
4 files changed, 1 insertions, 1 deletions
diff --git a/docs/img/structured-streaming-late-data.png b/docs/img/structured-streaming-late-data.png
index 5276b47868..2283f6782f 100644
--- a/docs/img/structured-streaming-late-data.png
+++ b/docs/img/structured-streaming-late-data.png
Binary files differ
diff --git a/docs/img/structured-streaming-window.png b/docs/img/structured-streaming-window.png
index be9d3fbf8b..c1842b1ca4 100644
--- a/docs/img/structured-streaming-window.png
+++ b/docs/img/structured-streaming-window.png
Binary files differ
diff --git a/docs/img/structured-streaming.pptx b/docs/img/structured-streaming.pptx
index c278323554..6aad2ed33e 100644
--- a/docs/img/structured-streaming.pptx
+++ b/docs/img/structured-streaming.pptx
Binary files differ
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 593256603f..79493968db 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -620,7 +620,7 @@ df.groupBy("type").count()
### Window Operations on Event Time
Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand about window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration.
-Imagine the quick example is modified and the stream contains lines along with the time when the line was generated. Instead of running word counts, we want to count words within 10 minute windows, updating every 5 minutes. That is, word counts in words received between 10 minute windows 12:00 - 12:10, 12:05 - 12:15, 12:10 - 12:20, etc. Note that 12:00 - 12:10 means data that arrived after 12:00 but before 12:10. Now, consider a word that was received at 12:07. This word should increment the counts corresponding to two windows 12:00 - 12:10 and 12:05 - 12:15. So the counts will be indexed by both, the grouping key (i.e. the word) and the window (can be calculated from the event-time).
+Imagine our quick example is modified and the stream now contains lines along with the time when the line was generated. Instead of running word counts, we want to count words within 10 minute windows, updating every 5 minutes. That is, word counts in words received between 10 minute windows 12:00 - 12:10, 12:05 - 12:15, 12:10 - 12:20, etc. Note that 12:00 - 12:10 means data that arrived after 12:00 but before 12:10. Now, consider a word that was received at 12:07. This word should increment the counts corresponding to two windows 12:00 - 12:10 and 12:05 - 12:15. So the counts will be indexed by both, the grouping key (i.e. the word) and the window (can be calculated from the event-time).
The result tables would look something like the following.