author | François Garillot <francois@garillot.net> | 2016-05-03 11:42:47 -0700 |
---|---|---|
committer | Shixiong Zhu <shixiong@databricks.com> | 2016-05-03 11:42:47 -0700 |
commit | 439e361010e51d2213c92ccabed5093be92a72ee (patch) | |
tree | 40a965988a20a4e9228bd0185f7df1695e6b71b0 /python/pyspark | |
parent | ca813330c716bed76ac0034c12f56665960a1105 (diff) | |
[SPARK-9819][STREAMING][DOCUMENTATION] Clarify doc for invReduceFunc in incremental versions of reduceByWindow
- that reduceFunc and invReduceFunc should be associative
- that the intermediate result in iterated applications of inverseReduceFunc
is its first argument
Author: François Garillot <francois@garillot.net>
Closes #8103 from huitseeker/issue/invReduceFuncDoc.
Diffstat (limited to 'python/pyspark')
-rw-r--r-- | python/pyspark/streaming/dstream.py | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/python/pyspark/streaming/dstream.py b/python/pyspark/streaming/dstream.py
index 2056663872..67a0819601 100644
--- a/python/pyspark/streaming/dstream.py
+++ b/python/pyspark/streaming/dstream.py
@@ -454,7 +454,9 @@ class DStream(object):
 This is more efficient than `invReduceFunc` is None.

 @param reduceFunc: associative and commutative reduce function
- @param invReduceFunc: inverse reduce function of `reduceFunc`
+ @param invReduceFunc: inverse reduce function of `reduceFunc`; such that for all y,
+ and invertible x:
+ `invReduceFunc(reduceFunc(x, y), x) = y`
 @param windowDuration: width of the window; must be a multiple of this DStream's
 batching interval
 @param slideDuration: sliding interval of the window (i.e., the interval after which
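The contract stated in the new docstring, `invReduceFunc(reduceFunc(x, y), x) = y`, can be illustrated without a Spark cluster. The following is a minimal pure-Python sketch, not PySpark's implementation: `windowed_reduce` is a hypothetical helper, and addition and subtraction are assumed as the reduce and inverse-reduce pair. It shows the running result being passed as the first argument of the inverse function, as the commit message describes.

```python
from operator import add, sub


def windowed_reduce(batches, window, reduce_func, inv_reduce_func):
    """Incrementally reduce a sliding window over per-batch values.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    results = []
    current = None
    for i, new in enumerate(batches):
        # Fold the batch entering the window into the running result.
        current = new if current is None else reduce_func(current, new)
        if i >= window:
            # Remove the batch leaving the window. The running result is
            # the first argument; the departing value is the second.
            current = inv_reduce_func(current, batches[i - window])
        results.append(current)
    return results


batches = [1, 2, 3, 4, 5]

# Incremental computation using the inverse function.
incremental = windowed_reduce(batches, 3, add, sub)

# Reference computation that re-reduces every window from scratch.
naive = [sum(batches[max(0, i - 2): i + 1]) for i in range(len(batches))]
```

Here `incremental` and `naive` agree because `sub(add(x, y), x) == y` holds for all integers. A pair that violates the property, or a call that swaps the argument order, would make the two results diverge once the first batch leaves the window.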