Merge pull request #83 from ewencp/pyspark-accumulator-add-method - spark

diff options

author	Matei Zaharia <matei@eecs.berkeley.edu>	2013-10-19 23:40:40 -0700
committer	Matei Zaharia <matei@eecs.berkeley.edu>	2013-10-19 23:40:40 -0700
commit	747f53892546eb8edf1c75788c9ecf9c939d375c (patch)
tree	0cc31f0653e2bac5282f568dc6eab9632c347b3b /core/src/main/java/org
parent	6511bbe2adeb5e361fb3c31bbda245eeb890647a (diff)
parent	7eaa56de7f0253869fa85d4366f1048386af477e (diff)
download	spark-747f53892546eb8edf1c75788c9ecf9c939d375c.tar.gz spark-747f53892546eb8edf1c75788c9ecf9c939d375c.tar.bz2 spark-747f53892546eb8edf1c75788c9ecf9c939d375c.zip

Merge pull request #83 from ewencp/pyspark-accumulator-add-method

Add an add() method to pyspark accumulators. Add a regular method for adding a term to accumulators in pyspark. Currently if you have a non-global accumulator, adding to it is awkward. The += operator can't be used for non-global accumulators captured via closure because it's involves an assignment. The only way to do it is using __iadd__ directly. Adding this method lets you write code like this: def main(): sc = SparkContext() accum = sc.accumulator(0) rdd = sc.parallelize([1,2,3]) def f(x): accum.add(x) rdd.foreach(f) print accum.value where using accum += x instead would have caused UnboundLocalError exceptions in workers. Currently it would have to be written as accum.__iadd__(x).

Diffstat (limited to 'core/src/main/java/org')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: