author | Sandy Ryza <sandy@cloudera.com> | 2015-01-28 12:41:23 -0800
---|---|---
committer | Patrick Wendell <patrick@databricks.com> | 2015-01-28 12:41:23 -0800
commit | 406f6d3070441962222f6a25449ea2c48f52ce88 (patch) |
tree | 13b32a67cdcf1b55423cb1f17ee96ca4a960c7bf /python |
parent | c8e934ef3cd06f02f9a2946e96a1a52293c22490 (diff) |
SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs
Author: Sandy Ryza <sandy@cloudera.com>
Closes #4251 from sryza/sandy-spark-5458 and squashes the following commits:
460827a [Sandy Ryza] Python too
d2dc160 [Sandy Ryza] SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs
Diffstat (limited to 'python')
-rw-r--r-- | python/pyspark/rdd.py | 4
1 file changed, 2 insertions, 2 deletions
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index f4cfe4845d..efd2f35912 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -1634,8 +1634,8 @@ class RDD(object):
         Hash-partitions the resulting RDD with into numPartitions partitions.

         Note: If you are grouping in order to perform an aggregation (such as a
-        sum or average) over each key, using reduceByKey will provide much
-        better performance.
+        sum or average) over each key, using reduceByKey or aggregateByKey will
+        provide much better performance.

         >>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
         >>> map((lambda (x,y): (x, list(y))), sorted(x.groupByKey().collect()))
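The doc change above recommends aggregateByKey over groupByKey when the goal is an aggregation such as a sum or average, because aggregateByKey can combine values per key before shuffling. As a rough illustration of that per-key combine (plain Python rather than PySpark; `aggregate_by_key` is a hypothetical helper that only mimics the single-partition behavior, not Spark's distributed execution):

```python
def aggregate_by_key(pairs, zero, seq_op, comb_op):
    # Mimics Spark's aggregateByKey on a single "partition":
    # seq_op folds each value into the per-key accumulator;
    # comb_op (unused here) would merge accumulators from
    # different partitions after the shuffle.
    acc = {}
    for k, v in pairs:
        acc[k] = seq_op(acc.get(k, zero), v)
    return acc

# Track (sum, count) per key so an average can be derived,
# using the same sample data as the docstring example.
pairs = [("a", 1), ("b", 1), ("a", 1)]
sums = aggregate_by_key(
    pairs,
    (0, 0),                                   # zero value: (sum, count)
    lambda acc, v: (acc[0] + v, acc[1] + 1),  # fold a value into the accumulator
    lambda a, b: (a[0] + b[0], a[1] + b[1]),  # merge two accumulators
)
# sums is {"a": (2, 2), "b": (1, 1)}
```

Because only one small accumulator per key crosses the shuffle boundary, this pattern moves far less data than collecting every value per key the way groupByKey does.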