field | value | date |
---|---|---|
author | Sandy Ryza <sandy@cloudera.com> | 2015-01-28 12:41:23 -0800 |
committer | Patrick Wendell <patrick@databricks.com> | 2015-01-28 12:41:23 -0800 |
commit | 406f6d3070441962222f6a25449ea2c48f52ce88 | |
tree | 13b32a67cdcf1b55423cb1f17ee96ca4a960c7bf /docs/programming-guide.md | |
parent | c8e934ef3cd06f02f9a2946e96a1a52293c22490 | |
SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs
Author: Sandy Ryza <sandy@cloudera.com>
Closes #4251 from sryza/sandy-spark-5458 and squashes the following commits:
460827a [Sandy Ryza] Python too
d2dc160 [Sandy Ryza] SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs
Diffstat (limited to 'docs/programming-guide.md')
-rw-r--r-- | docs/programming-guide.md | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 2443fc29b4..6486614e71 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -886,7 +886,7 @@ for details.
 <td> <b>groupByKey</b>([<i>numTasks</i>]) </td>
 <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs. <br />
 <b>Note:</b> If you are grouping in order to perform an aggregation (such as a sum or
- average) over each key, using <code>reduceByKey</code> or <code>combineByKey</code> will yield much better
+ average) over each key, using <code>reduceByKey</code> or <code>aggregateByKey</code> will yield much better
 performance.
 <br />
 <b>Note:</b> By default, the level of parallelism in the output depends on the number of partitions of the parent RDD.
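
To illustrate why the note steers readers toward `aggregateByKey`, here is a minimal sketch in plain Python that models the idea without Spark. The helpers `aggregate_by_key` and `group_by_key` and the sample `partitions` are hypothetical and written only for this illustration; they are not Spark APIs. The sketch assumes the usual behavior: values are folded into one small accumulator per key inside each partition, and only those accumulators are merged across partitions.

```python
from collections import defaultdict


def aggregate_by_key(partitions, zero_value, seq_func, comb_func):
    """Hypothetical pure-Python model of aggregateByKey semantics (not Spark)."""
    # Step 1: inside each partition, fold every value into one accumulator per key.
    per_partition = []
    for part in partitions:
        accs = {}
        for key, value in part:
            accs[key] = seq_func(accs.get(key, zero_value), value)
        per_partition.append(accs)

    # Step 2: merge the per-partition accumulators. In this model, only these
    # small accumulators (not the raw values) would need to be exchanged.
    merged = {}
    for accs in per_partition:
        for key, acc in accs.items():
            merged[key] = comb_func(merged[key], acc) if key in merged else acc
    return merged


def group_by_key(partitions):
    """Hypothetical model of groupByKey: every raw value is collected per key."""
    grouped = defaultdict(list)
    for part in partitions:
        for key, value in part:
            grouped[key].append(value)
    return dict(grouped)


# Two made-up partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 4), ("a", 3)],
    [("a", 5), ("b", 6)],
]

# Per-key average using a (sum, count) accumulator.
sums_counts = aggregate_by_key(
    partitions,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),  # fold one value into an accumulator
    lambda x, y: (x[0] + y[0], x[1] + y[1]),  # merge two accumulators
)
averages = {k: s / c for k, (s, c) in sums_counts.items()}

# Same result by grouping first, at the cost of materializing every value.
grouped = group_by_key(partitions)
averages_via_group = {k: sum(vs) / len(vs) for k, vs in grouped.items()}
```

Both approaches give the same averages, but the grouped version has to hold every value for a key before aggregating. In PySpark, the equivalent call would be `rdd.aggregateByKey((0, 0), seq_func, comb_func)` with the same two functions.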