From 406f6d3070441962222f6a25449ea2c48f52ce88 Mon Sep 17 00:00:00 2001
From: Sandy Ryza
Date: Wed, 28 Jan 2015 12:41:23 -0800
Subject: SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs

Author: Sandy Ryza

Closes #4251 from sryza/sandy-spark-5458 and squashes the following commits:

460827a [Sandy Ryza] Python too
d2dc160 [Sandy Ryza] SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs
---
 docs/programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 2443fc29b4..6486614e71 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -886,7 +886,7 @@ for details.
   groupByKey([numTasks])
   When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs.
   Note: If you are grouping in order to perform an aggregation (such as a sum or
-    average) over each key, using reduceByKey or combineByKey will yield much better
+    average) over each key, using reduceByKey or aggregateByKey will yield much better
     performance.
   Note: By default, the level of parallelism in the output depends on the number of partitions of the parent RDD.
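For context, here is a minimal Scala sketch (not part of this patch; the object name, sample data, and local-mode setup are illustrative assumptions) of the kind of per-key average the note describes, contrasting groupByKey with aggregateByKey:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example illustrating the doc note; not taken from the Spark codebase.
object AvgExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avg-example").setMaster("local[*]"))

    val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 2.0)))

    // groupByKey ships every value for a key across the shuffle before any aggregation happens.
    val avgViaGroup = pairs.groupByKey().mapValues(vs => vs.sum / vs.size)

    // aggregateByKey folds values into a (sum, count) accumulator on the map side first,
    // so far less data crosses the shuffle.
    val avgViaAggregate = pairs
      .aggregateByKey((0.0, 0))(
        (acc, v) => (acc._1 + v, acc._2 + 1),   // add one value to a partition-local accumulator
        (a, b)   => (a._1 + b._1, a._2 + b._2)  // merge accumulators from different partitions
      )
      .mapValues { case (sum, count) => sum / count }

    avgViaAggregate.collect().foreach(println)
    sc.stop()
  }
}
```

Both variants produce the same averages; the difference is that aggregateByKey (like reduceByKey) can combine values map-side before the shuffle, whereas groupByKey must move every value for each key across the network first, which is why the updated sentence recommends it.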