From ab6dd80ba0f7e1042ea270d10400109a467fe40e Mon Sep 17 00:00:00 2001
From: Sean Owen
Date: Tue, 30 Sep 2014 11:15:38 -0700
Subject: [SPARK-3356] [DOCS] Document when RDD elements' ordering within partitions is nondeterministic

As suggested by mateiz, and because it came up on the mailing list again last week, this attempts to document that ordering of elements is not guaranteed across RDD evaluations in groupBy, zip, and partition-wise RDD methods. Suggestions welcome about the wording, or other methods that need a note.

Author: Sean Owen

Closes #2508 from srowen/SPARK-3356 and squashes the following commits:

b7c96fd [Sean Owen] Undo change to programming guide
ad4aeec [Sean Owen] Don't mention ordering in partition-wise methods, reword description of ordering for zip methods per review, and add similar note to programming guide, which mentions groupByKey (but not zip methods)
fce943b [Sean Owen] Note that ordering of elements is not guaranteed across RDD evaluations in groupBy, zip, and partition-wise RDD methods
---
 docs/programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'docs/programming-guide.md')

diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 510b47a2aa..1d61a3c555 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -883,7 +883,7 @@ for details.
  groupByKey([numTasks])
  When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs.
- Note: If you are grouping in order to perform an aggregation (such as a sum or
+ Note: If you are grouping in order to perform an aggregation (such as a sum or
  average) over each key, using reduceByKey or combineByKey will yield much better performance.
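The note in this hunk recommends reduceByKey or combineByKey over groupByKey for per-key aggregation. The following is a minimal plain-Python sketch of why, assuming a list of in-memory "partitions" stands in for an RDD. It is not Spark code, and `group_then_sum` and `reduce_by_key` are hypothetical helpers: the groupByKey-style path sends every record through the simulated shuffle, while the reduceByKey-style path combines values inside each partition first and sends at most one record per key per partition.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical stand-in for an RDD: a list of partitions, each a list of (key, value) pairs.
Partitions = List[List[Tuple[Hashable, int]]]


def group_then_sum(partitions: Partitions) -> Tuple[Dict[Hashable, int], int]:
    """groupByKey-style: every record crosses the simulated shuffle, then values are summed."""
    shuffled = [pair for part in partitions for pair in part]
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    # Return the per-key sums and the number of records that were "shuffled".
    return {key: sum(values) for key, values in groups.items()}, len(shuffled)


def reduce_by_key(
    partitions: Partitions, func: Callable[[int, int], int]
) -> Tuple[Dict[Hashable, int], int]:
    """reduceByKey-style: combine within each partition first (map-side), then merge."""
    shuffled: List[Tuple[Hashable, int]] = []
    for part in partitions:
        local: Dict[Hashable, int] = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())  # at most one record per key per partition
    merged: Dict[Hashable, int] = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5), ("b", 6)]]
    print(group_then_sum(parts))                     # ({'a': 8, 'b': 13}, 6)
    print(reduce_by_key(parts, lambda x, y: x + y))  # ({'a': 8, 'b': 13}, 4)
```

Both paths produce the same sums, but the combined path moves 4 records instead of 6 here, and the gap grows with the number of values per key. In PySpark the corresponding calls are rdd.groupByKey().mapValues(sum) and rdd.reduceByKey(operator.add). Since values can reach the reduce step in any order (the ordering point this commit documents), the function passed to reduceByKey should be associative and commutative.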