author | Sandy Ryza <sandy@cloudera.com> | 2014-06-12 08:14:25 -0700 |
---|---|---|
committer | Patrick Wendell <pwendell@gmail.com> | 2014-06-12 08:14:25 -0700 |
commit | ce92a9c18f033ac9fa2f12143fab00a90e0f4577 | |
tree | 34507639ff2f876630c6868619844c52e13d3720 /docs | |
parent | 43d53d51c9ee2626d9de91faa3b192979b86821d | |
SPARK-554. Add aggregateByKey.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #705 from sryza/sandy-spark-554 and squashes the following commits:
2302b8f [Sandy Ryza] Add MIMA exclude
f52e0ad [Sandy Ryza] Fix Python tests for real
2f3afa3 [Sandy Ryza] Fix Python test
0b735e9 [Sandy Ryza] Fix line lengths
ae56746 [Sandy Ryza] Fix doc (replace T with V)
c2be415 [Sandy Ryza] Java and Python aggregateByKey
23bf400 [Sandy Ryza] SPARK-554. Add aggregateByKey.
Diffstat (limited to 'docs')
-rw-r--r-- | docs/programming-guide.md | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 7989e02dfb..79784682bf 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -891,6 +891,10 @@ for details.
   <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function. Like in <code>groupByKey</code>, the number of reduce tasks is configurable through an optional second argument. </td>
 </tr>
 <tr>
+  <td> <b>aggregateByKey</b>(<i>zeroValue</i>)(<i>seqOp</i>, <i>combOp</i>, [<i>numTasks</i>]) </td>
+  <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, U) pairs where the values for each key are aggregated using the given combine functions and a neutral "zero" value. Allows an aggregated value type that is different than the input value type, while avoiding unnecessary allocations. Like in <code>groupByKey</code>, the number of reduce tasks is configurable through an optional second argument. </td>
+</tr>
+<tr>
   <td> <b>sortByKey</b>([<i>ascending</i>], [<i>numTasks</i>]) </td>
   <td> When called on a dataset of (K, V) pairs where K implements Ordered, returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean <code>ascending</code> argument.</td>
 </tr>
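
Illustration of the semantics described in the added table row. The following is a minimal pure-Python sketch, not Spark's implementation and not part of this commit. It models a dataset as a list of partitions, folds each partition's values into a per-key accumulator with `seq_op` starting from a copy of `zero_value`, then merges accumulators across partitions with `comb_op`. The helper name `aggregate_by_key` and the sample data are hypothetical. The example computes a per-key (sum, count) pair, so the aggregated type U (a tuple) differs from the input value type V (an int).

```python
from copy import deepcopy


def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Hypothetical model of aggregateByKey semantics (not Spark code).

    partitions: list of lists of (key, value) pairs
    seq_op:     (U, V) -> U, folds one value into an accumulator
    comb_op:    (U, U) -> U, merges two accumulators
    """
    # Step 1: inside each partition, fold values into a per-key accumulator.
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key not in acc:
                # Each key starts from its own copy of the zero value.
                acc[key] = deepcopy(zero_value)
            acc[key] = seq_op(acc[key], value)
        per_partition.append(acc)

    # Step 2: merge the per-partition accumulators key by key.
    merged = {}
    for acc in per_partition:
        for key, u in acc.items():
            merged[key] = comb_op(merged[key], u) if key in merged else u
    return merged


# Sample data (assumed for illustration): two partitions of (K, V) pairs.
partitions = [[("a", 1), ("b", 4)], [("a", 3), ("a", 5)]]

result = aggregate_by_key(
    partitions,
    (0, 0),                                   # zero value: (sum, count)
    lambda acc, v: (acc[0] + v, acc[1] + 1),  # seq_op: add one value
    lambda x, y: (x[0] + y[0], x[1] + y[1]),  # comb_op: merge two accumulators
)

# Derive the per-key mean from the (sum, count) accumulators.
means = {k: s / n for k, (s, n) in result.items()}
```

With PySpark available, the same three arguments would be passed as `rdd.aggregateByKey(zeroValue, seqFunc, combFunc)` on an RDD of pairs. The zero value should be neutral for `comb_op` because it is applied once per key per partition, not once per key overall.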