SPARK-554. Add aggregateByKey.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #705 from sryza/sandy-spark-554 and squashes the following commits:

2302b8f [Sandy Ryza] Add MIMA exclude
f52e0ad [Sandy Ryza] Fix Python tests for real
2f3afa3 [Sandy Ryza] Fix Python test
0b735e9 [Sandy Ryza] Fix line lengths
ae56746 [Sandy Ryza] Fix doc (replace T with V)
c2be415 [Sandy Ryza] Java and Python aggregateByKey
23bf400 [Sandy Ryza] SPARK-554. Add aggregateByKey.
author: Sandy Ryza <sandy@cloudera.com> 2014-06-12 08:14:25 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2014-06-12 08:14:25 -0700
commit: ce92a9c18f033ac9fa2f12143fab00a90e0f4577 (patch)
tree: 34507639ff2f876630c6868619844c52e13d3720 /docs
parent: 43d53d51c9ee2626d9de91faa3b192979b86821d (diff)
download: spark-ce92a9c18f033ac9fa2f12143fab00a90e0f4577.tar.gz
spark-ce92a9c18f033ac9fa2f12143fab00a90e0f4577.tar.bz2
spark-ce92a9c18f033ac9fa2f12143fab00a90e0f4577.zip
1 files changed, 4 insertions, 0 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 7989e02dfb..79784682bf 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -891,6 +891,10 @@ for details.
   <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function. Like in <code>groupByKey</code>, the number of reduce tasks is configurable through an optional second argument. </td>
 </tr>
 <tr>
+  <td> <b>aggregateByKey</b>(<i>zeroValue</i>)(<i>seqOp</i>, <i>combOp</i>, [<i>numTasks</i>]) </td>
+  <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, U) pairs where the values for each key are aggregated using the given combine functions and a neutral "zero" value. Allows an aggregated value type that is different than the input value type, while avoiding unnecessary allocations. Like in <code>groupByKey</code>, the number of reduce tasks is configurable through an optional second argument. </td>
+</tr>
+<tr>
   <td> <b>sortByKey</b>([<i>ascending</i>], [<i>numTasks</i>]) </td>
   <td> When called on a dataset of (K, V) pairs where K implements Ordered, returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean <code>ascending</code> argument.</td>
 </tr>
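
Note on the new table row: the semantics it describes (a per-key fold with `seqOp` starting from `zeroValue`, then a merge of partial results with `combOp`) can be modelled locally. The following is a minimal plain-Python sketch under stated assumptions, not Spark's implementation: `aggregate_by_key` is a hypothetical helper, the partitions are ordinary lists, and the `numTasks` argument is omitted. It computes a per-key `(sum, count)` pair, which shows the aggregated type U differing from the input value type V.

```python
import copy
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
U = TypeVar("U")


def aggregate_by_key(
    partitions: Iterable[Iterable[Tuple[K, V]]],
    zero_value: U,
    seq_op: Callable[[U, V], U],
    comb_op: Callable[[U, U], U],
) -> Dict[K, U]:
    """Hypothetical local model of the aggregateByKey semantics (no Spark involved)."""
    # Step 1: within each partition, fold every value into a per-key
    # accumulator that starts from its own copy of the zero value.
    partials: List[Dict[K, U]] = []
    for partition in partitions:
        acc: Dict[K, U] = {}
        for key, value in partition:
            if key not in acc:
                acc[key] = copy.deepcopy(zero_value)
            acc[key] = seq_op(acc[key], value)
        partials.append(acc)

    # Step 2: merge the per-partition accumulators for each key.
    merged: Dict[K, U] = {}
    for acc in partials:
        for key, partial in acc.items():
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


# Example input: two "partitions" of (key, value) pairs (made-up data).
partitions = [
    [("a", 1), ("b", 4)],
    [("a", 3), ("a", 5)],
]

# V is int; U is a (sum, count) tuple.
result = aggregate_by_key(
    partitions,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),      # seqOp: fold one value into U
    lambda x, y: (x[0] + y[0], x[1] + y[1]),      # combOp: merge two U values
)
print(result)
```

The two functions play different roles: `seq_op` combines an accumulator with a raw value, while `comb_op` combines two accumulators. That split is what allows the result type to differ from the value type.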