author | Madhu Siddalingaiah <madhu@madhu.com> | 2014-12-01 08:45:34 -0800 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2014-12-01 08:45:34 -0800 |
commit | 2b233f5fc4beb2c6ed4bc142e923e96f8bad3ec4 (patch) | |
tree | e820137691ec4b188a5d3c8c55cfe8fe91783989 | |
parent | 30a86acdefd5428af6d6264f59a037e0eefd74b4 (diff) | |
Documentation: add description for repartitionAndSortWithinPartitions
Author: Madhu Siddalingaiah <madhu@madhu.com>
Closes #3390 from msiddalingaiah/master and squashes the following commits:
cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again)
332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code>
cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions
-rw-r--r-- | docs/programming-guide.md | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 7a16ee8742..5e0d5c15d7 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -934,6 +934,12 @@ for details.
   <td> Reshuffle the data in the RDD randomly to create either more or fewer partitions and balance it across them. This always shuffles all data over the network. </td>
 </tr>
+<tr>
+  <td> <b>repartitionAndSortWithinPartitions</b>(<i>partitioner</i>) </td>
+  <td> Repartition the RDD according to the given partitioner and, within each resulting partition,
+  sort records by their keys. This is more efficient than calling <code>repartition</code> and then sorting within
+  each partition because it can push the sorting down into the shuffle machinery. </td>
+</tr>
 </table>
 
 ### Actions
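
The semantics described in the added table row can be illustrated with a small plain-Python model. This is only a sketch of the observable result (records grouped by a partitioner, then ordered by key inside each partition); it does not use Spark, and the function name, the `partition_func` parameter, and the sample data are hypothetical. In particular, it sorts after partitioning, whereas the documentation states that Spark pushes the sort down into the shuffle.

```python
def repartition_and_sort_within_partitions(records, num_partitions, partition_func):
    """Model of the result only: assign each (key, value) record to a
    partition, then sort every partition by key.

    `partition_func` is a hypothetical stand-in for a partitioner; it maps
    a key to an integer, which is reduced modulo `num_partitions`.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        index = partition_func(key) % num_partitions
        partitions[index].append((key, value))
    # Sort each partition independently by key; no global order is implied.
    return [sorted(part, key=lambda kv: kv[0]) for part in partitions]


# Hypothetical sample data: integer keys, partitioned by key parity.
records = [(3, "c"), (1, "a"), (4, "d"), (2, "b"), (6, "f"), (5, "e")]
result = repartition_and_sort_within_partitions(records, 2, lambda k: k)

for index, part in enumerate(result):
    print(index, part)
```

Each partition comes out sorted by key, but the partitions are not ordered relative to one another, which matches the "within each resulting partition" wording in the patch.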