author | Felix Cheung <felixcheung_m@hotmail.com> | 2017-02-15 10:45:37 -0800
---|---|---
committer | Felix Cheung <felixcheung@apache.org> | 2017-02-15 10:45:37 -0800
commit | 671bc08ed502815bfa2254c30d64149402acb0c7 (patch) |
tree | 3edcf2548e8f58a6a27db9c16050a3ff1d8ae261 | /python
parent | c97f4e17de0ce39e8172a5a4ae81f1914816a358 (diff) |
[SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column
## What changes were proposed in this pull request?
Add `coalesce` on DataFrame, to reduce the number of partitions without a shuffle, and `coalesce` on Column.
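For context, here is a minimal sketch of the pre-existing PySpark equivalents that the new R functions mirror; the sample data, column names, and local master are illustrative, not part of the patch:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import coalesce

spark = SparkSession.builder.master("local[2]").getOrCreate()

# Hypothetical sample data with nulls in both columns.
df = spark.createDataFrame([(None, 1.0), (2.0, None), (3.0, 4.0)], ["a", "b"])

# DataFrame.coalesce: reduce the partition count without a shuffle.
print(df.coalesce(1).rdd.getNumPartitions())  # 1

# Column-level coalesce: the first non-null value per row.
df.select(coalesce(df["a"], df["b"]).alias("first_non_null")).show()
```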
## How was this patch tested?
Manual tests and unit tests.
Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #16739 from felixcheung/rcoalesce.
Diffstat (limited to 'python')
-rw-r--r-- | python/pyspark/sql/dataframe.py | 10 |
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 188808b431..70efeaf016 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -515,7 +515,15 @@ class DataFrame(object):
         Similar to coalesce defined on an :class:`RDD`, this operation results in a
         narrow dependency, e.g. if you go from 1000 partitions to 100 partitions,
         there will not be a shuffle, instead each of the 100 new partitions will
-        claim 10 of the current partitions.
+        claim 10 of the current partitions. If a larger number of partitions is requested,
+        it will stay at the current number of partitions.
+
+        However, if you're doing a drastic coalesce, e.g. to numPartitions = 1,
+        this may result in your computation taking place on fewer nodes than
+        you like (e.g. one node in the case of numPartitions = 1). To avoid this,
+        you can call repartition(). This will add a shuffle step, but means the
+        current upstream partitions will be executed in parallel (per whatever
+        the current partitioning is).
 
         >>> df.coalesce(1).rdd.getNumPartitions()
         1
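The behavior the amended docstring describes can be exercised directly. A minimal sketch, assuming a local session; the 8-partition range and `local[4]` master are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()

# Start from an explicit 8 partitions.
df = spark.range(0, 1000, numPartitions=8)

# Narrow coalesce: 8 -> 2 partitions, no shuffle.
print(df.coalesce(2).rdd.getNumPartitions())   # 2

# Requesting more partitions than exist is a no-op for coalesce.
print(df.coalesce(16).rdd.getNumPartitions())  # 8

# repartition(1) adds a shuffle, so the 8 upstream partitions still
# run in parallel before collapsing to a single output partition.
print(df.repartition(1).rdd.getNumPartitions())  # 1
```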