author Nong Li <nongli@gmail.com> 2015-11-01 14:32:21 -0800
committer Yin Huai <yhuai@databricks.com> 2015-11-01 14:34:06 -0800
commit 046e32ed8467e0f46ffeca1a95d4d40017eb5bdb
tree 4b981c567dec32cd088d277541f63ae3cdd7b647 /core
parent dc7e399fc01e74f2ba28ebd945785cc0f7759ccd
[SPARK-11410][SQL] Add APIs to provide functionality similar to Hive's DISTRIBUTE BY and SORT BY.
DISTRIBUTE BY allows the user to hash partition the data by specified exprs. It also allows for optional sorting within each resulting partition. There is no required relationship between the exprs for partitioning and sorting (i.e. one does not need to be a prefix of the other).

This patch adds two APIs to DataFrames which can be used together to provide this functionality:
1. distributeBy(), which partitions the data frame into a specified number of partitions using the partitioning exprs.
2. localSort(), which sorts each partition using the provided sorting exprs.

To get the DISTRIBUTE BY functionality, the user simply does: df.distributeBy(...).localSort(...)

Author: Nong Li <nongli@gmail.com>

Closes #9364 from nongli/spark-11410.
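
The semantics described above (hash partition on one set of exprs, then sort inside each partition on an unrelated set) can be illustrated with a minimal pure-Python sketch. It does not use Spark and is not the patch's implementation: `distribute_by` and `local_sort` are hypothetical stand-ins for the two DataFrame methods, the sample rows are made up, and CRC32 is an assumed hash chosen only because it is stable across runs.

```python
import zlib
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def distribute_by(rows: Iterable[Row], num_partitions: int,
                  key: Callable[[Row], Any]) -> List[List[Row]]:
    """Hash partition rows: equal keys always land in the same partition."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # Assumption: CRC32 of the key's repr stands in for the engine's hash.
        h = zlib.crc32(repr(key(row)).encode("utf-8"))
        partitions[h % num_partitions].append(row)
    return partitions


def local_sort(partitions: List[List[Row]],
               key: Callable[[Row], Any]) -> List[List[Row]]:
    """Sort each partition independently; no ordering across partitions."""
    return [sorted(p, key=key) for p in partitions]


# Hypothetical sample data: partition by "dept", sort by "salary".
rows = [
    {"dept": "eng", "name": "carol", "salary": 120},
    {"dept": "ops", "name": "dave", "salary": 90},
    {"dept": "eng", "name": "alice", "salary": 100},
    {"dept": "ops", "name": "bob", "salary": 70},
    {"dept": "hr", "name": "erin", "salary": 80},
]

result = local_sort(
    distribute_by(rows, 2, key=lambda r: r["dept"]),
    key=lambda r: r["salary"],
)

for i, part in enumerate(result):
    print(i, [(r["dept"], r["salary"]) for r in part])
```

Note that the partitioning key (`dept`) and the sort key (`salary`) are unrelated, matching the statement that neither needs to be a prefix of the other. The result is sorted only within each partition, not globally.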
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions