author | Reynold Xin <rxin@databricks.com> | 2016-03-19 11:23:14 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-03-19 11:23:14 -0700 |
commit | dcaa016610ac2c11d7dd01803f3515b02ab32e64 (patch) | |
tree | 7d03000193cdcc5100fd7198e143680b2e5882e5 /core | |
parent | 2082a49569cb5d900e318af9da1027821dfe93bc (diff) | |
[SPARK-13897][SQL] RelationalGroupedDataset and KeyValueGroupedDataset
## What changes were proposed in this pull request?
Previously, Dataset.groupBy returned a GroupedData, and Dataset.groupByKey returned a GroupedDataset. The two names are very similar and unfortunately do not convey the real differences between the two.
Assume we are grouping by some keys (K). groupByKey is a key-value style group by, in which the schema of the returned dataset is a tuple of just two fields: key and value. groupBy, on the other hand, is a relational style group by, in which the schema of the returned dataset is flattened and contains |K| + |V| fields. This pull request renames the two classes to reflect that: GroupedData becomes RelationalGroupedDataset, and GroupedDataset becomes KeyValueGroupedDataset.
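To make the schema difference concrete, here is a minimal sketch that models the two styles with plain Python collections. It is not the Spark API: the sample rows, the column layout (dept, city, salary), and the helper names `key_value_group_by` and `relational_group_by` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (dept, city, salary).
# Grouping keys K = (dept, city), so |K| = 2; one aggregated value column, so |V| = 1.
rows = [
    ("eng", "SF", 100),
    ("eng", "SF", 120),
    ("ops", "NY", 90),
]


def key_of(row):
    """Extract the grouping key (dept, city) from a row."""
    return (row[0], row[1])


def group_rows(rows):
    """Bucket rows by their grouping key."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_of(row)].append(row)
    return groups


def key_value_group_by(rows, reduce_fn):
    """Key-value style: every output record is a 2-tuple (key, value)."""
    groups = group_rows(rows)
    return [(key, reduce_fn(groups[key])) for key in sorted(groups)]


def relational_group_by(rows, agg_fn):
    """Relational style: key columns and aggregate columns are flattened into one row."""
    groups = group_rows(rows)
    return [key + agg_fn(groups[key]) for key in sorted(groups)]


# Key-value result: two fields per record, the key itself being a nested tuple.
kv = key_value_group_by(rows, lambda members: sum(r[2] for r in members))

# Relational result: |K| + |V| = 3 flat fields per record.
rel = relational_group_by(rows, lambda members: (sum(r[2] for r in members),))

print(kv)
print(rel)
```

In the key-value result the key stays nested as a single field no matter how many columns it spans, while the relational result grows by one field per key column and per aggregate.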
This pull request also removes the experimental tag from RelationalGroupedDataset. It has been with DataFrame since 1.3, and we have enough confidence now to stabilize it.
## How was this patch tested?
This is a rename to improve API understandability. It should be covered by all existing tests.
Author: Reynold Xin <rxin@databricks.com>
Closes #11841 from rxin/SPARK-13897.
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions