author    Michael Armbrust <michael@databricks.com>  2015-11-03 13:02:17 +0100
committer Michael Armbrust <michael@databricks.com>  2015-11-03 13:02:17 +0100
commit    b86f2cab67989f09ba1ba8604e52cd4b1e44e436 (patch)
tree      ca3d89522afcb113823115e10704f52771abc09f /yarn
parent    425ff03f5ac4f3ddda1ba06656e620d5426f4209 (diff)
[SPARK-11404] [SQL] Support for groupBy using column expressions
This PR adds a new method `groupBy(cols: Column*)` to `Dataset` that allows users to group using column expressions instead of a lambda function. Since the return type of these expressions is not known at compile time, the key type is simply set to a generic `Row`. If the user would like to work with the key in a type-safe way, they can call `grouped.asKey[Type]`, which is also added in this PR.
```scala
val ds = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1)).toDS()
val grouped = ds.groupBy($"_1").asKey[String]
val agged = grouped.mapGroups { case (g, iter) =>
  Iterator((g, iter.map(_._2).sum))
}

agged.collect()
res0: Array(("a", 30), ("b", 3), ("c", 1))
```
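For readers unfamiliar with `mapGroups`, the aggregation above is equivalent in spirit to a plain Scala collections sketch (no Spark required; the sort is only added here to make the output order deterministic, since Spark makes no ordering guarantee):

```scala
// Plain-Scala sketch of what groupBy($"_1") + mapGroups computes above:
// group rows by the first column, then sum the second column per group.
val data = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1))

val agged = data
  .groupBy(_._1)          // Map[String, Seq[(String, Int)]]
  .toSeq
  .sortBy(_._1)           // deterministic order for display only
  .map { case (g, rows) => (g, rows.map(_._2).sum) }
// agged: Seq(("a", 30), ("b", 3), ("c", 1))
```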
Author: Michael Armbrust <michael@databricks.com>
Closes #9359 from marmbrus/columnGroupBy and squashes the following commits:
bbcb03b [Michael Armbrust] Update DatasetSuite.scala
8fd2908 [Michael Armbrust] Update DatasetSuite.scala
0b0e2f8 [Michael Armbrust] [SPARK-11404] [SQL] Support for groupBy using column expressions
Diffstat (limited to 'yarn')
0 files changed, 0 insertions, 0 deletions