author    Michael Armbrust <michael@databricks.com>  2015-11-03 13:02:17 +0100
committer Michael Armbrust <michael@databricks.com>  2015-11-03 13:02:17 +0100
commit    b86f2cab67989f09ba1ba8604e52cd4b1e44e436 (patch)
tree      ca3d89522afcb113823115e10704f52771abc09f /yarn
parent    425ff03f5ac4f3ddda1ba06656e620d5426f4209 (diff)
[SPARK-11404] [SQL] Support for groupBy using column expressions
This PR adds a new method `groupBy(cols: Column*)` to `Dataset` that allows users to group using column expressions instead of a lambda function. Since the return type of these expressions is not known at compile time, the key type is simply set to a generic `Row`. If the user would like to work with the key in a type-safe way, they can call `grouped.asKey[Type]`, which is also added in this PR.
```scala
val ds = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1)).toDS()
val grouped = ds.groupBy($"_1").asKey[String]
val agged = grouped.mapGroups { case (g, iter) =>
  Iterator((g, iter.map(_._2).sum))
}

agged.collect()
res0: Array(("a", 30), ("b", 3), ("c", 1))
```
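For readers unfamiliar with `mapGroups`, the aggregation above is equivalent in spirit to a plain Scala collections sketch (no Spark required; the sort is only added here to make the output order deterministic, since Spark makes no ordering guarantee):

```scala
// Plain-Scala sketch of what groupBy($"_1") + mapGroups computes above:
// group rows by the first column, then sum the second column per group.
val data = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1))

val agged = data
  .groupBy(_._1)          // Map[String, Seq[(String, Int)]]
  .toSeq
  .sortBy(_._1)           // deterministic order for display only
  .map { case (g, rows) => (g, rows.map(_._2).sum) }
// agged: Seq(("a", 30), ("b", 3), ("c", 1))
```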
Author: Michael Armbrust <michael@databricks.com>
Closes #9359 from marmbrus/columnGroupBy and squashes the following commits:
bbcb03b [Michael Armbrust] Update DatasetSuite.scala
8fd2908 [Michael Armbrust] Update DatasetSuite.scala
0b0e2f8 [Michael Armbrust] [SPARK-11404] [SQL] Support for groupBy using column expressions
Diffstat (limited to 'yarn')
0 files changed, 0 insertions, 0 deletions