path: root/yarn
author    Michael Armbrust <michael@databricks.com>  2015-11-03 13:02:17 +0100
committer Michael Armbrust <michael@databricks.com>  2015-11-03 13:02:17 +0100
commit    b86f2cab67989f09ba1ba8604e52cd4b1e44e436 (patch)
tree      ca3d89522afcb113823115e10704f52771abc09f /yarn
parent    425ff03f5ac4f3ddda1ba06656e620d5426f4209 (diff)
download  spark-b86f2cab67989f09ba1ba8604e52cd4b1e44e436.tar.gz
          spark-b86f2cab67989f09ba1ba8604e52cd4b1e44e436.tar.bz2
          spark-b86f2cab67989f09ba1ba8604e52cd4b1e44e436.zip
[SPARK-11404] [SQL] Support for groupBy using column expressions
This PR adds a new method `groupBy(cols: Column*)` to `Dataset` that allows users to group using column expressions instead of a lambda function. Since the return type of these expressions is not known at compile time, the key type is simply set to a generic `Row`. If the user would like to work with the key in a type-safe way, they can call `grouped.asKey[Type]`, which is also added in this PR.

```scala
val ds = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1)).toDS()
val grouped = ds.groupBy($"_1").asKey[String]
val agged = grouped.mapGroups { case (g, iter) => Iterator((g, iter.map(_._2).sum)) }
agged.collect()

res0: Array(("a", 30), ("b", 3), ("c", 1))
```

Author: Michael Armbrust <michael@databricks.com>

Closes #9359 from marmbrus/columnGroupBy and squashes the following commits:

bbcb03b [Michael Armbrust] Update DatasetSuite.scala
8fd2908 [Michael Armbrust] Update DatasetSuite.scala
0b0e2f8 [Michael Armbrust] [SPARK-11404] [SQL] Support for groupBy using column expressions
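For contrast with the `asKey[String]` example above, a minimal sketch of the untyped path described in the message, where the grouping key stays a generic `Row` and the caller extracts the value manually. This is an illustrative sketch against the API as described at this commit (it assumes a Spark shell or session with the usual `spark.implicits._` in scope; exact signatures may have changed in later releases):

```scala
import spark.implicits._

val ds = Seq(("a", 10), ("a", 20), ("b", 1)).toDS()

// Without asKey, groupBy(cols: Column*) yields a grouping whose key
// type is a generic Row, since the expression's type is unknown at
// compile time.
val grouped = ds.groupBy($"_1")

// The caller must pull the key's fields out of the Row by position.
val sums = grouped.mapGroups { case (key, iter) =>
  Iterator((key.getString(0), iter.map(_._2).sum))
}
```

Calling `.asKey[String]` on `grouped` instead would let `mapGroups` receive the key directly as a `String`, as shown in the commit message's example.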
Diffstat (limited to 'yarn')
0 files changed, 0 insertions, 0 deletions