author | Reynold Xin <rxin@databricks.com> | 2015-05-11 18:07:12 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2015-05-11 18:07:12 -0700 |
commit | 3a9b6997df3fef1052d8c410f32319018c52acff | |
tree | 8e5d13c68e929737bbed48119576fb1571a31d64 /docs/sql-programming-guide.md | |
parent | 57255dcd794222f4db5df1e549ebc7b896cebfdc | |
[SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.
Author: Reynold Xin <rxin@databricks.com>
Closes #6062 from rxin/agg-retain-doc and squashes the following commits:
43e511e [Reynold Xin] [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r-- | docs/sql-programming-guide.md | 60 |
1 files changed, 59 insertions, 1 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 6af10432b9..6b7b867ea6 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1594,6 +1594,64 @@ options.

 # Migration Guide

+## Upgrading from Spark SQL 1.3 to 1.4
+
+Based on user feedback, we changed the default behavior of `DataFrame.groupBy().agg()` to retain the grouping columns in the resulting `DataFrame`. To keep the behavior in 1.3, set `spark.sql.retainGroupColumns` to `false`.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+
+// In 1.3.x, in order for the grouping column "department" to show up,
+// it must be included explicitly as part of the agg function call.
+df.groupBy("department").agg($"department", max("age"), sum("expense"))
+
+// In 1.4+, grouping column "department" is included automatically.
+df.groupBy("department").agg(max("age"), sum("expense"))
+
+// Revert to 1.3 behavior (not retaining grouping column) by:
+sqlContext.setConf("spark.sql.retainGroupColumns", "false")
+
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% highlight java %}
+
+// In 1.3.x, in order for the grouping column "department" to show up,
+// it must be included explicitly as part of the agg function call.
+df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
+
+// In 1.4+, grouping column "department" is included automatically.
+df.groupBy("department").agg(max("age"), sum("expense"));
+
+// Revert to 1.3 behavior (not retaining grouping column) by:
+sqlContext.setConf("spark.sql.retainGroupColumns", "false");
+
+{% endhighlight %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% highlight python %}
+
+import pyspark.sql.functions as func
+
+# In 1.3.x, in order for the grouping column "department" to show up,
+# it must be included explicitly as part of the agg function call.
+df.groupBy("department").agg("department"), func.max("age"), func.sum("expense"))
+
+# In 1.4+, grouping column "department" is included automatically.
+df.groupBy("department").agg(func.max("age"), func.sum("expense"))
+
+# Revert to 1.3.x behavior (not retaining grouping column) by:
+sqlContext.setConf("spark.sql.retainGroupColumns", "false")
+
+{% endhighlight %}
+</div>
+
+</div>
+
+
 ## Upgrading from Spark SQL 1.0-1.2 to 1.3

 In Spark 1.3 we removed the "Alpha" label from Spark SQL and as part of this did a cleanup of the
@@ -1651,7 +1709,7 @@ moved into the udf object in `SQLContext`.

 <div class="codetabs">
 <div data-lang="scala" markdown="1">
-{% highlight java %}
+{% highlight scala %}

 sqlContext.udf.register("strLen", (s: String) => s.length())

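
The behavior change this patch documents can be pictured with a small model that needs no Spark installation. The sketch below is plain Python, not the Spark implementation: `group_and_aggregate` and its `retain_group_columns` flag are hypothetical names that stand in for `groupBy().agg()` and the `spark.sql.retainGroupColumns` setting, and the output column labels are illustrative only.

```python
from collections import defaultdict


def group_and_aggregate(rows, key, aggregations, retain_group_columns=True):
    """Group dict rows by `key` and apply the named aggregations.

    Hypothetical helper, not part of Spark. `retain_group_columns` models
    the `spark.sql.retainGroupColumns` setting described in the patch.
    """
    # Bucket the rows by the value of the grouping column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = []
    for group_value, members in groups.items():
        out = {}
        if retain_group_columns:
            # Models the 1.4+ default: the grouping column is kept.
            out[key] = group_value
        for label, (column, func) in aggregations.items():
            out[label] = func(member[column] for member in members)
        result.append(out)
    return result


rows = [
    {"department": "eng", "age": 30, "expense": 100},
    {"department": "eng", "age": 45, "expense": 250},
    {"department": "ops", "age": 38, "expense": 75},
]
# Illustrative labels; Spark's actual output column names may differ.
aggs = {"max(age)": ("age", max), "sum(expense)": ("expense", sum)}

# Models the 1.4+ default behavior.
with_key = group_and_aggregate(rows, "department", aggs)

# Models the 1.3 behavior, i.e. the setting switched to "false".
without_key = group_and_aggregate(
    rows, "department", aggs, retain_group_columns=False
)
```

With the flag off, each result row carries only the aggregates, so nothing identifies which department a row belongs to. That is why 1.3.x code had to pass the grouping column to `agg` explicitly.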