[SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.

Author: Reynold Xin <rxin@databricks.com>

Closes #6062 from rxin/agg-retain-doc and squashes the following commits:

43e511e [Reynold Xin] [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.
author: Reynold Xin <rxin@databricks.com> 2015-05-11 18:07:12 -0700
committer: Reynold Xin <rxin@databricks.com> 2015-05-11 18:07:12 -0700
commit: 3a9b6997df3fef1052d8c410f32319018c52acff (patch)
tree: 8e5d13c68e929737bbed48119576fb1571a31d64 /docs/sql-programming-guide.md
parent: 57255dcd794222f4db5df1e549ebc7b896cebfdc (diff)
download: spark-3a9b6997df3fef1052d8c410f32319018c52acff.tar.gz
spark-3a9b6997df3fef1052d8c410f32319018c52acff.tar.bz2
spark-3a9b6997df3fef1052d8c410f32319018c52acff.zip
1 files changed, 59 insertions, 1 deletions
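The documentation change below hinges on one output-schema rule: whether `groupBy().agg()` prepends the grouping column to each result row. As a rough illustration only, the following pure-Python sketch models that rule on a list of dicts. It is not the Spark API. `group_agg`, its `retain_group_columns` parameter, and the output names `max_age` and `sum_expense` are hypothetical, chosen to mirror the `spark.sql.retainGroupColumns` setting described in the diff.

```python
# Toy, pure-Python model of the groupBy().agg() output schema.
# NOT the Spark API: group_agg and retain_group_columns are hypothetical
# names that mimic the spark.sql.retainGroupColumns setting.


def group_agg(rows, key, aggs, retain_group_columns=True):
    """Group `rows` (a list of dicts) by `key` and apply `aggs`.

    `aggs` maps an output column name to (input column, reducer), where the
    reducer receives an iterable of that column's values within one group.
    With retain_group_columns=True (modeling the 1.4+ default) each result
    row starts with the grouping column; with False (modeling 1.3) only the
    aggregate columns are returned.
    """
    groups = {}  # dicts preserve insertion order, so groups keep first-seen order
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    result = []
    for group_value, members in groups.items():
        out = {}
        if retain_group_columns:
            out[key] = group_value
        for name, (column, reducer) in aggs.items():
            out[name] = reducer(r[column] for r in members)
        result.append(out)
    return result


people = [
    {"department": "eng", "age": 30, "expense": 100},
    {"department": "eng", "age": 45, "expense": 50},
    {"department": "ops", "age": 28, "expense": 70},
]
aggs = {"max_age": ("age", max), "sum_expense": ("expense", sum)}

print(group_agg(people, "department", aggs))
# [{'department': 'eng', 'max_age': 45, 'sum_expense': 150}, {'department': 'ops', 'max_age': 28, 'sum_expense': 70}]
print(group_agg(people, "department", aggs, retain_group_columns=False))
# [{'max_age': 45, 'sum_expense': 150}, {'max_age': 28, 'sum_expense': 70}]
```

In the second call the grouping key is absent from the output, which is why the 1.3.x snippets in the diff pass `department` explicitly to `agg`.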
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 6af10432b9..6b7b867ea6 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1594,6 +1594,64 @@ options.
 
 # Migration Guide
 
+## Upgrading from Spark SQL 1.3 to 1.4
+
+Based on user feedback, we changed the default behavior of `DataFrame.groupBy().agg()` to retain the grouping columns in the resulting `DataFrame`. To keep the 1.3 behavior, set `spark.sql.retainGroupColumns` to `false`.
+
+<div class="codetabs">
+<div data-lang="scala"  markdown="1">
+{% highlight scala %}
+
+// In 1.3.x, in order for the grouping column "department" to show up,
+// it must be included explicitly as part of the agg function call.
+df.groupBy("department").agg($"department", max("age"), sum("expense"))
+
+// In 1.4+, grouping column "department" is included automatically.
+df.groupBy("department").agg(max("age"), sum("expense"))
+
+// Revert to the 1.3.x behavior (grouping column not retained) with:
+sqlContext.setConf("spark.sql.retainGroupColumns", "false")
+
+{% endhighlight %}
+</div>
+
+<div data-lang="java"  markdown="1">
+{% highlight java %}
+
+// In 1.3.x, in order for the grouping column "department" to show up,
+// it must be included explicitly as part of the agg function call.
+df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
+
+// In 1.4+, grouping column "department" is included automatically.
+df.groupBy("department").agg(max("age"), sum("expense"));
+
+// Revert to the 1.3.x behavior (grouping column not retained) with:
+sqlContext.setConf("spark.sql.retainGroupColumns", "false");
+
+{% endhighlight %}
+</div>
+
+<div data-lang="python"  markdown="1">
+{% highlight python %}
+
+import pyspark.sql.functions as func
+
+# In 1.3.x, in order for the grouping column "department" to show up,
+# it must be included explicitly as part of the agg function call.
+df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense"))
+
+# In 1.4+, grouping column "department" is included automatically.
+df.groupBy("department").agg(func.max("age"), func.sum("expense"))
+
+# Revert to the 1.3.x behavior (grouping column not retained) with:
+sqlContext.setConf("spark.sql.retainGroupColumns", "false")
+
+{% endhighlight %}
+</div>
+
+</div>
+
+
 ## Upgrading from Spark SQL 1.0-1.2 to 1.3
 
 In Spark 1.3 we removed the "Alpha" label from Spark SQL and as part of this did a cleanup of the
@@ -1651,7 +1709,7 @@ moved into the udf object in `SQLContext`.
 
 <div class="codetabs">
 <div data-lang="scala"  markdown="1">
-{% highlight java %}
+{% highlight scala %}
 
 sqlContext.udf.register("strLen", (s: String) => s.length())