path: root/docs/sql-programming-guide.md
author    Mortada Mehyar <mortada.mehyar@gmail.com>  2016-06-10 00:23:34 -0700
committer Reynold Xin <rxin@databricks.com>          2016-06-10 00:23:34 -0700
commit   675a73715d3c8adb9d9a9dce5f76a2db5106790c (patch)
tree     e4e59c6ec43027a6ed802a9bcd0b307e113df3be /docs/sql-programming-guide.md
parent   00c310133df4f3893dd90d801168c2ab9841b102 (diff)
download spark-675a73715d3c8adb9d9a9dce5f76a2db5106790c.tar.gz
         spark-675a73715d3c8adb9d9a9dce5f76a2db5106790c.tar.bz2
         spark-675a73715d3c8adb9d9a9dce5f76a2db5106790c.zip
[DOCUMENTATION] fixed groupby aggregation example for pyspark
## What changes were proposed in this pull request?

Fixing documentation for the groupby/agg example in python.

## How was this patch tested?

The existing example in the documentation does not contain valid syntax (missing parenthesis) and is not using `Column` in the expression for `agg()`. After the fix, here's how I tested it:

```
In [1]: from pyspark.sql import Row

In [2]: import pyspark.sql.functions as func

In [3]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:records = [{'age': 19, 'department': 1, 'expense': 100},
:           {'age': 20, 'department': 1, 'expense': 200},
:           {'age': 21, 'department': 2, 'expense': 300},
:           {'age': 22, 'department': 2, 'expense': 300},
:           {'age': 23, 'department': 3, 'expense': 300}]
:--

In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records])

In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show()
+----------+----------+--------+------------+
|department|department|max(age)|sum(expense)|
+----------+----------+--------+------------+
|         1|         1|      20|         300|
|         2|         2|      22|         600|
|         3|         3|      23|         300|
+----------+----------+--------+------------+
```

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #13587 from mortada/groupby_agg_doc_fix.
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--  docs/sql-programming-guide.md  |  2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 940c1d7704..efdf873c34 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -2221,7 +2221,7 @@ import pyspark.sql.functions as func
# In 1.3.x, in order for the grouping column "department" to show up,
# it must be included explicitly as part of the agg function call.
-df.groupBy("department").agg("department"), func.max("age"), func.sum("expense"))
+df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense"))
# In 1.4+, grouping column "department" is included automatically.
df.groupBy("department").agg(func.max("age"), func.sum("expense"))
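The semantics of the corrected `agg()` call can be sketched without a Spark cluster. The snippet below is a minimal plain-Python illustration (not PySpark) of what `groupBy("department").agg(func.max("age"), func.sum("expense"))` computes, using the sample records from the PR description; the `result` shape is an illustrative choice, not a Spark API.

```python
# Plain-Python sketch of the group-by aggregation from the example above.
# Assumes the same sample records used to test the documentation fix.
from collections import defaultdict

records = [{'age': 19, 'department': 1, 'expense': 100},
           {'age': 20, 'department': 1, 'expense': 200},
           {'age': 21, 'department': 2, 'expense': 300},
           {'age': 22, 'department': 2, 'expense': 300},
           {'age': 23, 'department': 3, 'expense': 300}]

# Group rows by the "department" column, like df.groupBy("department").
groups = defaultdict(list)
for r in records:
    groups[r['department']].append(r)

# Per group, compute max(age) and sum(expense), like the agg() call.
result = {dept: (max(r['age'] for r in rows),
                 sum(r['expense'] for r in rows))
          for dept, rows in groups.items()}

print(result)  # {1: (20, 300), 2: (22, 600), 3: (23, 300)}
```

The output matches the `show()` table in the PR description: department 1 has max age 20 and total expense 300, department 2 has 22 and 600, department 3 has 23 and 300.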