author    Reynold Xin <rxin@databricks.com>  2015-07-01 21:14:13 -0700
committer Reynold Xin <rxin@databricks.com>  2015-07-01 21:14:13 -0700
commit    9fd13d5613b6d16a78d97d4798f085b56107d343
tree      9687bc3c9da9a72e5ae3814972f5a72c0bb7181f  /python/pyspark/sql/dataframe.py
parent    3a342dedc04799948bf6da69843bd1a91202ffe5
[SPARK-8770][SQL] Create BinaryOperator abstract class.
Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires the left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.

This patch creates a new BinaryOperator abstract class and updates the analyzer to only apply the type-casting rule there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.

Author: Reynold Xin <rxin@databricks.com>

Closes #7174 from rxin/binary-opterator and squashes the following commits:

f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator
d8518cf [Reynold Xin] Updated Python tests.
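For illustration, a minimal PySpark sketch (not part of the patch) of the user-facing effect of prettyName: generated aggregate column names now use the lowercase form, e.g. avg(age) rather than AVG(age). The fixture mirrors the df used by the doctests changed below and assumes a local Spark installation of this era with the SQLContext API.

    # Illustrative sketch only, not part of this patch. Assumes a local
    # Spark 1.x install; the DataFrame mirrors the name/age fixture used
    # by the doctests in dataframe.py.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext("local", "prettyName-demo")
    sqlContext = SQLContext(sc)
    df = sqlContext.createDataFrame(
        [Row(name=u'Alice', age=2), Row(name=u'Bob', age=5)])

    # After this patch, aggregate columns are named via the expression's
    # prettyName, so they come back lowercase.
    print(df.groupBy().avg().collect())      # [Row(avg(age)=3.5)]
    print(df.agg({"age": "max"}).collect())  # [Row(max(age)=5)]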
Diffstat (limited to 'python/pyspark/sql/dataframe.py')
-rw-r--r--  python/pyspark/sql/dataframe.py | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 273a40dd52..1e9c657cf8 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -802,11 +802,11 @@ class DataFrame(object):
Each element should be a column name (string) or an expression (:class:`Column`).
>>> df.groupBy().avg().collect()
- [Row(AVG(age)=3.5)]
+ [Row(avg(age)=3.5)]
>>> df.groupBy('name').agg({'age': 'mean'}).collect()
- [Row(name=u'Alice', AVG(age)=2.0), Row(name=u'Bob', AVG(age)=5.0)]
+ [Row(name=u'Alice', avg(age)=2.0), Row(name=u'Bob', avg(age)=5.0)]
>>> df.groupBy(df.name).avg().collect()
- [Row(name=u'Alice', AVG(age)=2.0), Row(name=u'Bob', AVG(age)=5.0)]
+ [Row(name=u'Alice', avg(age)=2.0), Row(name=u'Bob', avg(age)=5.0)]
>>> df.groupBy(['name', df.age]).count().collect()
[Row(name=u'Bob', age=5, count=1), Row(name=u'Alice', age=2, count=1)]
"""
@@ -864,10 +864,10 @@ class DataFrame(object):
(shorthand for ``df.groupBy.agg()``).
>>> df.agg({"age": "max"}).collect()
- [Row(MAX(age)=5)]
+ [Row(max(age)=5)]
>>> from pyspark.sql import functions as F
>>> df.agg(F.min(df.age)).collect()
- [Row(MIN(age)=2)]
+ [Row(min(age)=2)]
"""
return self.groupBy().agg(*exprs)
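As a usage note, the return statement above makes the shorthand explicit: agg on a DataFrame is just a groupBy with no grouping keys. A small sketch, assuming the same two-row name/age df fixture as in the doctests:

    from pyspark.sql import functions as F

    # df.agg(...) is shorthand for df.groupBy().agg(...);
    # both produce the same rows.
    assert df.agg(F.min(df.age)).collect() == \
        df.groupBy().agg(F.min(df.age)).collect()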