[SPARK-11489][SQL] Only include common first order statistics in GroupedData

We added a bunch of higher order statistics such as skewness and kurtosis to GroupedData. I don't think they are common enough to justify being listed, since users can always use the normal statistics aggregate functions.

That is to say, after this change, we won't support
```scala
df.groupBy("key").kurtosis("colA", "colB")
```

However, we will still support
```scala
df.groupBy("key").agg(kurtosis(col("colA")), kurtosis(col("colB")))
```

Author: Reynold Xin <rxin@databricks.com>

Closes #9446 from rxin/SPARK-11489.
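
Since the diff in this commit touches the Python `GroupedData`, the same migration applies on the PySpark side. Below is a minimal sketch of a helper that recovers the varargs convenience on top of `agg`. The name `agg_per_column` is hypothetical and not part of Spark; the helper only assumes that `grouped` exposes `agg(*exprs)`, as the result of `df.groupBy(...)` does. Unlike the removed methods, it does not skip non-numeric columns.

```python
def agg_per_column(grouped, agg_fn, *cols):
    """Apply one aggregate function to several columns of a grouped frame.

    grouped: any object with an ``agg(*exprs)`` method, e.g. ``df.groupBy("key")``.
    agg_fn:  callable mapping a column name to an aggregate expression,
             e.g. ``pyspark.sql.functions.kurtosis``.
    cols:    column names (strings) to aggregate.

    Hypothetical helper for illustration; it is not part of the Spark API.
    """
    if not cols:
        raise ValueError("at least one column name is required")
    # Build one aggregate expression per column and pass them all to agg().
    return grouped.agg(*[agg_fn(c) for c in cols])
```

With PySpark available, a call such as `agg_per_column(df.groupBy("key"), F.kurtosis, "colA", "colB")`, where `F` is `pyspark.sql.functions`, should be equivalent to the `agg(...)` form that remains supported.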
author: Reynold Xin <rxin@databricks.com> 2015-11-03 16:27:56 -0800
committer: Reynold Xin <rxin@databricks.com> 2015-11-03 16:27:56 -0800
commit: 5051262d4ca6a2c529c9b1ba86d54cce60a7af17 (patch)
tree: f7c89be1ccc400a803aaa136926b84405a7e43e1 /python/pyspark/sql
parent: 53e9cee3e4e845d1f875c487215c0f22503347b1 (diff)
download: spark-5051262d4ca6a2c529c9b1ba86d54cce60a7af17.tar.gz
spark-5051262d4ca6a2c529c9b1ba86d54cce60a7af17.tar.bz2
spark-5051262d4ca6a2c529c9b1ba86d54cce60a7af17.zip
1 files changed, 0 insertions, 88 deletions
diff --git a/python/pyspark/sql/group.py b/python/pyspark/sql/group.py
index 946b53e71c..71c0bccc5e 100644
--- a/python/pyspark/sql/group.py
+++ b/python/pyspark/sql/group.py
@@ -167,94 +167,6 @@ class GroupedData(object):
         [Row(sum(age)=7, sum(height)=165)]
         """
 
-    @df_varargs_api
-    @since(1.6)
-    def stddev(self, *cols):
-        """Compute the sample standard deviation for each numeric columns for each group.
-
-        :param cols: list of column names (string). Non-numeric columns are ignored.
-
-        >>> df3.groupBy().stddev('age', 'height').collect()
-        [Row(STDDEV(age)=2.12..., STDDEV(height)=3.53...)]
-        """
-
-    @df_varargs_api
-    @since(1.6)
-    def stddev_samp(self, *cols):
-        """Compute the sample standard deviation for each numeric columns for each group.
-
-        :param cols: list of column names (string). Non-numeric columns are ignored.
-
-        >>> df3.groupBy().stddev_samp('age', 'height').collect()
-        [Row(STDDEV_SAMP(age)=2.12..., STDDEV_SAMP(height)=3.53...)]
-        """
-
-    @df_varargs_api
-    @since(1.6)
-    def stddev_pop(self, *cols):
-        """Compute the population standard deviation for each numeric columns for each group.
-
-        :param cols: list of column names (string). Non-numeric columns are ignored.
-
-        >>> df3.groupBy().stddev_pop('age', 'height').collect()
-        [Row(STDDEV_POP(age)=1.5, STDDEV_POP(height)=2.5)]
-        """
-
-    @df_varargs_api
-    @since(1.6)
-    def variance(self, *cols):
-        """Compute the sample variance for each numeric columns for each group.
-
-        :param cols: list of column names (string). Non-numeric columns are ignored.
-
-        >>> df3.groupBy().variance('age', 'height').collect()
-        [Row(VARIANCE(age)=2.25, VARIANCE(height)=6.25)]
-        """
-
-    @df_varargs_api
-    @since(1.6)
-    def var_pop(self, *cols):
-        """Compute the sample variance for each numeric columns for each group.
-
-        :param cols: list of column names (string). Non-numeric columns are ignored.
-
-        >>> df3.groupBy().var_pop('age', 'height').collect()
-        [Row(VAR_POP(age)=2.25, VAR_POP(height)=6.25)]
-        """
-
-    @df_varargs_api
-    @since(1.6)
-    def var_samp(self, *cols):
-        """Compute the sample variance for each numeric columns for each group.
-
-        :param cols: list of column names (string). Non-numeric columns are ignored.
-
-        >>> df3.groupBy().var_samp('age', 'height').collect()
-        [Row(VAR_SAMP(age)=4.5, VAR_SAMP(height)=12.5)]
-        """
-
-    @df_varargs_api
-    @since(1.6)
-    def skewness(self, *cols):
-        """Compute the skewness for each numeric columns for each group.
-
-        :param cols: list of column names (string). Non-numeric columns are ignored.
-
-        >>> df3.groupBy().skewness('age', 'height').collect()
-        [Row(SKEWNESS(age)=0.0, SKEWNESS(height)=0.0)]
-        """
-
-    @df_varargs_api
-    @since(1.6)
-    def kurtosis(self, *cols):
-        """Compute the kurtosis for each numeric columns for each group.
-
-        :param cols: list of column names (string). Non-numeric columns are ignored.
-
-        >>> df3.groupBy().kurtosis('age', 'height').collect()
-        [Row(KURTOSIS(age)=-2.0, KURTOSIS(height)=-2.0)]
-        """
-
 
 def _test():
     import doctest
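
The removed doctests mix sample and population statistics, so it can help to reproduce their numbers outside Spark. The sketch below uses only the Python standard library and assumes a two-row group with ages 2 and 5 and heights 80 and 85. Those values are an assumption chosen to match the `sum(age)=7, sum(height)=165` context line above; the contents of `df3` are not shown in this diff. The helper names `central_moment`, `skewness`, and `excess_kurtosis` are illustrative.

```python
import statistics

# Assumed data: consistent with sum(age)=7 and sum(height)=165, not taken from the diff.
ages = [2.0, 5.0]
heights = [80.0, 85.0]


def central_moment(xs, k):
    """k-th central moment using the population (divide by n) convention."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Population skewness: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2^2 - 3."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


if __name__ == "__main__":
    for name, xs in (("age", ages), ("height", heights)):
        print(name,
              "stddev_samp=%.4f" % statistics.stdev(xs),
              "stddev_pop=%.4f" % statistics.pstdev(xs),
              "var_samp=%.4f" % statistics.variance(xs),
              "var_pop=%.4f" % statistics.pvariance(xs),
              "skewness=%.1f" % skewness(xs),
              "kurtosis=%.1f" % excess_kurtosis(xs))
```

Under that assumption, the sample variance of age is 4.5 and the population variance is 2.25. The value `VARIANCE(age)=2.25` in the removed `variance` doctest therefore corresponds to the population variance, even though its docstring says "sample variance".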