path: root/python/pyspark/rdd.py
authorDan McClary <dan.mcclary@gmail.com>2014-03-18 00:45:47 -0700
committerMatei Zaharia <matei@databricks.com>2014-03-18 00:45:47 -0700
commite3681f26fae7e87321ac991f5a0fb7517415803a (patch)
treeecb920edc1484d1a3d7bda193546f1c71984e0a4 /python/pyspark/rdd.py
parent087eedca32fd87bfe1629588091bd307d45e4a7c (diff)
downloadspark-e3681f26fae7e87321ac991f5a0fb7517415803a.tar.gz
spark-e3681f26fae7e87321ac991f5a0fb7517415803a.tar.bz2
spark-e3681f26fae7e87321ac991f5a0fb7517415803a.zip
Spark 1246 add min max to stat counter
Here's the addition of min and max to statscounter.py and min and max methods to rdd.py.

Author: Dan McClary &lt;dan.mcclary@gmail.com&gt;

Closes #144 from dwmclary/SPARK-1246-add-min-max-to-stat-counter and squashes the following commits:

fd3fd4b [Dan McClary] fixed error, updated test
82cde0e [Dan McClary] flipped incorrectly assigned inf values in StatCounter
5d96799 [Dan McClary] added max and min to StatCounter repr for pyspark
21dd366 [Dan McClary] added max and min to StatCounter output, updated doc
1a97558 [Dan McClary] added max and min to StatCounter output, updated doc
a5c13b0 [Dan McClary] Added min and max to Scala and Java RDD, added min and max to StatCounter
ed67136 [Dan McClary] broke min/max out into separate transaction, added to rdd.py
1e7056d [Dan McClary] added underscore to getBucket
37a7dea [Dan McClary] cleaned up boundaries for histogram -- uses real min/max when buckets are derived
29981f2 [Dan McClary] fixed indentation on doctest comment
eaf89d9 [Dan McClary] added correct doctest for histogram
4916016 [Dan McClary] added histogram method, added max and min to statscounter
Diffstat (limited to 'python/pyspark/rdd.py')
-rw-r--r--python/pyspark/rdd.py19
1 file changed, 19 insertions(+), 0 deletions(-)
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index f3b432ff24..ae09dbff02 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -571,7 +571,26 @@ class RDD(object):
return reduce(op, vals, zeroValue)
# TODO: aggregate
+
+
+ def max(self):
+ """
+ Find the maximum item in this RDD.
+
+ >>> sc.parallelize([1.0, 5.0, 43.0, 10.0]).max()
+ 43.0
+ """
+ return self.reduce(max)
+
+ def min(self):
+ """
+ Find the minimum item in this RDD.
+
+ >>> sc.parallelize([1.0, 5.0, 43.0, 10.0]).min()
+ 1.0
+ """
+ return self.reduce(min)
+
def sum(self):
"""
Add up the elements in this RDD.
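The new methods simply fold the RDD's elements with Python's builtin `max` and `min` through `RDD.reduce`. A minimal local sketch of that behavior, using `functools.reduce` as a stand-in for the distributed reduce (the `rdd_max`/`rdd_min` names are illustrative, not part of the patch):

```python
from functools import reduce

def rdd_max(values):
    # Mirrors RDD.max(): fold all elements with the builtin max,
    # as in `return self.reduce(max)` from the patch.
    return reduce(max, values)

def rdd_min(values):
    # Mirrors RDD.min(): fold all elements with the builtin min.
    return reduce(min, values)

print(rdd_max([1.0, 5.0, 43.0, 10.0]))  # 43.0
print(rdd_min([1.0, 5.0, 43.0, 10.0]))  # 1.0
```

Like `RDD.reduce`, this fails on an empty input, and it relies on the reduction function being commutative and associative so partial results from different partitions can be combined in any order.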