Diffstat (limited to 'python/pyspark/rdd.py')
-rw-r--r-- python/pyspark/rdd.py 18
1 files changed, 9 insertions, 9 deletions
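The `RDD.histogram` docstring edited in this patch describes how values are assigned to buckets. The following plain-Python sketch (no Spark needed) models those rules. It does not reproduce PySpark's code, and it rests on a few assumptions. The helper names `make_even_buckets`, `bucket_index_bisect`, `bucket_index_even`, and `histogram` are hypothetical. Values outside the bucket range and NaN are skipped. `ValueError` is used as the exception type. Lookup uses `bisect` (O(log n)) for arbitrary sorted buckets and direct arithmetic (O(1)) for evenly spaced ones. Every bucket is half-open `[a, b)` except the last, which is closed.

```python
import bisect
import math


def make_even_buckets(min_v, max_v, count):
    """Hypothetical helper: `count` evenly spaced buckets spanning [min_v, max_v]."""
    if count < 1:
        raise ValueError("number of buckets must be at least 1")
    if not (math.isfinite(min_v) and math.isfinite(max_v)):
        # Assumed behavior: infinite (or NaN) extremes cannot be split evenly.
        raise ValueError("cannot generate buckets when the data contains infinity")
    if min_v == max_v:
        # Elements do not vary: fall back to a single bucket [min_v, max_v].
        return [min_v, max_v]
    inc = (max_v - min_v) / count
    return [min_v + i * inc for i in range(count)] + [max_v]


def bucket_index_bisect(buckets, x):
    """O(log n) lookup for any sorted, duplicate-free buckets; None if out of range."""
    if not (buckets[0] <= x <= buckets[-1]):  # also False for NaN, so NaN is skipped
        return None
    i = bisect.bisect_right(buckets, x) - 1
    # The last bucket is closed on the right, so x == buckets[-1] belongs to it.
    return min(i, len(buckets) - 2)


def bucket_index_even(buckets, x):
    """O(1) lookup, valid only when the buckets are evenly spaced."""
    if not (buckets[0] <= x <= buckets[-1]):
        return None
    n = len(buckets) - 1  # number of buckets
    inc = (buckets[-1] - buckets[0]) / n
    if inc == 0:
        return 0  # single degenerate bucket [v, v]
    # Clamp so the maximum value lands in the last (closed) bucket.
    return min(int((x - buckets[0]) / inc), n - 1)


def histogram(values, buckets):
    """Return a (buckets, counts) tuple, mirroring the documented return value."""
    if isinstance(buckets, int):
        values = list(values)
        buckets = make_even_buckets(min(values), max(values), buckets)
        lookup = bucket_index_even
    else:
        # Strictly increasing means sorted with no duplicates.
        if len(buckets) < 2 or any(a >= b for a, b in zip(buckets, buckets[1:])):
            raise ValueError("buckets must be sorted, unique, and have at least two elements")
        lookup = bucket_index_bisect
    counts = [0] * (len(buckets) - 1)
    for x in values:
        i = lookup(buckets, x)
        if i is not None:
            counts[i] += 1
    return buckets, counts
```

With this sketch, `histogram(range(51), 2)` produces buckets equal to `[0, 25, 50]` with counts `[25, 26]`, because 50 falls into the closed last bucket. The O(1) path is only safe for evenly spaced buckets, and floating-point rounding near bucket edges can differ slightly from the bisect path.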
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 6afe769662..0508235c1c 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -1027,20 +1027,20 @@ class RDD(object):
If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
this can be switched from an O(log n) inseration to O(1) per
- element(where n = # buckets).
+ element (where n is the number of buckets).
- Buckets must be sorted and not contain any duplicates, must be
+ Buckets must be sorted, not contain any duplicates, and have
at least two elements.
- If `buckets` is a number, it will generates buckets which are
+ If `buckets` is a number, it will generate buckets which are
evenly spaced between the minimum and maximum of the RDD. For
- example, if the min value is 0 and the max is 100, given buckets
- as 2, the resulting buckets will be [0,50) [50,100]. buckets must
- be at least 1 If the RDD contains infinity, NaN throws an exception
- If the elements in RDD do not vary (max == min) always returns
- a single bucket.
+ example, if the min value is 0 and the max is 100, given `buckets`
+ as 2, the resulting buckets will be [0,50) [50,100]. `buckets` must
+ be at least 1. An exception is raised if the RDD contains infinity.
+ If the elements in the RDD do not vary (max == min), a single bucket
+ will be used.
- It will return a tuple of buckets and histogram.
+ The return value is a tuple of buckets and histogram.
>>> rdd = sc.parallelize(range(51))
>>> rdd.histogram(2)