author mbonaci <mbonaci@gmail.com> 2015-03-20 18:30:45 +0000
committer Sean Owen <sowen@cloudera.com> 2015-03-20 18:33:53 +0000
commit 28bcb9e9e86a4b643fbf96b2b7e03928ebcfc060
tree 24985af7a3e26e1c852e3e9615d4eb188ff78f08 /python
parent db4d317ccfdd9bd1dc7e8beac54ebcc35966b7d5
[SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample
The docs for the `sample` method were insufficient, now less so.

Author: mbonaci <mbonaci@gmail.com>

Closes #5097 from mbonaci/master and squashes the following commits:

a6a9d97 [mbonaci] [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample method
Diffstat (limited to 'python')
-rw-r--r-- python/pyspark/rdd.py 6
1 files changed, 6 insertions, 0 deletions
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index bf17f513c0..c337a43c8a 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -346,6 +346,12 @@ class RDD(object):
         """
         Return a sampled subset of this RDD.
 
+        :param withReplacement: can elements be sampled multiple times (replaced when sampled out)
+        :param fraction: expected size of the sample as a fraction of this RDD's size
+            without replacement: probability that each element is chosen; fraction must be [0, 1]
+            with replacement: expected number of times each element is chosen; fraction must be >= 0
+        :param seed: seed for the random number generator
+
         >>> rdd = sc.parallelize(range(100), 4)
         >>> rdd.sample(False, 0.1, 81).count()
         10
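To make the two meanings of `fraction` concrete, here is a minimal pure-Python sketch that applies the documented semantics to a local list. It is not PySpark's implementation: the helpers `sample_sketch` and `poisson` are hypothetical, and modelling "without replacement" as an independent coin flip per element and "with replacement" as a Poisson-distributed copy count per element are assumptions chosen to match the docstring's wording ("probability that each element is chosen", "expected number of times each element is chosen"). Because each element is decided independently, the sample size is itself random and equals `fraction` times the input size only in expectation; a fixed `seed` makes one particular draw reproducible, which is what lets a doctest assert an exact count.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw k ~ Poisson(lam) with Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    # Multiply uniforms until the running product drops to exp(-lam) or below.
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_sketch(data, with_replacement, fraction, seed=None):
    """Hypothetical local analogue of RDD.sample's documented semantics (not Spark code)."""
    if fraction < 0:
        raise ValueError("fraction must be >= 0")
    if not with_replacement and fraction > 1:
        raise ValueError("fraction must be in [0, 1] when sampling without replacement")
    rng = random.Random(seed)  # the seed fixes the whole sequence of decisions
    out = []
    for x in data:
        if with_replacement:
            # Element appears k ~ Poisson(fraction) times, so E[k] == fraction.
            out.extend([x] * poisson(rng, fraction))
        elif rng.random() < fraction:
            # Bernoulli trial: keep the element with probability `fraction`.
            out.append(x)
    return out


if __name__ == "__main__":
    data = list(range(100))
    # Same seed, same sample.
    assert sample_sketch(data, False, 0.1, seed=81) == sample_sketch(data, False, 0.1, seed=81)
    # The size varies from draw to draw; only its mean is fraction * len(data).
    sizes = [len(sample_sketch(data, False, 0.1, seed=s)) for s in range(1000)]
    print(sum(sizes) / len(sizes))
```

With `fraction=0.1` over 100 elements, the printed average should sit close to 10, while any single seeded draw can return somewhat more or fewer elements.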