author | mbonaci <mbonaci@gmail.com> | 2015-03-20 18:30:45 +0000 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2015-03-20 18:33:53 +0000 |
commit | 28bcb9e9e86a4b643fbf96b2b7e03928ebcfc060 (patch) | |
tree | 24985af7a3e26e1c852e3e9615d4eb188ff78f08 /python/pyspark/rdd.py | |
parent | db4d317ccfdd9bd1dc7e8beac54ebcc35966b7d5 (diff) | |
[SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample
The docs for the `sample` method were insufficient, now less so.
Author: mbonaci <mbonaci@gmail.com>
Closes #5097 from mbonaci/master and squashes the following commits:
a6a9d97 [mbonaci] [SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample method
Diffstat (limited to 'python/pyspark/rdd.py')
-rw-r--r-- | python/pyspark/rdd.py | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index bf17f513c0..c337a43c8a 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -346,6 +346,12 @@ class RDD(object):
         """
         Return a sampled subset of this RDD.
 
+        :param withReplacement: can elements be sampled multiple times (replaced when sampled out)
+        :param fraction: expected size of the sample as a fraction of this RDD's size
+            without replacement: probability that each element is chosen; fraction must be [0, 1]
+            with replacement: expected number of times each element is chosen; fraction must be >= 0
+        :param seed: seed for the random number generator
+
         >>> rdd = sc.parallelize(range(100), 4)
         >>> rdd.sample(False, 0.1, 81).count()
         10
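
The two meanings of `fraction` described in the new docstring can be illustrated with a small plain-Python sketch. This is not PySpark's implementation: `sample_sketch` and `_poisson` are hypothetical names introduced only for illustration, and the sketch assumes that "probability that each element is chosen" corresponds to an independent per-element coin flip and that "expected number of times each element is chosen" corresponds to a Poisson-distributed count per element. Under either reading the sample size is only expected, not guaranteed, to equal `fraction` times the input size.

```python
import math
import random


def _poisson(rng, lam):
    # Hypothetical helper: draw from Poisson(lam) using Knuth's
    # multiplication method, which is adequate for small lam.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_sketch(items, with_replacement, fraction, seed=None):
    # Hypothetical illustration of the documented semantics; not PySpark code.
    if fraction < 0:
        raise ValueError("fraction must be >= 0")
    if not with_replacement and fraction > 1:
        raise ValueError("fraction must be in [0, 1] without replacement")

    rng = random.Random(seed)
    out = []
    for x in items:
        if with_replacement:
            # Each element appears k times, with k drawn so that its
            # expected value is `fraction`.
            out.extend([x] * _poisson(rng, fraction))
        elif rng.random() < fraction:
            # Each element is kept at most once, with probability `fraction`.
            out.append(x)
    return out


if __name__ == "__main__":
    data = list(range(100))
    print(len(sample_sketch(data, False, 0.1, seed=81)))  # around 10, not exactly
    print(len(sample_sketch(data, True, 2.0, seed=81)))   # around 200, not exactly
```

Reusing the same `seed` reproduces the same sample, which is the reason the docstring's doctest can pin an exact count for a fixed seed.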