author | anabranch <wac.chambers@gmail.com> | 2016-11-17 11:34:55 +0000 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-11-17 11:34:55 +0000 |
commit | 49b6f456aca350e9e2c170782aa5cc75e7822680 | |
tree | 3a13f932b73feeab6b01f1d039728758203edcf0 /python/pyspark/sql/dataframe.py | |
parent | a3cac7bd86a6fe8e9b42da1bf580aaeb59378304 | |
[SPARK-18365][DOCS] Improve Sample Method Documentation
## What changes were proposed in this pull request?
I found the documentation for the `sample` method confusing, so this PR adds clarification across all languages.
- [x] Scala
- [x] Python
- [x] R
- [x] RDD Scala
- [ ] RDD Python with SEED
- [x] RDD Java
- [x] RDD Java with SEED
- [x] RDD Python
## How was this patch tested?
NA
Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <bill@databricks.com>
Closes #15815 from anabranch/SPARK-18365.
Diffstat (limited to 'python/pyspark/sql/dataframe.py')
-rw-r--r-- | python/pyspark/sql/dataframe.py | 5 |
1 files changed, 5 insertions, 0 deletions
```diff
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 29710acf54..3899890083 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -549,6 +549,11 @@ class DataFrame(object):
     def sample(self, withReplacement, fraction, seed=None):
         """Returns a sampled subset of this :class:`DataFrame`.
 
+        .. note::
+
+            This is not guaranteed to provide exactly the fraction specified of the total count
+            of the given :class:`DataFrame`.
+
         >>> df.sample(False, 0.5, 42).count()
         2
         """
```
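To illustrate why the note is needed, here is a minimal pure-Python sketch. It assumes that sampling without replacement behaves like Bernoulli sampling, where each row is kept independently with probability `fraction`. The helper `bernoulli_sample` is hypothetical and is not part of PySpark; it only models the behaviour the note describes on a single machine.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Hypothetical helper: keep each row independently with probability `fraction`.

    A simplified model of sampling without replacement, not Spark's implementation.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # Each row gets its own independent coin flip.
    return [row for row in rows if rng.random() < fraction]


rows = list(range(1000))
expected = 0.5 * len(rows)  # 500 rows on average, but not exactly 500 each time
for seed in (1, 2, 3):
    size = len(bernoulli_sample(rows, 0.5, seed))
    # The sample size fluctuates around the expected value from seed to seed.
    print(f"seed={seed}: kept {size} rows (expected ~{expected:.0f})")
```

Under this model the sample size is a binomial random variable with mean `fraction * n`, so only its expected value matches the requested fraction. Fixing the seed makes a particular draw repeatable, which is presumably what lets the doctest in the diff assert an exact count.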