[SPARK-18365][DOCS] Improve Sample Method Documentation

## What changes were proposed in this pull request?

I found the documentation for the sample method confusing; this change adds clarification across all languages.

- [x] Scala
- [x] Python
- [x] R
- [x] RDD Scala
- [ ] RDD Python with SEED
- [x] RDD Java
- [x] RDD Java with SEED
- [x] RDD Python

## How was this patch tested?

N/A

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <bill@databricks.com>

Closes #15815 from anabranch/SPARK-18365.
author: anabranch <wac.chambers@gmail.com> 2016-11-17 11:34:55 +0000
committer: Sean Owen <sowen@cloudera.com> 2016-11-17 11:34:55 +0000
commit: 49b6f456aca350e9e2c170782aa5cc75e7822680 (patch)
tree: 3a13f932b73feeab6b01f1d039728758203edcf0 /python/pyspark/sql/dataframe.py
parent: a3cac7bd86a6fe8e9b42da1bf580aaeb59378304 (diff)
download: spark-49b6f456aca350e9e2c170782aa5cc75e7822680.tar.gz
spark-49b6f456aca350e9e2c170782aa5cc75e7822680.tar.bz2
spark-49b6f456aca350e9e2c170782aa5cc75e7822680.zip
1 files changed, 5 insertions, 0 deletions
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 29710acf54..3899890083 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -549,6 +549,11 @@ class DataFrame(object):
     def sample(self, withReplacement, fraction, seed=None):
         """Returns a sampled subset of this :class:`DataFrame`.
 
+        .. note::
+
+            This is not guaranteed to provide exactly the fraction specified of the total count
+            of the given :class:`DataFrame`.
+
         >>> df.sample(False, 0.5, 42).count()
         2
         """
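The note added to the `sample` docstring says the result is not guaranteed to contain exactly `fraction` of the rows. One way to see why is sketched below in plain Python, assuming that sampling without replacement keeps each row independently with probability `fraction`. The helper `bernoulli_sample` is hypothetical, written only for this illustration; it is not part of PySpark and is not the code Spark runs.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [row for row in rows if rng.random() < fraction]


rows = list(range(1000))

# Draw the same 50% sample with several different seeds and record the sizes.
counts = [len(bernoulli_sample(rows, 0.5, seed=s)) for s in range(20)]

# The sizes cluster around 500 but generally differ from seed to seed.
print(counts)
```

Under this assumption, `fraction` acts as a per-row inclusion probability rather than a target size, so the sample size varies around `fraction * count`. Passing the same seed reproduces the same sample, which is consistent with the doctest in the diff fixing `seed` to 42.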