author | asokadiggs <asoka.diggs@intel.com> | 2015-09-29 17:45:18 -0400 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2015-09-29 17:45:18 -0400 |
commit | c1ad373f26053e1906fce7681c03d130a642bf33 (patch) | |
tree | 22e7a2411c6422c19e5daf258d39cdac40f22bd8 /python/pyspark/sql | |
parent | 7d399c9daa6769ab234890c551e1b3456e0e6e85 (diff) | |
[SPARK-10782] [PYTHON] Update dropDuplicates documentation
Documentation for dropDuplicates() and drop_duplicates() is one and the same. Resolved the error in the example for drop_duplicates using the same approach used for groupby and groupBy, by indicating that dropDuplicates and drop_duplicates are aliases.
Author: asokadiggs <asoka.diggs@intel.com>
Closes #8930 from asokadiggs/jira-10782.
Diffstat (limited to 'python/pyspark/sql')
-rw-r--r-- | python/pyspark/sql/dataframe.py | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index b09422aade..033b31983f 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -931,6 +931,8 @@ class DataFrame(object):
         """Return a new :class:`DataFrame` with duplicate rows removed,
         optionally only considering certain columns.
 
+        :func:`drop_duplicates` is an alias for :func:`dropDuplicates`.
+
         >>> from pyspark.sql import Row
         >>> df = sc.parallelize([ \
             Row(name='Alice', age=5, height=80), \
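
The commit message says the two method names share one set of documentation. A minimal sketch of how that can come about in plain Python follows. It assumes the alias is a class-level name binding, which this diff does not show. `MiniFrame` and its row data are hypothetical stand-ins, not part of pyspark, and the deduplication logic is only illustrative.

```python
class MiniFrame(object):
    """Hypothetical stand-in for a DataFrame: rows are plain dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def dropDuplicates(self, subset=None):
        """Return a new MiniFrame with duplicate rows removed,
        optionally only considering certain columns.

        drop_duplicates is an alias for dropDuplicates.
        """
        seen = set()
        kept = []
        for row in self.rows:
            # Compare on the requested columns, or on every column by default.
            cols = subset if subset is not None else sorted(row)
            key = tuple(row[c] for c in cols)
            if key not in seen:
                seen.add(key)
                kept.append(row)
        return MiniFrame(kept)

    # Assumed aliasing pattern: both names are bound to the same function
    # object, so they necessarily share a single docstring.
    drop_duplicates = dropDuplicates


# Illustrative data only.
mf = MiniFrame([
    {"name": "Alice", "age": 5, "height": 80},
    {"name": "Alice", "age": 5, "height": 80},
    {"name": "Alice", "age": 10, "height": 80},
])
deduped_all = mf.dropDuplicates()
deduped_subset = mf.drop_duplicates(["name", "height"])
```

Because one function object sits behind both names, any example in the docstring can only call one of them. Stating the alias in the docstring, as this patch does, keeps the shared text accurate under either name.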