Merge pull request #227 from JoshRosen/fix/distinct_numsplits

Allow controlling number of splits in distinct().
author: Matei Zaharia <matei@eecs.berkeley.edu> 2012-09-28 23:57:24 -0700
committer: Matei Zaharia <matei@eecs.berkeley.edu> 2012-09-28 23:57:24 -0700
commit: 2f11e3c285499880b9d800fdd65ea9ad1c82b4af
tree: f6c36434b4a7517c1e5bf9eb64de0aa36b6ff87c /docs
parent: 56dcad593641ef8de211fcb4303574a9f4509f89
parent: 8654165e692d881c38e7d7e342974ba766452741
1 file changed, 4 insertions, 0 deletions
diff --git a/docs/scala-programming-guide.md b/docs/scala-programming-guide.md
index a370bf3ddc..db761d7df1 100644
--- a/docs/scala-programming-guide.md
+++ b/docs/scala-programming-guide.md
@@ -148,6 +148,10 @@ The following tables list the transformations and actions currently supported (s
   <td> Return a new dataset that contains the union of the elements in the source dataset and the argument. </td>
 </tr>
 <tr>
+  <td> <b>distinct</b>([<i>numTasks</i>]) </td>
+  <td> Return a new dataset that contains the distinct elements of the source dataset.</td>
+</tr>
+<tr>
   <td> <b>groupByKey</b>([<i>numTasks</i>]) </td>
   <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, Seq[V]) pairs. <br />
 <b>Note:</b> By default, this uses only 8 parallel tasks to do the grouping. You can pass an optional <code>numTasks</code> argument to set a different number of tasks.
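The new `distinct([numTasks])` row documents an optional argument that sets how many tasks (output splits) the deduplication uses. The sketch that follows is a hypothetical, Spark-free illustration in plain Scala collections; it is not Spark's API and not the code merged in this pull request. The idea it shows: equal elements always have equal hash codes, so hash-partitioning elements into `numTasks` buckets sends every duplicate to the same bucket, and deduplicating each bucket independently yields the globally distinct elements, spread across `numTasks` outputs. The object name `DistinctSketch` and modelling a dataset as `Seq[Seq[T]]` (one inner `Seq` per split) are assumptions made for illustration.

```scala
// Hypothetical, Spark-free sketch of what distinct([numTasks]) computes.
// Plain Scala collections stand in for an RDD; this is NOT Spark's API.
object DistinctSketch {
  // `partitions` models the source dataset's splits; `numTasks` models the
  // optional argument that sets how many output splits to produce.
  def distinct[T](partitions: Seq[Seq[T]], numTasks: Int): Seq[Seq[T]] = {
    require(numTasks > 0, "numTasks must be positive")
    // "Shuffle": route every element to bucket floorMod(hashCode, numTasks).
    // Equal elements share a hash code, so all copies meet in one bucket.
    val buckets: Map[Int, Seq[T]] =
      partitions.flatten.groupBy(x => Math.floorMod(x.hashCode, numTasks))
    // "Reduce": each of the numTasks buckets drops its own duplicates.
    (0 until numTasks).map(i => buckets.getOrElse(i, Seq.empty[T]).distinct)
  }

  def main(args: Array[String]): Unit = {
    val source = Seq(Seq(1, 2, 2, 3), Seq(3, 4, 1)) // two input splits
    val result = distinct(source, numTasks = 3)
    result.zipWithIndex.foreach { case (split, i) => println(s"split $i: $split") }
  }
}
```

Because each bucket is deduplicated on its own, raising `numTasks` spreads that work across more, smaller tasks; this mirrors the trade-off the `groupByKey` note describes for its default of 8 parallel tasks.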