author Josh Rosen <joshrosen@eecs.berkeley.edu> 2012-09-28 23:44:19 -0700
committer Josh Rosen <joshrosen@eecs.berkeley.edu> 2012-09-28 23:44:19 -0700
commit 37c199bbb098c68efecb4f8bd10b5cb8dfd9da3b
tree b5cc28279860bbe82667dcd7e73deba2ff36f60b /docs
parent 9f6efbf06a65953c4fcabd439124d71d50c5df6e
Allow controlling number of splits in distinct().
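
For illustration only, here is a rough plain-Scala sketch of what controlling the number of tasks in a distinct step can look like. It does not use Spark and is not the code from this commit: `DistinctSketch` and `distinctWithTasks` are hypothetical names, and hash-bucketing is assumed as the way elements are divided among tasks.

```scala
// Plain-Scala sketch; does not use Spark. All names here are hypothetical.
object DistinctSketch {
  // Route each element to one of numTasks buckets by hash code, then
  // deduplicate each bucket independently. Each bucket stands in for the
  // work one parallel task would do. Equal elements share a hash code, so
  // they always land in the same bucket and per-bucket deduplication is
  // enough to make the overall result distinct.
  def distinctWithTasks[T](data: Seq[T], numTasks: Int): Seq[T] = {
    require(numTasks > 0, "numTasks must be positive")
    val buckets = data.groupBy(x => Math.floorMod(x.hashCode, numTasks))
    buckets.values.flatMap(_.distinct).toSeq
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(1, 2, 2, 3, 3, 3, 4)
    val result = distinctWithTasks(input, numTasks = 2)
    // Order across buckets is not guaranteed, so sort before printing.
    println(result.sorted.mkString(", ")) // prints "1, 2, 3, 4"
  }
}
```

Changing `numTasks` changes how the elements are divided up, not which elements come back; only the ordering of the result may differ.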
Diffstat (limited to 'docs')
-rw-r--r-- docs/scala-programming-guide.md 4
1 files changed, 4 insertions, 0 deletions
diff --git a/docs/scala-programming-guide.md b/docs/scala-programming-guide.md
index a370bf3ddc..db761d7df1 100644
--- a/docs/scala-programming-guide.md
+++ b/docs/scala-programming-guide.md
@@ -148,6 +148,10 @@ The following tables list the transformations and actions currently supported (s
<td> Return a new dataset that contains the union of the elements in the source dataset and the argument. </td>
</tr>
<tr>
+ <td> <b>distinct</b>([<i>numTasks</i>]) </td>
+ <td> Return a new dataset that contains the distinct elements of the source dataset.</td>
+</tr>
+<tr>
<td> <b>groupByKey</b>([<i>numTasks</i>]) </td>
<td> When called on a dataset of (K, V) pairs, returns a dataset of (K, Seq[V]) pairs. <br />
<b>Note:</b> By default, this uses only 8 parallel tasks to do the grouping. You can pass an optional <code>numTasks</code> argument to set a different number of tasks.