| Field | Value | Date |
|---|---|---|
| author | Ali Ghodsi <alig@cs.berkeley.edu> | 2014-06-15 23:44:30 -0700 |
| committer | Patrick Wendell <pwendell@gmail.com> | 2014-06-15 23:44:30 -0700 |
| commit | 119b06a04f6df3949b3b074a18f791bbc732ac31 | |
| tree | a2b051542468f430df4bec65f4c15ddef13ead8b /docs/programming-guide.md | |
| parent | 9672ee07fb1c3583c70f23a699de3b2282eb0f98 | |
Updating docs to include missing information about reducers and clarify how the OFFHEAP storage level works (there has been confusion around this).
Author: Ali Ghodsi <alig@cs.berkeley.edu>
Closes #1089 from alig/master and squashes the following commits:
ca8114d [Ali Ghodsi] Updating docs to include missing information about reducers and clarify how the OFFHEAP storage level works (there has been confusion around this).
Diffstat (limited to 'docs/programming-guide.md')
-rw-r--r-- | docs/programming-guide.md | 7 |
1 files changed, 5 insertions, 2 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 0b24a8b88b..65d75b85ef 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -899,7 +899,7 @@ for details.
 </tr>
 <tr>
 <td> <b>reduceByKey</b>(<i>func</i>, [<i>numTasks</i>]) </td>
-<td> When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function. Like in <code>groupByKey</code>, the number of reduce tasks is configurable through an optional second argument. </td>
+<td> When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function <i>func</i>, which must be of type (V,V) => V. Like in <code>groupByKey</code>, the number of reduce tasks is configurable through an optional second argument. </td>
 </tr>
 <tr>
 <td> <b>aggregateByKey</b>(<i>zeroValue</i>)(<i>seqOp</i>, <i>combOp</i>, [<i>numTasks</i>]) </td>
@@ -1067,7 +1067,10 @@ storage levels is:
 <td> Store RDD in serialized format in <a href="http://tachyon-project.org">Tachyon</a>.
 Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors
 to be smaller and to share a pool of memory, making it attractive in environments with
-large heaps or multiple concurrent applications.
+large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon,
+the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory
+in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts
+from memory.
 </td>
 </tr>
 </table>
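The reduceByKey change states that <i>func</i> must be of type (V,V) => V. As a rough illustration of why that signature matters, here is a minimal pure-Python model (not Spark code) that assumes the dataset is already split into in-memory partitions. Values for each key are first combined inside each partition, much like the map-side "combiner" merge that Spark's `PairRDDFunctions.reduceByKey` Scaladoc describes, and the partial results are then merged with the same function. Because the output of <i>func</i> is fed back in as one of its inputs, it has to return the same type V it consumes; and because the grouping of calls depends on the partitioning, it should also be associative and commutative. The function name `local_reduce_by_key` is hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def local_reduce_by_key(
    partitions: List[List[Tuple[K, V]]],
    func: Callable[[V, V], V],
) -> Dict[K, V]:
    """Toy, single-process model of reduceByKey semantics (hypothetical, not Spark's code).

    func has type (V, V) -> V: it merges two values for the same key into
    one value of the same type, so its result can be merged again later.
    """
    # Phase 1: combine values per key within each partition (map-side combine).
    combined_per_partition: List[Dict[K, V]] = []
    for part in partitions:
        local: Dict[K, V] = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        combined_per_partition.append(local)

    # Phase 2: merge the per-partition partial results for each key.
    result: Dict[K, V] = {}
    for local in combined_per_partition:
        for key, value in local.items():
            result[key] = func(result[key], value) if key in result else value
    return result


if __name__ == "__main__":
    data = [
        [("a", 1), ("b", 2), ("a", 3)],  # partition 0
        [("b", 4), ("a", 5)],            # partition 1
    ]
    print(local_reduce_by_key(data, lambda x, y: x + y))  # {'a': 9, 'b': 6}
```

On a real cluster the corresponding PySpark call would be along the lines of `sc.parallelize(pairs).reduceByKey(lambda x, y: x + y)`, where the optional `numPartitions` argument plays the role of <i>numTasks</i> in the table.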
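The OFF_HEAP paragraph makes two claims: cached blocks live in Tachyon rather than inside an executor, so they survive an executor crash; and Tachyon's memory is discardable, so an evicted block is simply gone rather than rebuilt. The toy Python model below illustrates only those two properties. `DiscardableStore` and `Executor` are hypothetical names, and the least-recently-used eviction policy is an assumption made for the sketch, not a description of Tachyon's actual policy.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class DiscardableStore:
    """Hypothetical stand-in for an external block store such as Tachyon.

    It lives outside any executor and, when full, drops the least-recently-used
    block (an assumed policy) without keeping any way to rebuild it.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def put(self, block_id: Hashable, data: bytes) -> None:
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evicted block is discarded, not reconstructed

    def get(self, block_id: Hashable) -> Optional[bytes]:
        if block_id not in self.blocks:
            return None  # miss: the store itself does not rebuild the block
        self.blocks.move_to_end(block_id)  # mark as recently used
        return self.blocks[block_id]


class Executor:
    """Hypothetical executor that caches blocks in the shared external store."""

    def __init__(self, store: DiscardableStore) -> None:
        self.store = store

    def cache(self, block_id: Hashable, data: bytes) -> None:
        self.store.put(block_id, data)


if __name__ == "__main__":
    store = DiscardableStore(capacity=2)
    first = Executor(store)
    first.cache("rdd_0_0", b"partition 0")
    del first  # the executor goes away; the block stays in the store
    print(store.get("rdd_0_0"))  # b'partition 0'

    second = Executor(store)
    second.cache("rdd_0_1", b"partition 1")
    second.cache("rdd_0_2", b"partition 2")  # exceeds capacity, evicts rdd_0_0
    print(store.get("rdd_0_0"))  # None: evicted and not reconstructed
```

Keeping the store as an object independent of `Executor` is the point of the sketch: the cached data's lifetime is tied to the store, not to whichever executor wrote it.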