author     Ali Ghodsi <alig@cs.berkeley.edu>  2014-06-15 23:44:30 -0700
committer  Patrick Wendell <pwendell@gmail.com>  2014-06-15 23:44:40 -0700
commit     078d503ce649b6d36c93b40cdfee9adf5b85578b (patch)
tree       69c4b4757905c5cc2f78e27f5d9982d00bb9199b /docs
parent     b1c2199bff05b648654fc7219329bae48a91551b (diff)
Updating docs to include missing information about reducers and clarify how the OFFHEAP storage level works (there has been confusion around this).

Author: Ali Ghodsi <alig@cs.berkeley.edu>

Closes #1089 from alig/master and squashes the following commits:

ca8114d [Ali Ghodsi] Updating docs to include missing information about reducers and clarify how the OFFHEAP storage level works (there has been confusion around this).

(cherry picked from commit 119b06a04f6df3949b3b074a18f791bbc732ac31)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
Diffstat (limited to 'docs')
-rw-r--r-- docs/programming-guide.md 7
1 files changed, 5 insertions, 2 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 8d4c6b1148..494fd8e0fa 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -821,7 +821,7 @@ for details.
</tr>
<tr>
<td> <b>reduceByKey</b>(<i>func</i>, [<i>numTasks</i>]) </td>
- <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function. Like in <code>groupByKey</code>, the number of reduce tasks is configurable through an optional second argument. </td>
+ <td> When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function <i>func</i>, which must be of type (V,V) => V. Like in <code>groupByKey</code>, the number of reduce tasks is configurable through an optional second argument. </td>
</tr>
<tr>
<td> <b>sortByKey</b>([<i>ascending</i>], [<i>numTasks</i>]) </td>
@@ -985,7 +985,10 @@ storage levels is:
<td> Store RDD in serialized format in <a href="http://tachyon-project.org">Tachyon</a>.
Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors
to be smaller and to share a pool of memory, making it attractive in environments with
- large heaps or multiple concurrent applications.
+ large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon,
+ the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory
+ in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts
+ from memory.
</td>
</tr>
</table>
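To illustrate the `(V,V) => V` requirement that this patch adds to the `reduceByKey` entry, here is a minimal plain-Python model of the semantics. It is a sketch under stated assumptions, not Spark's implementation: the names `reduce_by_key` and `map_partitions` are hypothetical, input partitions are modeled as lists of `(key, value)` pairs, and keys are routed to `num_tasks` reduce tasks by `hash(key) % num_tasks`. Because values are first combined locally within each input partition and then merged again on the reduce side, `func` is applied in an order the caller does not control, which is why it should take two values and return one of the same type, and be associative and commutative.

```python
from typing import Callable, Dict, Hashable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key(
    map_partitions: List[List[Tuple[K, V]]],
    func: Callable[[V, V], V],  # must have type (V, V) => V
    num_tasks: int = 2,
) -> Dict[K, V]:
    """Hypothetical local model of reduceByKey; not Spark's actual code."""
    # Map side: combine values per key within each input partition.
    combined: List[Dict[K, V]] = []
    for part in map_partitions:
        local: Dict[K, V] = {}
        for k, v in part:
            local[k] = func(local[k], v) if k in local else v
        combined.append(local)

    # Shuffle: send each key to one of num_tasks reduce tasks by hash.
    reducers: List[Dict[K, V]] = [{} for _ in range(num_tasks)]
    for local in combined:
        for k, v in local.items():
            r = reducers[hash(k) % num_tasks]
            # Reduce side: merge partial results with the same func.
            r[k] = func(r[k], v) if k in r else v

    # Each key lives in exactly one reducer, so merging them is safe.
    return {k: v for r in reducers for k, v in r.items()}


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("a", 5)]]
    result = reduce_by_key(parts, lambda x, y: x + y, num_tasks=3)
    print(sorted(result.items()))  # [('a', 9), ('b', 6)]
```

In PySpark the corresponding call would be along the lines of `rdd.reduceByKey(lambda x, y: x + y, numPartitions)`, where the optional second argument plays the role of `num_tasks` above.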