author Reynold Xin <rxin@databricks.com> 2016-02-26 22:35:12 -0800
committer Reynold Xin <rxin@databricks.com> 2016-02-26 22:35:12 -0800
commit 59e3e10be2f9a1c53979ca72c038adb4fa17ca64
tree 3d6b2246738484273d36d0ccbec66b733930a3e0 /docs/programming-guide.md
parent f77dc4e1e202942aa8393fb5d8f492863973fe17
[SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts

## What changes were proposed in this pull request?

We provide a very limited set of cluster management scripts in Spark for Tachyon, although Tachyon itself provides a much better version of them. Given that Spark users can now simply use Tachyon as a normal file system and do not need extensive configuration, we can remove these management capabilities to simplify Spark's bash scripts. Note that this also reduces coupling between a third-party external system and Spark's release scripts, and eliminates the possibility of failures such as Tachyon being renamed or the tarballs being relocated.

## How was this patch tested?

N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #11400 from rxin/release-script.

Diffstat (limited to 'docs/programming-guide.md')
-rw-r--r-- docs/programming-guide.md | 22
1 files changed, 2 insertions, 20 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 5ebafa40b0..2f0ed5eca2 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1177,7 +1177,7 @@ that originally created it.
In addition, each persisted RDD can be stored using a different *storage level*, allowing you, for example,
to persist the dataset on disk, persist it in memory but as serialized Java objects (to save space),
-replicate it across nodes, or store it off-heap in [Tachyon](http://tachyon-project.org/).
+replicate it across nodes.
These levels are set by passing a
`StorageLevel` object ([Scala](api/scala/index.html#org.apache.spark.storage.StorageLevel),
[Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html),
@@ -1218,24 +1218,11 @@ storage levels is:
<td> MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc. </td>
<td> Same as the levels above, but replicate each partition on two cluster nodes. </td>
</tr>
-<tr>
- <td> OFF_HEAP (experimental) </td>
- <td> Store RDD in serialized format in <a href="http://tachyon-project.org">Tachyon</a>.
- Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors
- to be smaller and to share a pool of memory, making it attractive in environments with
- large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon,
- the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory
- in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts
- from memory. If you plan to use Tachyon as the off heap store, Spark is compatible with Tachyon
- out-of-the-box. Please refer to this <a href="http://tachyon-project.org/master/Running-Spark-on-Tachyon.html">page</a>
- for the suggested version pairings.
- </td>
-</tr>
</table>
**Note:** *In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library,
so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`,
-`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`.*
+`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, and `DISK_ONLY_2`.*
Spark also automatically persists some intermediate data in shuffle operations (e.g. `reduceByKey`), even without users calling `persist`. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call `persist` on the resulting RDD if they plan to reuse it.
@@ -1259,11 +1246,6 @@ requests from a web application). *All* the storage levels provide full fault to
recomputing lost data, but the replicated ones let you continue running tasks on the RDD without
waiting to recompute a lost partition.
-* In environments with high amounts of memory or multiple applications, the experimental `OFF_HEAP`
-mode has several advantages:
- * It allows multiple executors to share the same pool of memory in Tachyon.
- * It significantly reduces garbage collection costs.
- * Cached data is not lost if individual executors crash.
### Removing Data