author     Reynold Xin <rxin@databricks.com>   2016-02-26 22:35:12 -0800
committer  Reynold Xin <rxin@databricks.com>   2016-02-26 22:35:12 -0800
commit     59e3e10be2f9a1c53979ca72c038adb4fa17ca64
tree       3d6b2246738484273d36d0ccbec66b733930a3e0 /docs/programming-guide.md
parent     f77dc4e1e202942aa8393fb5d8f492863973fe17
[SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts
## What changes were proposed in this pull request?
Spark ships a very limited set of cluster management scripts for Tachyon, even though Tachyon itself provides a much better version of them. Since Spark users can now simply use Tachyon as a normal file system, without extensive configuration, we can remove these management capabilities to simplify Spark's bash scripts.
Note that this also reduces coupling between a third-party external system and Spark's release scripts, and eliminates the possibility of failures caused by, for example, Tachyon being renamed or its tarballs being relocated.
## How was this patch tested?
N/A
Author: Reynold Xin <rxin@databricks.com>
Closes #11400 from rxin/release-script.
Diffstat (limited to 'docs/programming-guide.md')
-rw-r--r--  docs/programming-guide.md  22
1 file changed, 2 insertions(+), 20 deletions(-)
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 5ebafa40b0..2f0ed5eca2 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1177,7 +1177,7 @@ that originally created it.
 In addition, each persisted RDD can be stored using a different *storage level*, allowing
 you, for example, to persist the dataset on disk, persist it in memory but as serialized
 Java objects (to save space),
-replicate it across nodes, or store it off-heap in [Tachyon](http://tachyon-project.org/).
+replicate it across nodes.
 These levels are set by passing a
 `StorageLevel` object ([Scala](api/scala/index.html#org.apache.spark.storage.StorageLevel),
 [Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html),
@@ -1218,24 +1218,11 @@ storage levels is:
   <td> MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc. </td>
   <td> Same as the levels above, but replicate each partition on two cluster nodes. </td>
 </tr>
-<tr>
-  <td> OFF_HEAP (experimental) </td>
-  <td> Store RDD in serialized format in <a href="http://tachyon-project.org">Tachyon</a>.
-  Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors
-  to be smaller and to share a pool of memory, making it attractive in environments with
-  large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon,
-  the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory
-  in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts
-  from memory. If you plan to use Tachyon as the off heap store, Spark is compatible with Tachyon
-  out-of-the-box. Please refer to this <a href="http://tachyon-project.org/master/Running-Spark-on-Tachyon.html">page</a>
-  for the suggested version pairings.
-  </td>
-</tr>
 </table>
 
 **Note:** *In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html)
 library, so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`,
-`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`.*
+`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, and `DISK_ONLY_2`.*
 
 Spark also automatically persists some intermediate data in shuffle operations (e.g. `reduceByKey`), even without users calling `persist`. This is done
 to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call `persist` on the resulting RDD if they plan to reuse it.
@@ -1259,11 +1246,6 @@ requests from a web application). *All* the storage levels provide full fault
 tolerance by recomputing lost data, but the replicated ones let you continue
 running tasks on the RDD without waiting to recompute a lost partition.
 
-* In environments with high amounts of memory or multiple applications, the experimental `OFF_HEAP`
-mode has several advantages:
-  * It allows multiple executors to share the same pool of memory in Tachyon.
-  * It significantly reduces garbage collection costs.
-  * Cached data is not lost if individual executors crash.
-
 ### Removing Data
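With the Tachyon-backed `OFF_HEAP` level gone, the remaining storage levels in the table above are all chosen the same way: by passing a `StorageLevel` to `persist`. A minimal sketch in Scala, assuming a running Spark environment with a `SparkContext` named `sc` in scope (e.g. the `spark-shell`) and a placeholder input path `data.txt`:

```scala
import org.apache.spark.storage.StorageLevel

// `data.txt` is a hypothetical input file used only for illustration.
val lines = sc.textFile("data.txt")
val lengths = lines.map(_.length)

// Keep a serialized in-memory copy, spill to disk when it does not fit,
// and replicate each partition on two nodes (the "_2" suffix from the table).
lengths.persist(StorageLevel.MEMORY_AND_DISK_SER_2)

lengths.count()     // the first action materializes and caches the RDD
lengths.unpersist() // drop the cached copies when no longer needed
```

This sketch is not runnable standalone; it needs a Spark cluster or shell session to supply `sc`.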