-rw-r--r-- docs/configuration.md     24
-rw-r--r-- docs/job-scheduling.md     3
-rw-r--r-- docs/programming-guide.md 22
-rwxr-xr-x make-distribution.sh      50
-rwxr-xr-x sbin/start-all.sh         15
-rwxr-xr-x sbin/start-master.sh      21
-rwxr-xr-x sbin/start-slaves.sh      22
-rwxr-xr-x sbin/stop-master.sh        4
-rwxr-xr-x sbin/stop-slaves.sh        5
9 files changed, 6 insertions, 160 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 4b1b00720b..e9b66238bd 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -929,30 +929,6 @@ Apart from these, the following properties are also available, and may be useful
mapping has high overhead for blocks close to or below the page size of the operating system.
</td>
</tr>
-<tr>
- <td><code>spark.externalBlockStore.blockManager</code></td>
- <td>org.apache.spark.storage.TachyonBlockManager</td>
- <td>
- Implementation of external block manager (file system) that store RDDs. The file system's URL is set by
- <code>spark.externalBlockStore.url</code>.
- </td>
-</tr>
-<tr>
- <td><code>spark.externalBlockStore.baseDir</code></td>
- <td>System.getProperty("java.io.tmpdir")</td>
- <td>
- Directories of the external block store that store RDDs. The file system's URL is set by
- <code>spark.externalBlockStore.url</code> It can also be a comma-separated list of multiple
- directories on Tachyon file system.
- </td>
-</tr>
-<tr>
- <td><code>spark.externalBlockStore.url</code></td>
- <td>tachyon://localhost:19998 for Tachyon</td>
- <td>
- The URL of the underlying external blocker file system in the external block store.
- </td>
-</tr>
</table>
#### Networking
diff --git a/docs/job-scheduling.md b/docs/job-scheduling.md
index 95d47794ea..00b6a18836 100644
--- a/docs/job-scheduling.md
+++ b/docs/job-scheduling.md
@@ -54,8 +54,7 @@ an application to gain back cores on one node when it has work to do. To use thi
Note that none of the modes currently provide memory sharing across applications. If you would like to share
data this way, we recommend running a single server application that can serve multiple requests by querying
-the same RDDs. In future releases, in-memory storage systems such as [Tachyon](http://tachyon-project.org) will
-provide another approach to share RDDs.
+the same RDDs.
## Dynamic Resource Allocation
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 5ebafa40b0..2f0ed5eca2 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1177,7 +1177,7 @@ that originally created it.
In addition, each persisted RDD can be stored using a different *storage level*, allowing you, for example,
to persist the dataset on disk, persist it in memory but as serialized Java objects (to save space),
-replicate it across nodes, or store it off-heap in [Tachyon](http://tachyon-project.org/).
+replicate it across nodes.
These levels are set by passing a
`StorageLevel` object ([Scala](api/scala/index.html#org.apache.spark.storage.StorageLevel),
[Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html),
@@ -1218,24 +1218,11 @@ storage levels is:
<td> MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc. </td>
<td> Same as the levels above, but replicate each partition on two cluster nodes. </td>
</tr>
-<tr>
- <td> OFF_HEAP (experimental) </td>
- <td> Store RDD in serialized format in <a href="http://tachyon-project.org">Tachyon</a>.
- Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors
- to be smaller and to share a pool of memory, making it attractive in environments with
- large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon,
- the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory
- in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts
- from memory. If you plan to use Tachyon as the off heap store, Spark is compatible with Tachyon
- out-of-the-box. Please refer to this <a href="http://tachyon-project.org/master/Running-Spark-on-Tachyon.html">page</a>
- for the suggested version pairings.
- </td>
-</tr>
</table>
**Note:** *In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library,
so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`,
-`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`.*
+`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, and `DISK_ONLY_2`.*
Spark also automatically persists some intermediate data in shuffle operations (e.g. `reduceByKey`), even without users calling `persist`. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call `persist` on the resulting RDD if they plan to reuse it.
@@ -1259,11 +1246,6 @@ requests from a web application). *All* the storage levels provide full fault to
recomputing lost data, but the replicated ones let you continue running tasks on the RDD without
waiting to recompute a lost partition.
-* In environments with high amounts of memory or multiple applications, the experimental `OFF_HEAP`
-mode has several advantages:
- * It allows multiple executors to share the same pool of memory in Tachyon.
- * It significantly reduces garbage collection costs.
- * Cached data is not lost if individual executors crash.
### Removing Data
diff --git a/make-distribution.sh b/make-distribution.sh
index 327659298e..20998144f0 100755
--- a/make-distribution.sh
+++ b/make-distribution.sh
@@ -32,11 +32,6 @@ set -x
SPARK_HOME="$(cd "`dirname "$0"`"; pwd)"
DISTDIR="$SPARK_HOME/dist"
-SPARK_TACHYON=false
-TACHYON_VERSION="0.8.2"
-TACHYON_TGZ="tachyon-${TACHYON_VERSION}-bin.tar.gz"
-TACHYON_URL="http://tachyon-project.org/downloads/files/${TACHYON_VERSION}/${TACHYON_TGZ}"
-
MAKE_TGZ=false
NAME=none
MVN="$SPARK_HOME/build/mvn"
@@ -45,7 +40,7 @@ function exit_with_usage {
echo "make-distribution.sh - tool for making binary distributions of Spark"
echo ""
echo "usage:"
- cl_options="[--name] [--tgz] [--mvn <mvn-command>] [--with-tachyon]"
+ cl_options="[--name] [--tgz] [--mvn <mvn-command>]"
echo "./make-distribution.sh $cl_options <maven build options>"
echo "See Spark's \"Building Spark\" doc for correct Maven options."
echo ""
@@ -69,9 +64,6 @@ while (( "$#" )); do
echo "Error: '--with-hive' is no longer supported, use Maven options -Phive and -Phive-thriftserver"
exit_with_usage
;;
- --with-tachyon)
- SPARK_TACHYON=true
- ;;
--tgz)
MAKE_TGZ=true
;;
@@ -150,12 +142,6 @@ else
echo "Making distribution for Spark $VERSION in $DISTDIR..."
fi
-if [ "$SPARK_TACHYON" == "true" ]; then
- echo "Tachyon Enabled"
-else
- echo "Tachyon Disabled"
-fi
-
# Build uber fat JAR
cd "$SPARK_HOME"
@@ -219,40 +205,6 @@ if [ -d "$SPARK_HOME"/R/lib/SparkR ]; then
cp "$SPARK_HOME/R/lib/sparkr.zip" "$DISTDIR"/R/lib
fi
-# Download and copy in tachyon, if requested
-if [ "$SPARK_TACHYON" == "true" ]; then
- TMPD=`mktemp -d 2>/dev/null || mktemp -d -t 'disttmp'`
-
- pushd "$TMPD" > /dev/null
- echo "Fetching tachyon tgz"
-
- TACHYON_DL="${TACHYON_TGZ}.part"
- if [ $(command -v curl) ]; then
- curl --silent -k -L "${TACHYON_URL}" > "${TACHYON_DL}" && mv "${TACHYON_DL}" "${TACHYON_TGZ}"
- elif [ $(command -v wget) ]; then
- wget --quiet "${TACHYON_URL}" -O "${TACHYON_DL}" && mv "${TACHYON_DL}" "${TACHYON_TGZ}"
- else
- printf "You do not have curl or wget installed. please install Tachyon manually.\n"
- exit -1
- fi
-
- tar xzf "${TACHYON_TGZ}"
- cp "tachyon-${TACHYON_VERSION}/assembly/target/tachyon-assemblies-${TACHYON_VERSION}-jar-with-dependencies.jar" "$DISTDIR/lib"
- mkdir -p "$DISTDIR/tachyon/src/main/java/tachyon/web"
- cp -r "tachyon-${TACHYON_VERSION}"/{bin,conf,libexec} "$DISTDIR/tachyon"
- cp -r "tachyon-${TACHYON_VERSION}"/servers/src/main/java/tachyon/web "$DISTDIR/tachyon/src/main/java/tachyon/web"
-
- if [[ `uname -a` == Darwin* ]]; then
- # need to run sed differently on osx
- nl=$'\n'; sed -i "" -e "s|export TACHYON_JAR=\$TACHYON_HOME/target/\(.*\)|# This is set for spark's make-distribution\\$nl export TACHYON_JAR=\$TACHYON_HOME/../lib/\1|" "$DISTDIR/tachyon/libexec/tachyon-config.sh"
- else
- sed -i "s|export TACHYON_JAR=\$TACHYON_HOME/target/\(.*\)|# This is set for spark's make-distribution\n export TACHYON_JAR=\$TACHYON_HOME/../lib/\1|" "$DISTDIR/tachyon/libexec/tachyon-config.sh"
- fi
-
- popd > /dev/null
- rm -rf "$TMPD"
-fi
-
if [ "$MAKE_TGZ" == "true" ]; then
TARDIR_NAME=spark-$VERSION-bin-$NAME
TARDIR="$SPARK_HOME/$TARDIR_NAME"
diff --git a/sbin/start-all.sh b/sbin/start-all.sh
index 6217f9bf28..a5d30d274e 100755
--- a/sbin/start-all.sh
+++ b/sbin/start-all.sh
@@ -25,22 +25,11 @@ if [ -z "${SPARK_HOME}" ]; then
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
-TACHYON_STR=""
-
-while (( "$#" )); do
-case $1 in
- --with-tachyon)
- TACHYON_STR="--with-tachyon"
- ;;
- esac
-shift
-done
-
# Load the Spark configuration
. "${SPARK_HOME}/sbin/spark-config.sh"
# Start Master
-"${SPARK_HOME}/sbin"/start-master.sh $TACHYON_STR
+"${SPARK_HOME}/sbin"/start-master.sh
# Start Workers
-"${SPARK_HOME}/sbin"/start-slaves.sh $TACHYON_STR
+"${SPARK_HOME}/sbin"/start-slaves.sh
diff --git a/sbin/start-master.sh b/sbin/start-master.sh
index 9f2e14dff6..ce7f177959 100755
--- a/sbin/start-master.sh
+++ b/sbin/start-master.sh
@@ -39,21 +39,6 @@ fi
ORIGINAL_ARGS="$@"
-START_TACHYON=false
-
-while (( "$#" )); do
-case $1 in
- --with-tachyon)
- if [ ! -e "${SPARK_HOME}"/tachyon/bin/tachyon ]; then
- echo "Error: --with-tachyon specified, but tachyon not found."
- exit -1
- fi
- START_TACHYON=true
- ;;
- esac
-shift
-done
-
. "${SPARK_HOME}/sbin/spark-config.sh"
. "${SPARK_HOME}/bin/load-spark-env.sh"
@@ -73,9 +58,3 @@ fi
"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
--ip $SPARK_MASTER_IP --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT \
$ORIGINAL_ARGS
-
-if [ "$START_TACHYON" == "true" ]; then
- "${SPARK_HOME}"/tachyon/bin/tachyon bootstrap-conf $SPARK_MASTER_IP
- "${SPARK_HOME}"/tachyon/bin/tachyon format -s
- "${SPARK_HOME}"/tachyon/bin/tachyon-start.sh master
-fi
diff --git a/sbin/start-slaves.sh b/sbin/start-slaves.sh
index 51ca81e053..5bf2b83b42 100755
--- a/sbin/start-slaves.sh
+++ b/sbin/start-slaves.sh
@@ -23,21 +23,6 @@ if [ -z "${SPARK_HOME}" ]; then
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
-START_TACHYON=false
-
-while (( "$#" )); do
-case $1 in
- --with-tachyon)
- if [ ! -e "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon ]; then
- echo "Error: --with-tachyon specified, but tachyon not found."
- exit -1
- fi
- START_TACHYON=true
- ;;
- esac
-shift
-done
-
. "${SPARK_HOME}/sbin/spark-config.sh"
. "${SPARK_HOME}/bin/load-spark-env.sh"
@@ -50,12 +35,5 @@ if [ "$SPARK_MASTER_IP" = "" ]; then
SPARK_MASTER_IP="`hostname`"
fi
-if [ "$START_TACHYON" == "true" ]; then
- "${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon bootstrap-conf "$SPARK_MASTER_IP"
-
- # set -t so we can call sudo
- SPARK_SSH_OPTS="-o StrictHostKeyChecking=no -t" "${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/tachyon/bin/tachyon-start.sh" worker SudoMount \; sleep 1
-fi
-
# Launch the slaves
"${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/start-slave.sh" "spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT"
diff --git a/sbin/stop-master.sh b/sbin/stop-master.sh
index e57962bb35..14644ea72d 100755
--- a/sbin/stop-master.sh
+++ b/sbin/stop-master.sh
@@ -26,7 +26,3 @@ fi
. "${SPARK_HOME}/sbin/spark-config.sh"
"${SPARK_HOME}/sbin"/spark-daemon.sh stop org.apache.spark.deploy.master.Master 1
-
-if [ -e "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon ]; then
- "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon killAll tachyon.master.Master
-fi
diff --git a/sbin/stop-slaves.sh b/sbin/stop-slaves.sh
index 6395637762..a57441b52a 100755
--- a/sbin/stop-slaves.sh
+++ b/sbin/stop-slaves.sh
@@ -25,9 +25,4 @@ fi
. "${SPARK_HOME}/bin/load-spark-env.sh"
-# do before the below calls as they exec
-if [ -e "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon ]; then
- "${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon killAll tachyon.worker.Worker
-fi
-
"${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/stop-slave.sh