From 5641130a8ed2a2917e64298d6832ec3b201e49a0 Mon Sep 17 00:00:00 2001
From: Patrick Wendell
Date: Thu, 19 Dec 2013 23:20:24 +0000
Subject: Updating docs to point to 0.8.1.

---
 site/docs/0.8.1/running-on-mesos.html | 238 ++++++++++++++++++++++++++++++++++
 1 file changed, 238 insertions(+)
 create mode 100644 site/docs/0.8.1/running-on-mesos.html

(limited to 'site/docs/0.8.1/running-on-mesos.html')

diff --git a/site/docs/0.8.1/running-on-mesos.html b/site/docs/0.8.1/running-on-mesos.html
new file mode 100644
index 000000000..a29317526
--- /dev/null
+++ b/site/docs/0.8.1/running-on-mesos.html
@@ -0,0 +1,238 @@

Running Spark on Mesos - Spark 0.8.1 Documentation

Running Spark on Mesos


Spark can run on clusters managed by Apache Mesos. Follow the steps below to install Mesos and Spark:

  1. Download and build Spark using the instructions here. Note: Don’t forget to consider what version of HDFS you might want to use!
  2. Download, build, install, and start Mesos 0.13.0 on your cluster. You can download the Mesos distribution from a mirror. See the Mesos Getting Started page for more information. Note: If you want to run Mesos without installing it into the default paths on your system (e.g., if you don’t have administrative privileges to install it), you should also pass the --prefix option to configure to tell it where to install. For example, pass --prefix=/home/user/mesos. By default the prefix is /usr/local.
  3. Create a Spark “distribution” using make-distribution.sh.
  4. Rename the dist directory created by make-distribution.sh to spark-0.8.1-incubating.
  5. Create a tar archive: tar czf spark-0.8.1-incubating.tar.gz spark-0.8.1-incubating
  6. Upload this archive to HDFS or another place accessible from Mesos via http://, e.g., Amazon Simple Storage Service: hadoop fs -put spark-0.8.1-incubating.tar.gz /path/to/spark-0.8.1-incubating.tar.gz
  7. Create a file called spark-env.sh in Spark’s conf directory, by copying conf/spark-env.sh.template, and add the following lines to it (a shell sketch after the example below collects steps 3-7):
     • export MESOS_NATIVE_LIBRARY=<path to libmesos.so>. This path is usually <prefix>/lib/libmesos.so (where the prefix is /usr/local by default, see above). On Mac OS X, the library is called libmesos.dylib instead of libmesos.so.
     • export SPARK_EXECUTOR_URI=<path to spark-0.8.1-incubating.tar.gz uploaded above>.
     • export MASTER=mesos://HOST:PORT, where HOST:PORT is the host and port (default: 5050) of your Mesos master (or zk://... if using Mesos with ZooKeeper).
  8. To run a Spark application against the cluster, pass the string mesos://HOST:PORT as the first parameter when you create your SparkContext. In addition, you’ll need to set the spark.executor.uri property. For example:
System.setProperty("spark.executor.uri", "<path to spark-0.8.1-incubating.tar.gz uploaded above>")
val sc = new SparkContext("mesos://HOST:5050", "App Name", ...)
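For convenience, here are steps 3-7 collected as a single shell sketch. This is not part of any script shipped with Spark: the upload path, the Mesos prefix (/usr/local), and the <namenode> and <mesos-master> placeholders are illustrative assumptions; substitute the values for your own cluster.

# Steps 3-6: build, package, and upload the Spark distribution
./make-distribution.sh
mv dist spark-0.8.1-incubating
tar czf spark-0.8.1-incubating.tar.gz spark-0.8.1-incubating
hadoop fs -put spark-0.8.1-incubating.tar.gz /path/to/spark-0.8.1-incubating.tar.gz

# Step 7: conf/spark-env.sh (copied from conf/spark-env.sh.template)
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so    # libmesos.dylib on Mac OS X
export SPARK_EXECUTOR_URI=hdfs://<namenode>:9000/path/to/spark-0.8.1-incubating.tar.gz
export MASTER=mesos://<mesos-master>:5050                 # or zk://... if using ZooKeeper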

If you want to run Spark on Amazon EC2, you can use the Spark EC2 launch scripts, which provide an easy way to launch a cluster with Mesos, Spark, and HDFS pre-configured. This will get you a cluster in about five minutes without any configuration on your part.


Mesos Run Modes


Spark can run over Mesos in two modes: “fine-grained” and “coarse-grained”. In fine-grained mode, which is the default, each Spark task runs as a separate Mesos task. This allows multiple instances of Spark (and other frameworks) to share machines at a very fine granularity, where each application gets more or fewer machines as it ramps up, but it comes with an additional overhead in launching each task, which may be inappropriate for low-latency applications (e.g. interactive queries or serving web requests). The coarse-grained mode will instead launch only one long-running Spark task on each Mesos machine, and dynamically schedule its own “mini-tasks” within it. The benefit is much lower startup overhead, but at the cost of reserving the Mesos resources for the complete duration of the application.


To run in coarse-grained mode, set the spark.mesos.coarse system property to true before creating your SparkContext:

System.setProperty("spark.mesos.coarse", "true")
val sc = new SparkContext("mesos://HOST:5050", "App Name", ...)

In addition, for coarse-grained mode, you can control the maximum number of resources Spark will acquire. By default, it will acquire all cores in the cluster (that get offered by Mesos), which only makes sense if you run just one application at a time. You can cap the maximum number of cores using System.setProperty("spark.cores.max", "10") (for example). Again, this must be done before initializing a SparkContext.
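Putting the two settings together, a minimal sketch might look like this (the master address, application name, and the cap of 10 cores are illustrative values):

import org.apache.spark.SparkContext

// Both properties must be set before the SparkContext is constructed.
System.setProperty("spark.mesos.coarse", "true")
System.setProperty("spark.cores.max", "10")   // cap this application at 10 cores

val sc = new SparkContext("mesos://HOST:5050", "App Name")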


Running Alongside Hadoop


You can run Spark and Mesos alongside your existing Hadoop cluster by simply launching them as separate services on the same machines. To access Hadoop data from Spark, use an hdfs:// URL (typically hdfs://<namenode>:9000/path, but you can find the right URL on your Hadoop Namenode’s web UI).
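As a small sketch, reading an HDFS file from Spark looks like this; the namenode address and file path are assumptions, so use the URL shown on your own Namenode web UI:

// Assumes the SparkContext sc created as shown in the earlier examples.
val lines = sc.textFile("hdfs://<namenode>:9000/path/to/data.txt")
println("Lines read from HDFS: " + lines.count())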


It is also possible to run Hadoop MapReduce on Mesos, to get better resource isolation and sharing between the two. In this case, Mesos will act as a unified scheduler that assigns cores to either Hadoop or Spark, as opposed to having them share resources via the Linux scheduler on each node. Please refer to Hadoop on Mesos.


In either case, HDFS runs separately from Hadoop MapReduce, without going through Mesos.

-- cgit v1.2.3