path: root/site/docs/1.1.0/running-on-mesos.html
authorPatrick Wendell <>2014-09-11 05:00:26 +0000
committerPatrick Wendell <>2014-09-11 05:00:26 +0000
commit07461d1269cd6d373630c20fb50c2988af5c21f4 (patch)
tree04c4f9a3cbe613b3d3c79a8581e6b83babbc3e0b /site/docs/1.1.0/running-on-mesos.html
parent46d52fbb9be4b5b90a7a1ee9ce3e943156d190b9 (diff)
Adding Spark 1.1.0 docs.
Diffstat (limited to 'site/docs/1.1.0/running-on-mesos.html')
1 files changed, 379 insertions, 0 deletions
diff --git a/site/docs/1.1.0/running-on-mesos.html b/site/docs/1.1.0/running-on-mesos.html
new file mode 100644
index 000000000..8f8816289
--- /dev/null
+++ b/site/docs/1.1.0/running-on-mesos.html
@@ -0,0 +1,379 @@
+<!DOCTYPE html>
+<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
+<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
+<!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
+ <head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
+ <title>Running Spark on Mesos - Spark 1.1.0 Documentation</title>
+ <meta name="description" content="">
+ <link rel="stylesheet" href="css/bootstrap.min.css">
+ <style>
+ body {
+ padding-top: 60px;
+ padding-bottom: 40px;
+ }
+ </style>
+ <meta name="viewport" content="width=device-width">
+ <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
+ <link rel="stylesheet" href="css/main.css">
+ <script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+ <link rel="stylesheet" href="css/pygments-default.css">
+ <!-- Google analytics script -->
+ <script type="text/javascript">
+ var _gaq = _gaq || [];
+ _gaq.push(['_setAccount', 'UA-32518208-1']);
+ _gaq.push(['_trackPageview']);
+ (function() {
+ var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+ ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '';
+ var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+ })();
+ </script>
+ </head>
+ <body>
+ <!--[if lt IE 7]>
+ <p class="chromeframe">You are using an outdated browser. <a href="">Upgrade your browser today</a> or <a href="">install Google Chrome Frame</a> to better experience this site.</p>
+ <![endif]-->
+ <!-- This code is taken from -->
+ <div class="navbar navbar-fixed-top" id="topbar">
+ <div class="navbar-inner">
+ <div class="container">
+ <div class="brand"><a href="index.html">
+ <img src="img/spark-logo-hd.png" style="height:50px;"/></a><span class="version">1.1.0</span>
+ </div>
+ <ul class="nav">
+ <!--TODO(andyk): Add class="active" attribute to li some how.-->
+ <li><a href="index.html">Overview</a></li>
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
+ <ul class="dropdown-menu">
+ <li><a href="quick-start.html">Quick Start</a></li>
+ <li><a href="programming-guide.html">Spark Programming Guide</a></li>
+ <li class="divider"></li>
+ <li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
+ <li><a href="sql-programming-guide.html">Spark SQL</a></li>
+ <li><a href="mllib-guide.html">MLlib (Machine Learning)</a></li>
+ <li><a href="graphx-programming-guide.html">GraphX (Graph Processing)</a></li>
+ <li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark)</a></li>
+ </ul>
+ </li>
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
+ <ul class="dropdown-menu">
+ <li><a href="api/scala/index.html#org.apache.spark.package">Scaladoc</a></li>
+ <li><a href="api/java/index.html">Javadoc</a></li>
+ <li><a href="api/python/index.html">Python API</a></li>
+ </ul>
+ </li>
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
+ <ul class="dropdown-menu">
+ <li><a href="cluster-overview.html">Overview</a></li>
+ <li><a href="submitting-applications.html">Submitting Applications</a></li>
+ <li class="divider"></li>
+ <li><a href="ec2-scripts.html">Amazon EC2</a></li>
+ <li><a href="spark-standalone.html">Standalone Mode</a></li>
+ <li><a href="running-on-mesos.html">Mesos</a></li>
+ <li><a href="running-on-yarn.html">YARN</a></li>
+ </ul>
+ </li>
+ <li class="dropdown">
+ <a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
+ <ul class="dropdown-menu">
+ <li><a href="configuration.html">Configuration</a></li>
+ <li><a href="monitoring.html">Monitoring</a></li>
+ <li><a href="tuning.html">Tuning Guide</a></li>
+ <li><a href="job-scheduling.html">Job Scheduling</a></li>
+ <li><a href="security.html">Security</a></li>
+ <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
+ <li><a href="hadoop-third-party-distributions.html">3<sup>rd</sup>-Party Hadoop Distros</a></li>
+ <li class="divider"></li>
+ <li><a href="building-with-maven.html">Building Spark with Maven</a></li>
+ <li><a href="">Contributing to Spark</a></li>
+ </ul>
+ </li>
+ </ul>
+ <!--<p class="navbar-text pull-right"><span class="version-text">v1.1.0</span></p>-->
+ </div>
+ </div>
+ </div>
+ <div class="container" id="content">
+ <h1 class="title">Running Spark on Mesos</h1>
+ <p>Spark can run on hardware clusters managed by <a href="">Apache Mesos</a>.</p>
+<p>The advantages of deploying Spark with Mesos include:</p>
+<ul>
+ <li>dynamic partitioning between Spark and other
+<a href="">frameworks</a></li>
+ <li>scalable partitioning between multiple instances of Spark</li>
+</ul>
+<h1 id="how-it-works">How it Works</h1>
+<p>In a standalone cluster deployment, the cluster manager in the diagram below is a Spark master
+instance. When using Mesos, the Mesos master replaces the Spark master as the cluster manager.</p>
+<p style="text-align: center;">
+ <img src="img/cluster-overview.png" title="Spark cluster components" alt="Spark cluster components" />
+</p>
+<p>Now when a driver creates a job and starts issuing tasks for scheduling, Mesos determines which
+machines handle which tasks. Because it takes into account other frameworks when scheduling these
+many short-lived tasks, multiple frameworks can coexist on the same cluster without resorting to a
+static partitioning of resources.</p>
+<p>To get started, follow the steps below to install Mesos and deploy Spark jobs via Mesos.</p>
+<h1 id="installing-mesos">Installing Mesos</h1>
+<p>Spark 1.1.0-SNAPSHOT is designed for use with Mesos 0.18.1 and does not
+require any special patches of Mesos.</p>
+<p>If you already have a Mesos cluster running, you can skip this Mesos installation step.</p>
+<p>Otherwise, installing Mesos for Spark is no different from installing Mesos for use by other
+frameworks. You can install Mesos either from source or using prebuilt packages.</p>
+<h2 id="from-source">From Source</h2>
+<p>To install Apache Mesos from source, follow these steps:</p>
+<ul>
+ <li>Download a Mesos release from a
+<a href="">mirror</a></li>
+ <li>Follow the Mesos <a href="">Getting Started</a> page for compiling and
+installing Mesos</li>
+</ul>
+<p><strong>Note:</strong> If you want to run Mesos without installing it into the default paths on your system
+(e.g., if you lack administrative privileges to install it), pass the
+<code>--prefix</code> option to <code>configure</code> to tell it where to install. For example, pass
+<code>--prefix=/home/me/mesos</code>. By default the prefix is <code>/usr/local</code>.</p>
+<h2 id="third-party-packages">Third-Party Packages</h2>
+<p>The Apache Mesos project only publishes source releases, not binary packages. However,
+third-party projects publish binary releases that may be helpful in setting up Mesos.</p>
+<p>One of those is Mesosphere. To install Mesos using the binary releases provided by Mesosphere:</p>
+<ul>
+ <li>Download the Mesos installation package from the <a href="">downloads page</a></li>
+ <li>Follow their instructions for installation and configuration</li>
+</ul>
+<p>The Mesosphere installation documents suggest setting up ZooKeeper to handle Mesos master failover,
+but Mesos can be run without ZooKeeper using a single master as well.</p>
+<h2 id="verification">Verification</h2>
+<p>To verify that the Mesos cluster is ready for Spark, navigate to the Mesos master web UI at port
+<code>:5050</code>. Confirm that all expected machines are present in the slaves tab.</p>
+<h1 id="connecting-spark-to-mesos">Connecting Spark to Mesos</h1>
+<p>To use Mesos from Spark, you need a Spark binary package available in a place accessible by Mesos, and
+a Spark driver program configured to connect to Mesos.</p>
+<h2 id="uploading-spark-package">Uploading Spark Package</h2>
+<p>When Mesos runs a task on a Mesos slave for the first time, that slave must have a Spark binary
+package for running the Spark Mesos executor backend.
+The Spark package can be hosted at any Hadoop-accessible URI, including HTTP via <code>http://</code>,
+<a href="">Amazon Simple Storage Service</a> via <code>s3n://</code>, or HDFS via <code>hdfs://</code>.</p>
+<p>To use a precompiled package:</p>
+<ul>
+ <li>Download a Spark binary package from the Spark <a href="">download page</a></li>
+ <li>Upload it to HDFS, HTTP, or S3</li>
+</ul>
+<p>To host on HDFS, use the Hadoop fs put command: <code>hadoop fs -put spark-1.1.0-SNAPSHOT.tar.gz</code>, followed by the destination path.</p>
+<p>Or if you are using a custom-compiled version of Spark, you will need to create a package using
+the <code></code> script included in a Spark source tarball/checkout.</p>
+<ul>
+ <li>Download and build Spark using the instructions <a href="index.html">here</a></li>
+ <li>Create a binary package using <code> --tgz</code>.</li>
+ <li>Upload the archive to HTTP, S3, or HDFS</li>
+</ul>
+<h2 id="using-a-mesos-master-url">Using a Mesos Master URL</h2>
+<p>The Master URLs for Mesos are in the form <code>mesos://host:5050</code> for a single-master Mesos
+cluster, or <code>mesos://zk://host:2181</code> for a multi-master Mesos cluster using ZooKeeper.</p>
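+<p>As a rough illustration of the two URL forms above, the following shell sketch assembles a master
+URL from a host and an optional port. The function name <code>mesos_master_url</code> and its
+<code>single</code>/<code>zk</code> mode argument are hypothetical helpers introduced here for illustration; they are not part of Spark or Mesos.</p>

```shell
# Hypothetical helper: build a Spark master URL for a Mesos cluster.
#   $1  mode: "single" (one Mesos master) or "zk" (masters located via ZooKeeper)
#   $2  host name
#   $3  port (optional; defaults to 5050 for "single" and 2181 for "zk")
mesos_master_url() {
  case "$1" in
    single) printf 'mesos://%s:%s\n' "$2" "${3:-5050}" ;;
    zk)     printf 'mesos://zk://%s:%s\n' "$2" "${3:-2181}" ;;
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

mesos_master_url single host   # prints mesos://host:5050
mesos_master_url zk host       # prints mesos://zk://host:2181
```

+<p>The result could then be handed to a launcher, for example
+<code>./bin/spark-shell --master "$(mesos_master_url single host)"</code>.</p>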
+<p>The driver also needs some configuration in <code></code> to interact properly with Mesos:</p>
+<ul>
+ <li>In <code></code> set some environment variables:
+ <ul>
+ <li><code>export MESOS_NATIVE_LIBRARY=&lt;path to;</code>. This path is typically
+<code>&lt;prefix&gt;/lib/</code> where the prefix is <code>/usr/local</code> by default. See the Mesos installation
+instructions above. On Mac OS X, the library is called <code>libmesos.dylib</code>.</li>
+ <li><code>export SPARK_EXECUTOR_URI=&lt;URL of spark-1.1.0-SNAPSHOT.tar.gz uploaded above&gt;</code>.</li>
+ </ul>
+ </li>
+ <li>Also set <code>spark.executor.uri</code> to <code>&lt;URL of spark-1.1.0-SNAPSHOT.tar.gz&gt;</code>.</li>
+</ul>
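+<p>The environment variables described above might be set along the following lines. This is only a
+sketch: the helper <code>mesos_native_library</code> is hypothetical, it assumes the Linux library is
+named <code>libmesos.so</code>, and the <code>hdfs://</code> URL is a placeholder for wherever the Spark
+package was actually uploaded.</p>

```shell
# Hypothetical helper: guess the Mesos native library path under an install prefix.
#   $1  install prefix (defaults to /usr/local, the default configure prefix)
#   $2  OS name as reported by `uname -s` (defaults to the current system)
mesos_native_library() {
  prefix="${1:-/usr/local}"
  case "${2:-$(uname -s)}" in
    Darwin) echo "$prefix/lib/libmesos.dylib" ;;  # Mac OS X naming
    *)      echo "$prefix/lib/libmesos.so" ;;     # assumed Linux naming
  esac
}

# Point Spark at the Mesos native library.
export MESOS_NATIVE_LIBRARY="$(mesos_native_library /usr/local)"

# Placeholder URL: replace with the location of the uploaded Spark package.
export SPARK_EXECUTOR_URI="hdfs://namenode:9000/path/to/spark-1.1.0-SNAPSHOT.tar.gz"
```

+<p>If Mesos was installed with a custom <code>--prefix</code>, pass that prefix as the first argument instead of
+<code>/usr/local</code>.</p>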
+<p>Now when starting a Spark application against the cluster, pass a <code>mesos://</code>
+URL as the master when creating a <code>SparkContext</code>. For example:</p>
+<div class="highlight"><pre><code class="scala"><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">()</span>
+ <span class="o">.</span><span class="n">setMaster</span><span class="o">(</span><span class="s">&quot;mesos://HOST:5050&quot;</span><span class="o">)</span>
+ <span class="o">.</span><span class="n">setAppName</span><span class="o">(</span><span class="s">&quot;My app&quot;</span><span class="o">)</span>
+ <span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.executor.uri&quot;</span><span class="o">,</span> <span class="s">&quot;&lt;path to spark-1.1.0-SNAPSHOT.tar.gz uploaded above&gt;&quot;</span><span class="o">)</span>
+<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span>
+</code></pre></div>
+<p>(You can also use <a href="submitting-applications.html"><code>spark-submit</code></a> and configure <code>spark.executor.uri</code>
+in the <a href="configuration.html#loading-default-configurations">conf/spark-defaults.conf</a> file. Note
+that <code>spark-submit</code> currently only supports deploying the Spark driver in <code>client</code> mode for Mesos.)</p>
+<p>When running a shell, the <code>spark.executor.uri</code> parameter is inherited from <code>SPARK_EXECUTOR_URI</code>, so
+it does not need to be redundantly passed in as a system property.</p>
+<div class="highlight"><pre><code class="bash">./bin/spark-shell --master mesos://host:5050
+</code></pre></div>
+<h1 id="mesos-run-modes">Mesos Run Modes</h1>
+<p>Spark can run over Mesos in two modes: &#8220;fine-grained&#8221; (default) and &#8220;coarse-grained&#8221;.</p>
+<p>In &#8220;fine-grained&#8221; mode (default), each Spark task runs as a separate Mesos task. This allows
+multiple instances of Spark (and other frameworks) to share machines at a very fine granularity,
+where each application gets more or fewer machines as it ramps up and down, but it comes with an
+additional overhead in launching each task. This mode may be inappropriate for low-latency
+requirements like interactive queries or serving web requests.</p>
+<p>The &#8220;coarse-grained&#8221; mode will instead launch only <em>one</em> long-running Spark task on each Mesos
+machine, and dynamically schedule its own &#8220;mini-tasks&#8221; within it. The benefit is much lower startup
+overhead, but at the cost of reserving the Mesos resources for the complete duration of the
+application.</p>
+<p>To run in coarse-grained mode, set the <code>spark.mesos.coarse</code> property in your
+<a href="configuration.html#spark-properties">SparkConf</a>:</p>
+<div class="highlight"><pre><code class="scala"><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.mesos.coarse&quot;</span><span class="o">,</span> <span class="s">&quot;true&quot;</span><span class="o">)</span>
+</code></pre></div>
+<p>In addition, for coarse-grained mode, you can control the maximum number of resources Spark will
+acquire. By default, it will acquire <em>all</em> cores in the cluster (that get offered by Mesos), which
+only makes sense if you run just one application at a time. You can cap the maximum number of cores
+using <code>conf.set("spark.cores.max", "10")</code> (for example).</p>
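+<p>For applications launched without editing code, the same two properties could instead be kept in a
+properties file. The sketch below writes such a file; the function name <code>write_coarse_conf</code>
+and the temporary output path are hypothetical, and the core cap of 10 simply mirrors the example
+value above.</p>

```shell
# Hypothetical helper: write a Spark properties file that enables
# coarse-grained Mesos mode and caps the number of cores acquired.
#   $1  output file
#   $2  maximum number of cores (defaults to 10)
write_coarse_conf() {
  cat > "$1" <<EOF
spark.mesos.coarse   true
spark.cores.max      ${2:-10}
EOF
}

conf_file="$(mktemp)"        # temporary location, for illustration only
write_coarse_conf "$conf_file" 10
cat "$conf_file"
```

+<p>The generated lines are in the format used by <code>conf/spark-defaults.conf</code>, so they could be
+copied there for use with <code>spark-submit</code>.</p>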
+<h1 id="running-alongside-hadoop">Running Alongside Hadoop</h1>
+<p>You can run Spark and Mesos alongside your existing Hadoop cluster by just launching them as a
+separate service on the machines. To access Hadoop data from Spark, a full <code>hdfs://</code> URL is required
+(typically <code>hdfs://&lt;namenode&gt;:9000/path</code>, but you can find the right URL on your Hadoop Namenode web
+UI).</p>
+<p>In addition, it is possible to run Hadoop MapReduce on Mesos for better resource isolation and
+sharing between the two. In this case, Mesos will act as a unified scheduler that assigns cores to
+either Hadoop or Spark, as opposed to having them share resources via the Linux scheduler on each
+node. Please refer to <a href="">Hadoop on Mesos</a>.</p>
+<p>In either case, HDFS runs separately from Hadoop MapReduce, without being scheduled through Mesos.</p>
+<h1 id="troubleshooting-and-debugging">Troubleshooting and Debugging</h1>
+<p>A few places to look during debugging:</p>
+<ul>
+ <li>Mesos master on port <code>:5050</code>
+ <ul>
+ <li>Slaves should appear in the slaves tab</li>
+ <li>Spark applications should appear in the frameworks tab</li>
+ <li>Tasks should appear in the details of a framework</li>
+ <li>Check the stdout and stderr of the sandbox of failed tasks</li>
+ </ul>
+ </li>
+ <li>Mesos logs
+ <ul>
+ <li>Master and slave logs are both in <code>/var/log/mesos</code> by default</li>
+ </ul>
+ </li>
+</ul>
+<p>And common pitfalls:</p>
+<ul>
+ <li>Spark assembly not reachable/accessible
+ <ul>
+ <li>Slaves must be able to download the Spark binary package from the <code>http://</code>, <code>hdfs://</code> or <code>s3n://</code> URL you gave</li>
+ </ul>
+ </li>
+ <li>Firewall blocking communications
+ <ul>
+ <li>Check for messages about failed connections</li>
+ <li>Temporarily disable firewalls for debugging and then poke appropriate holes</li>
+ </ul>
+ </li>
+</ul>
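+<p>One way to test the first pitfall is to try fetching the package from a slave by hand. The sketch
+below is an assumption-laden illustration: the function names are hypothetical, and it presumes that
+<code>curl</code> and the <code>hadoop</code> command are available on the slave's <code>PATH</code>.</p>

```shell
# Hypothetical helper: report which of the supported schemes a package URI uses.
executor_uri_scheme() {
  case "$1" in
    http://*) echo http ;;
    hdfs://*) echo hdfs ;;
    s3n://*)  echo s3n ;;
    *)        echo unknown; return 1 ;;
  esac
}

# Hypothetical helper: return 0 if the package at the given URI looks reachable.
check_executor_uri() {
  case "$(executor_uri_scheme "$1")" in
    http)     curl --fail --silent --head "$1" > /dev/null ;;  # HEAD request only
    hdfs|s3n) hadoop fs -test -e "$1" ;;                       # existence check
    *)        echo "unsupported scheme: $1" >&2; return 2 ;;
  esac
}

# Example (run on a slave; SPARK_EXECUTOR_URI is the URL configured for the driver):
#   check_executor_uri "$SPARK_EXECUTOR_URI" && echo "package reachable"
```

+<p>A failure here on a slave, when the same check succeeds on the driver machine, would point towards
+a firewall or permissions problem rather than a wrong URL.</p>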
+ </div> <!-- /container -->
+ <script src="js/vendor/jquery-1.8.0.min.js"></script>
+ <script src="js/vendor/bootstrap.min.js"></script>
+ <script src="js/main.js"></script>
+ <!-- MathJax Section -->
+ <script type="text/x-mathjax-config">
+ MathJax.Hub.Config({
+ TeX: { equationNumbers: { autoNumber: "AMS" } }
+ });
+ </script>
+ <script>
+ // Note that we load MathJax this way to work with local file (file://), HTTP and HTTPS.
+ // We could use "//cdn.mathjax...", but that won't support "file://".
+ (function(d, script) {
+ script = d.createElement('script');
+ script.type = 'text/javascript';
+ script.async = true;
+ script.onload = function(){
+ MathJax.Hub.Config({
+ tex2jax: {
+ inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ],
+ displayMath: [ ["$$","$$"], ["\\[", "\\]"] ],
+ processEscapes: true,
+ skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+ }
+ });
+ };
+ script.src = ('https:' == document.location.protocol ? 'https://' : 'http://') +
+ '';
+ d.getElementsByTagName('head')[0].appendChild(script);
+ }(document));
+ </script>
+ </body>