<!DOCTYPE html>
<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title>Building Spark with Maven - Spark 0.7.3 Documentation</title>
        <meta name="description" content="">

        <link rel="stylesheet" href="css/bootstrap.min.css">
        <style>
            body {
                padding-top: 60px;
                padding-bottom: 40px;
            }
        </style>
        <meta name="viewport" content="width=device-width">
        <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
        <link rel="stylesheet" href="css/main.css">

        <script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
        
        <link rel="stylesheet" href="css/pygments-default.css">

        <!-- Google analytics script -->
        <script type="text/javascript">
          var _gaq = _gaq || [];
          _gaq.push(['_setAccount', 'UA-32518208-1']);
          _gaq.push(['_trackPageview']);

          (function() {
            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
          })();
        </script>

    </head>
    <body>
        <!--[if lt IE 7]>
            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
        <![endif]-->

        <!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->

        <div class="navbar navbar-fixed-top" id="topbar">
            <div class="navbar-inner">
                <div class="container">
                    <div class="brand"><a href="index.html">
                      <img src="img/spark-logo-77x50px-hd.png" /></a><span class="version">0.7.3</span>
                    </div>
                    <ul class="nav">
                        <!--TODO(andyk): Add class="active" attribute to li some how.-->
                        <li><a href="index.html">Overview</a></li>

                        <li class="dropdown">
                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
                            <ul class="dropdown-menu">
                                <li><a href="quick-start.html">Quick Start</a></li>
                                <li><a href="scala-programming-guide.html">Scala</a></li>
                                <li><a href="java-programming-guide.html">Java</a></li>
                                <li><a href="python-programming-guide.html">Python</a></li>
                                <li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
                            </ul>
                        </li>
                        
                        <li class="dropdown">
                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
                            <ul class="dropdown-menu">
                                <li><a href="api/core/index.html">Spark Java/Scala (Scaladoc)</a></li>
                                <li><a href="api/pyspark/index.html">Spark Python (Epydoc)</a></li>
                                <li><a href="api/streaming/index.html">Spark Streaming Java/Scala (Scaladoc) </a></li>
                            </ul>
                        </li>

                        <li class="dropdown">
                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
                            <ul class="dropdown-menu">
                                <li><a href="ec2-scripts.html">Amazon EC2</a></li>
                                <li><a href="spark-standalone.html">Standalone Mode</a></li>
                                <li><a href="running-on-mesos.html">Mesos</a></li>
                                <li><a href="running-on-yarn.html">YARN</a></li>
                            </ul>
                        </li>

                        <li class="dropdown">
                            <a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
                            <ul class="dropdown-menu">
                                <li><a href="building-with-maven.html">Building Spark with Maven</a></li>
                                <li><a href="configuration.html">Configuration</a></li>
                                <li><a href="tuning.html">Tuning Guide</a></li>
                                <li><a href="bagel-programming-guide.html">Bagel (Pregel on Spark)</a></li>
                                <li><a href="contributing-to-spark.html">Contributing to Spark</a></li>
                            </ul>
                        </li>
                    </ul>
                    <!--<p class="navbar-text pull-right"><span class="version-text">v0.7.3</span></p>-->
                </div>
            </div>
        </div>

        <div class="container" id="content">
          <h1 class="title">Building Spark with Maven</h1>

          <ul id="markdown-toc">
  <li><a href="#spark-tests-in-maven">Spark Tests in Maven</a></li>
  <li><a href="#setting-up-jvm-memory-usage-via-maven">Setting up JVM Memory Usage Via Maven</a></li>
  <li><a href="#using-with-intellij-idea">Using With IntelliJ IDEA</a></li>
  <li><a href="#building-spark-debian-packages">Building Spark Debian Packages</a></li>
</ul>

<p>Building Spark with Maven requires Maven 3 (the build process is tested with Maven 3.0.4) and Java 1.6 or newer.</p>

<p>Building with Maven requires a Hadoop profile to be specified explicitly on the command line; there is no default. There are two profiles to choose from: one for Hadoop 1 and one for Hadoop 2.</p>

<p>For Hadoop 1 (using 0.20.205.0), use:</p>

<pre><code>$ mvn -Phadoop1 clean install
</code></pre>

<p>For Hadoop 2 (using 2.0.0-mr1-cdh4.1.1), use:</p>

<pre><code>$ mvn -Phadoop2 clean install
</code></pre>

<p>The build uses the scala-maven-plugin, which supports incremental and continuous compilation. For example:</p>

<pre><code>$ mvn -Phadoop2 scala:cc
</code></pre>

<p>…should run continuous compilation (i.e. wait for changes). However, this has not been tested extensively.</p>

<h2 id="spark-tests-in-maven">Spark Tests in Maven</h2>

<p>Tests are run by default via the scalatest-maven-plugin. Two common variations are shown below.</p>

<p>To skip test execution (but not compilation):</p>

<pre><code>$ mvn -DskipTests -Phadoop2 clean install
</code></pre>

<p>To run a specific test suite:</p>

<pre><code>$ mvn -Phadoop2 -Dsuites=spark.repl.ReplSuite test
</code></pre>

<h2 id="setting-up-jvm-memory-usage-via-maven">Setting up JVM Memory Usage Via Maven</h2>

<p>You might run into the following errors if you&#8217;re using a vanilla installation of Maven:</p>

<pre><code>[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.9.3/classes...
[ERROR] PermGen space -&gt; [Help 1]

[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.9.3/classes...
[ERROR] Java heap space -&gt; [Help 1]
</code></pre>

<p>To fix these, give Maven more heap and PermGen space by setting <code>MAVEN_OPTS</code> before running the build:</p>

<pre><code>export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=128M"
</code></pre>
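
<p>To avoid retyping this in every shell, the same setting could be placed in a shell startup file. The fragment below is only a sketch: it assumes a Bash-compatible shell and a startup file such as <code>~/.bashrc</code>, neither of which is specified by this guide. It keeps any <code>MAVEN_OPTS</code> value that is already set and otherwise falls back to the values shown above.</p>

<pre><code># Assumption: Bash-compatible shell. Keep an existing MAVEN_OPTS, else use this guide's values.
export MAVEN_OPTS="${MAVEN_OPTS:--Xmx1024m -XX:MaxPermSize=128M}"
</code></pre>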

<h2 id="using-with-intellij-idea">Using With IntelliJ IDEA</h2>

<p>The Maven build works in IntelliJ IDEA 11.1.4. After opening the project via the pom.xml file in the project root folder, you only need to activate either the hadoop1 or hadoop2 profile in the &#8220;Maven Properties&#8221; popout. We have not tried Eclipse/Scala IDE with this.</p>

<h2 id="building-spark-debian-packages">Building Spark Debian Packages</h2>

<p>The Maven build can also produce a Debian package containing a &#8216;fat-jar&#8217; that includes the repl, the examples and bagel. To create it, specify the deb profile:</p>

<pre><code>$ mvn -Phadoop2,deb clean install
</code></pre>

<p>The Debian package can then be found under repl/target. The short commit hash is added to the file name so that individual packages built for SNAPSHOT versions can be distinguished.</p>
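
<p>As a rough illustration, the package could be located and inspected as follows. The glob stands in for the actual file name, which depends on the version and commit hash and is not given here. The <code>dpkg</code> command assumes a Debian-based system and that the glob matches exactly one file.</p>

<pre><code># List the generated package; the exact file name varies per build.
$ ls repl/target/*.deb
# Show the package metadata without installing it (assumes a single match).
$ dpkg --info repl/target/*.deb
</code></pre>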


        </div> <!-- /container -->

        <script src="js/vendor/jquery-1.8.0.min.js"></script>
        <script src="js/vendor/bootstrap.min.js"></script>
        <script src="js/main.js"></script>
        
        <!-- A script to fix internal hash links because we have an overlapping top bar.
             Based on https://github.com/twitter/bootstrap/issues/193#issuecomment-2281510 -->
        <script>
          $(function() {
            function maybeScrollToHash() {
              if (window.location.hash && $(window.location.hash).length) {
                var newTop = $(window.location.hash).offset().top - $('#topbar').height() - 5;
                $(window).scrollTop(newTop);
              }
            }
            $(window).bind('hashchange', function() {
              maybeScrollToHash();
            });
            // Scroll now too in case we had opened the page on a hash, but wait 1 ms because some browsers
            // will try to do *their* initial scroll after running the onReady handler.
            setTimeout(function() { maybeScrollToHash(); }, 1)
          })
        </script>

    </body>
</html>