author    Matei Zaharia <matei@eecs.berkeley.edu>    2013-08-30 15:04:43 -0700
committer Matei Zaharia <matei@eecs.berkeley.edu>    2013-08-30 15:04:43 -0700
commit    4293533032bd5c354bb011f8d508b99615c6e0f0 (patch)
tree      e82fd2cc72c90ed98f5b0f1f4a74593cf3e6c54b /docs/quick-start.md
parent    f3a964848dd2ba65491f3eea8a54439069aa1b29 (diff)
Update docs about HDFS versions
Diffstat (limited to 'docs/quick-start.md')
-rw-r--r--  docs/quick-start.md  44
1 file changed, 43 insertions, 1 deletion
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 4e9deadbaa..bac5d690a6 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -142,7 +142,19 @@ resolvers ++= Seq(
"Spray Repository" at "http://repo.spray.cc/")
{% endhighlight %}
-Of course, for sbt to work correctly, we'll need to layout `SimpleJob.scala` and `simple.sbt` according to the typical directory structure. Once that is in place, we can create a JAR package containing the job's code, then use `sbt run` to execute our example job.
+If you also wish to read data from Hadoop's HDFS, you will need to add a dependency on `hadoop-client` for your version of HDFS:
+
+{% highlight scala %}
+libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "<your-hdfs-version>"
+{% endhighlight %}
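+
+For example, to link against the default HDFS version that Spark builds with (1.0.4), the line might read as follows (a sketch; substitute your cluster's version):
+
+{% highlight scala %}
+libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4"  // illustrative: Spark's default HDFS version
+{% endhighlight %}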
+
+Finally, for sbt to work correctly, we'll need to lay out `SimpleJob.scala` and `simple.sbt` according to the typical directory structure. Once that is in place, we can create a JAR package containing the job's code, then use `sbt run` to execute our example job.
{% highlight bash %}
$ find .
@@ -223,6 +229,27 @@ To build the job, we also write a Maven `pom.xml` file that lists Spark as a dep
</project>
{% endhighlight %}
+If you also wish to read data from Hadoop's HDFS, you will need to add a dependency on `hadoop-client` for your version of HDFS:
+
+{% highlight xml %}
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-client</artifactId>
+ <version>...</version>
+ </dependency>
+{% endhighlight %}
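+
+For instance, against the default HDFS version that Spark builds with (1.0.4), the dependency might read as follows (again a sketch; substitute your cluster's version):
+
+{% highlight xml %}
+    <!-- illustrative: 1.0.4 is Spark's default HDFS version -->
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-client</artifactId>
+      <version>1.0.4</version>
+    </dependency>
+{% endhighlight %}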
+
We lay out these files according to the canonical Maven directory structure:
{% highlight bash %}
$ find .
@@ -281,3 +297,12 @@ Lines with a: 46, Lines with b: 23
{% endhighlight python %}
This example only runs the job locally; for a tutorial on running jobs across several machines, see the [Standalone Mode](spark-standalone.html) documentation, and consider using a distributed input source, such as HDFS.
+
+Also, this example links against the default version of HDFS that Spark builds with (1.0.4). You can run it against other HDFS versions by [building Spark with another HDFS version](index.html#a-note-about-hadoop-versions).
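+
+As a rough sketch of that build step (the linked page is authoritative; the `SPARK_HADOOP_VERSION` variable and the version value here are assumptions for illustration):
+
+{% highlight bash %}
+# Build Spark against a specific HDFS version (illustrative; variable name and version assumed)
+SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
+{% endhighlight %}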