author    Matei Zaharia <matei@eecs.berkeley.edu>    2013-08-30 15:04:43 -0700
committer Matei Zaharia <matei@eecs.berkeley.edu>    2013-08-30 15:04:43 -0700
commit    4293533032bd5c354bb011f8d508b99615c6e0f0 (patch)
tree      e82fd2cc72c90ed98f5b0f1f4a74593cf3e6c54b /docs/quick-start.md
parent    f3a964848dd2ba65491f3eea8a54439069aa1b29 (diff)
Update docs about HDFS versions
Diffstat (limited to 'docs/quick-start.md')
-rw-r--r--  docs/quick-start.md  44
1 file changed, 43 insertions, 1 deletion
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 4e9deadbaa..bac5d690a6 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -142,7 +142,19 @@ resolvers ++= Seq(
"Spray Repository" at "http://repo.spray.cc/")
{% endhighlight %}
-Of course, for sbt to work correctly, we'll need to layout `SimpleJob.scala` and `simple.sbt` according to the typical directory structure. Once that is in place, we can create a JAR package containing the job's code, then use `sbt run` to execute our example job.
+If you also wish to read data from Hadoop's HDFS, you will need to add a dependency on `hadoop-client` for your version of HDFS:
+
+{% highlight scala %}
+libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "<your-hdfs-version>"
+{% endhighlight %}
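+
+For example, to link against the default HDFS version that Spark builds with (1.0.4), the line might read as follows (a sketch; substitute your cluster's version):
+
+{% highlight scala %}
+libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4"  // illustrative: Spark's default HDFS version
+{% endhighlight %}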
+
+Finally, for sbt to work correctly, we'll need to lay out `SimpleJob.scala` and `simple.sbt` according to the typical directory structure. Once that is in place, we can create a JAR package containing the job's code, then use `sbt run` to execute our example job.
{% highlight bash %}
$ find .
@@ -223,6 +229,27 @@ To build the job, we also write a Maven `pom.xml` file that lists Spark as a dep
</project>
{% endhighlight %}
+If you also wish to read data from Hadoop's HDFS, you will need to add a dependency on `hadoop-client` for your version of HDFS:
+
+{% highlight xml %}
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>hadoop-client</artifactId>
+ <version>...</version>
+ </dependency>
+{% endhighlight %}
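+
+For instance, against the default HDFS version that Spark builds with (1.0.4), the dependency might read as follows (again a sketch; substitute your cluster's version):
+
+{% highlight xml %}
+    <!-- illustrative: 1.0.4 is Spark's default HDFS version -->
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-client</artifactId>
+      <version>1.0.4</version>
+    </dependency>
+{% endhighlight %}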
+
We lay out these files according to the canonical Maven directory structure:
{% highlight bash %}
$ find .
@@ -281,3 +297,12 @@ Lines with a: 46, Lines with b: 23
{% endhighlight python %}
This example only runs the job locally; for a tutorial on running jobs across several machines, see the [Standalone Mode](spark-standalone.html) documentation, and consider using a distributed input source, such as HDFS.
+
+Also, this example links against the default version of HDFS that Spark builds with (1.0.4). You can run it against other HDFS versions by [building Spark with another HDFS version](index.html#a-note-about-hadoop-versions).
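+
+As a rough sketch of that build step (the linked page is authoritative; the `SPARK_HADOOP_VERSION` variable and the version value here are assumptions for illustration):
+
+{% highlight bash %}
+# Build Spark against a specific HDFS version (illustrative; variable name and version assumed)
+SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
+{% endhighlight %}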