| author | Matei Zaharia <matei@eecs.berkeley.edu> | 2013-08-30 15:04:43 -0700 |
|---|---|---|
| committer | Matei Zaharia <matei@eecs.berkeley.edu> | 2013-08-30 15:04:43 -0700 |
| commit | 4293533032bd5c354bb011f8d508b99615c6e0f0 | |
| tree | e82fd2cc72c90ed98f5b0f1f4a74593cf3e6c54b /docs/quick-start.md | |
| parent | f3a964848dd2ba65491f3eea8a54439069aa1b29 | |
Update docs about HDFS versions
Diffstat (limited to 'docs/quick-start.md')
-rw-r--r-- | docs/quick-start.md | 20 |
1 file changed, 19 insertions(+), 1 deletion(-)
```diff
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 4e9deadbaa..bac5d690a6 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -142,7 +142,13 @@ resolvers ++= Seq(
   "Spray Repository" at "http://repo.spray.cc/")
 {% endhighlight %}
 
-Of course, for sbt to work correctly, we'll need to layout `SimpleJob.scala` and `simple.sbt` according to the typical directory structure. Once that is in place, we can create a JAR package containing the job's code, then use `sbt run` to execute our example job.
+If you also wish to read data from Hadoop's HDFS, you will also need to add a dependency on `hadoop-client` for your version of HDFS:
+
+{% highlight scala %}
+libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "<your-hdfs-version>"
+{% endhighlight %}
+
+Finally, for sbt to work correctly, we'll need to layout `SimpleJob.scala` and `simple.sbt` according to the typical directory structure. Once that is in place, we can create a JAR package containing the job's code, then use `sbt run` to execute our example job.
 
 {% highlight bash %}
 $ find .
@@ -223,6 +229,16 @@ To build the job, we also write a Maven `pom.xml` file that lists Spark as a dep
 </project>
 {% endhighlight %}
 
+If you also wish to read data from Hadoop's HDFS, you will also need to add a dependency on `hadoop-client` for your version of HDFS:
+
+{% highlight xml %}
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-client</artifactId>
+      <version>...</version>
+    </dependency>
+{% endhighlight %}
+
 We lay out these files according to the canonical Maven directory structure:
 {% highlight bash %}
 $ find .
@@ -281,3 +297,5 @@ Lines with a: 46, Lines with b: 23
 {% endhighlight python %}
 
 This example only runs the job locally; for a tutorial on running jobs across several machines, see the [Standalone Mode](spark-standalone.html) documentation, and consider using a distributed input source, such as HDFS.
+
+Also, this example links against the default version of HDFS that Spark builds with (1.0.4). You can run it against other HDFS versions by [building Spark with another HDFS version](index.html#a-note-about-hadoop-versions).
```
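For context, the sbt snippet this commit adds is meant to sit alongside the rest of the Quick Start build definition. A complete `simple.sbt` incorporating the new `hadoop-client` line might look like the sketch below. Note the assumptions: the project name, the `spark-core` coordinates and `0.7.3` version, and the Scala version are illustrative values matching 2013-era Spark docs, and `1.0.4` is simply the default HDFS version the commit message mentions; substitute the versions your cluster actually uses.

```scala
// simple.sbt — hypothetical complete build file for the Quick Start
// example, with the hadoop-client dependency from this commit added.
// All version numbers are illustrative, not prescribed by the commit.
name := "Simple Project"

version := "1.0"

scalaVersion := "2.9.3"

// Spark itself (coordinates as published under org.spark-project at the time).
libraryDependencies += "org.spark-project" %% "spark-core" % "0.7.3"

// The dependency this commit documents: match it to your HDFS version.
// 1.0.4 is the default Hadoop version Spark built against then.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4"

resolvers ++= Seq(
  "Akka Repository" at "http://repo.akka.io/releases/",
  "Spray Repository" at "http://repo.spray.cc/")
```

Because `hadoop-client` is resolved at build time, mismatching it against the cluster's HDFS version typically surfaces only at runtime as RPC/protocol errors, which is why the docs call out pinning it explicitly.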