-rwxr-xr-x  conf/spark-env.sh.template    |  2 +-
-rw-r--r--  docs/hardware-provisioning.md |  1 -
-rw-r--r--  docs/index.md                 |  9 +++++----
-rw-r--r--  docs/quick-start.md           | 10 ++--------
 4 files changed, 8 insertions(+), 14 deletions(-)
diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template
index a367d59d64..d92d2e2ae3 100755
--- a/conf/spark-env.sh.template
+++ b/conf/spark-env.sh.template
@@ -4,7 +4,7 @@
# spark-env.sh and edit that to configure Spark for your site.
#
# The following variables can be set in this file:
-# - SPARK_LOCAL_IP, to override the IP address binds to
+# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos
# - SPARK_JAVA_OPTS, to set node-specific JVM options for Spark. Note that
# we recommend setting app-wide options in the application's driver program.
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index d21e2a3d70..e5f054cb14 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -21,7 +21,6 @@ Hadoop and Spark on a common cluster manager like [Mesos](running-on-mesos.html)
[Hadoop YARN](running-on-yarn.html).
* If this is not possible, run Spark on different nodes in the same local-area network as HDFS.
-If your cluster spans multiple racks, include some Spark nodes on each rack.
* For low-latency data stores like HBase, it may be preferable to run computing jobs on different
nodes than the storage system to avoid interference.
diff --git a/docs/index.md b/docs/index.md
index bcd7dad6ae..0ea0e103e4 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -40,12 +40,13 @@ Python interpreter (`./pyspark`). These are a great way to learn Spark.
Spark uses the Hadoop-client library to talk to HDFS and other Hadoop-supported
storage systems. Because the HDFS protocol has changed in different versions of
Hadoop, you must build Spark against the same version that your cluster uses.
-You can do this by setting the `SPARK_HADOOP_VERSION` variable when compiling:
+By default, Spark links to Hadoop 1.0.4. You can change this by setting the
+`SPARK_HADOOP_VERSION` variable when compiling:
SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
-In addition, if you wish to run Spark on [YARN](running-on-yarn.md), you should also
-set `SPARK_YARN`:
+In addition, if you wish to run Spark on [YARN](running-on-yarn.md), set
+`SPARK_YARN` to `true`:
SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
@@ -94,7 +95,7 @@ set `SPARK_YARN`:
exercises about Spark, Shark, Mesos, and more. [Videos](http://ampcamp.berkeley.edu/agenda-2012),
[slides](http://ampcamp.berkeley.edu/agenda-2012) and [exercises](http://ampcamp.berkeley.edu/exercises-2012) are
available online for free.
-* [Code Examples](http://spark.incubator.apache.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of Spark
+* [Code Examples](http://spark.incubator.apache.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/) of Spark
* [Paper Describing Spark](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
* [Paper Describing Spark Streaming](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf)
diff --git a/docs/quick-start.md b/docs/quick-start.md
index bac5d690a6..11d4370a1d 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -126,7 +126,7 @@ object SimpleJob {
This job simply counts the number of lines containing 'a' and the number containing 'b' in the Spark README. Note that you'll need to replace $YOUR_SPARK_HOME with the location where Spark is installed. Unlike the earlier examples with the Spark shell, which initializes its own SparkContext, we initialize a SparkContext as part of the job. We pass the SparkContext constructor four arguments, the type of scheduler we want to use (in this case, a local scheduler), a name for the job, the directory where Spark is installed, and a name for the jar file containing the job's sources. The final two arguments are needed in a distributed setting, where Spark is running across several nodes, so we include them for completeness. Spark will automatically ship the jar files you list to slave nodes.
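A minimal sketch of the `SimpleJob` source this paragraph describes, since the listing itself sits above this hunk and is not part of the diff; the import path, README location, Scala version, and jar name are illustrative assumptions rather than part of this change:

{% highlight scala %}
/*** SimpleJob.scala -- sketch only; paths and versions are assumptions ***/
import spark.SparkContext

object SimpleJob {
  def main(args: Array[String]) {
    // Location of the Spark README; replace $YOUR_SPARK_HOME with your install path.
    val logFile = "$YOUR_SPARK_HOME/README.md"
    // The four constructor arguments the paragraph lists: scheduler ("local"),
    // a job name, the Spark install directory, and the jar(s) with the job's sources.
    val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
{% endhighlight %}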
-This file depends on the Spark API, so we'll also include an sbt configuration file, `simple.sbt` which explains that Spark is a dependency. This file also adds two repositories which host Spark dependencies:
+This file depends on the Spark API, so we'll also include an sbt configuration file, `simple.sbt` which explains that Spark is a dependency. This file also adds a repository that Spark depends on:
{% highlight scala %}
name := "Simple Project"
@@ -137,9 +137,7 @@ scalaVersion := "{{site.SCALA_VERSION}}"
libraryDependencies += "org.spark-project" %% "spark-core" % "{{site.SPARK_VERSION}}"
-resolvers ++= Seq(
- "Akka Repository" at "http://repo.akka.io/releases/",
- "Spray Repository" at "http://repo.spray.cc/")
+resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
{% endhighlight %}
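Pieced together from the hunks above, the complete `simple.sbt` after this change would look roughly as follows; the `version` line is an assumption, since it is not shown in this diff:

{% highlight scala %}
name := "Simple Project"

version := "1.0"   // assumed; the version setting is not shown in this diff

scalaVersion := "{{site.SCALA_VERSION}}"

libraryDependencies += "org.spark-project" %% "spark-core" % "{{site.SPARK_VERSION}}"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
{% endhighlight %}

With the Spray repository dropped, the Akka repository is the only resolver needed beyond sbt's default of Maven Central.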
If you also wish to read data from Hadoop's HDFS, you will also need to add a dependency on `hadoop-client` for your version of HDFS:
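That `hadoop-client` line falls outside the hunks shown here; as a rough sketch, assuming the standard `org.apache.hadoop` artifact is meant, it would look like:

{% highlight scala %}
// Sketch: match the version to the HDFS you run against (e.g. the 1.2.1 or
// 2.0.5-alpha builds mentioned in docs/index.md above).
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "<your-hdfs-version>"
{% endhighlight %}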
@@ -211,10 +209,6 @@ To build the job, we also write a Maven `pom.xml` file that lists Spark as a dep
<version>1.0</version>
<repositories>
<repository>
- <id>Spray.cc repository</id>
- <url>http://repo.spray.cc</url>
- </repository>
- <repository>
<id>Akka repository</id>
<url>http://repo.akka.io/releases</url>
</repository>