author     Andy Konwinski <andyk@berkeley.edu>  2012-09-12 16:05:19 -0700
committer  Andy Konwinski <andyk@berkeley.edu>  2012-09-12 16:06:18 -0700
commit     4d3a17c8d768a4e76bfb895ce53715434447cb62 (patch)
tree       35d92aab36165b3ec68209622c260ebb9e3e9147
parent     49e98500a9b1f93ab3224c4358dbc56f1e37ff35 (diff)
download   spark-4d3a17c8d768a4e76bfb895ce53715434447cb62.tar.gz
           spark-4d3a17c8d768a4e76bfb895ce53715434447cb62.tar.bz2
           spark-4d3a17c8d768a4e76bfb895ce53715434447cb62.zip
Fixing lots of broken links.
-rw-r--r--  docs/bagel-programming-guide.md    2
-rw-r--r--  docs/configuration.md              4
-rw-r--r--  docs/contributing-to-spark.md     23
-rw-r--r--  docs/ec2-scripts.md               14
-rw-r--r--  docs/index.md                     16
-rw-r--r--  docs/programming-guide.md          9
-rw-r--r--  docs/running-on-amazon-ec2.md      2
-rw-r--r--  docs/running-on-mesos.md          14
8 files changed, 38 insertions, 46 deletions
diff --git a/docs/bagel-programming-guide.md b/docs/bagel-programming-guide.md
index d4d08f8cb1..23f69a3ded 100644
--- a/docs/bagel-programming-guide.md
+++ b/docs/bagel-programming-guide.md
@@ -20,7 +20,7 @@ To write a Bagel application, you will need to add Spark, its dependencies, and
## Programming Model
-Bagel operates on a graph represented as a [[distributed dataset|Spark Programming Guide]] of (K, V) pairs, where keys are vertex IDs and values are vertices plus their associated state. In each superstep, Bagel runs a user-specified compute function on each vertex that takes as input the current vertex state and a list of messages sent to that vertex during the previous superstep, and returns the new vertex state and a list of outgoing messages.
+Bagel operates on a graph represented as a [distributed dataset]({{HOME_PATH}}programming-guide.html) of (K, V) pairs, where keys are vertex IDs and values are vertices plus their associated state. In each superstep, Bagel runs a user-specified compute function on each vertex that takes as input the current vertex state and a list of messages sent to that vertex during the previous superstep, and returns the new vertex state and a list of outgoing messages.
For example, we can use Bagel to implement PageRank. Here, vertices represent pages, edges represent links between pages, and messages represent shares of PageRank sent to the pages that a particular page links to.
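To make the compute-function contract above concrete, here is a minimal Scala sketch of a PageRank-style vertex program. The `PRVertex` and `PRMessage` types and the exact signature are illustrative assumptions, not the literal Bagel API.

    // Sketch only: types and signature are assumed, not the exact Bagel API.
    case class PRVertex(id: String, rank: Double, outEdges: Seq[String], active: Boolean)
    case class PRMessage(targetId: String, rankShare: Double)

    def compute(self: PRVertex, msgs: Option[Seq[PRMessage]], superstep: Int)
        : (PRVertex, Seq[PRMessage]) = {
      // New rank = damped sum of the rank shares received during the last superstep.
      val newRank =
        if (superstep > 0) 0.15 + 0.85 * msgs.getOrElse(Seq.empty).map(_.rankShare).sum
        else self.rank
      val halt = superstep >= 10  // stop after a fixed number of supersteps
      val outbox =
        if (halt) Seq.empty[PRMessage]
        else self.outEdges.map(dest => PRMessage(dest, newRank / self.outEdges.size))
      (self.copy(rank = newRank, active = !halt), outbox)
    }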
diff --git a/docs/configuration.md b/docs/configuration.md
index 07190b2931..ab854de386 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -9,7 +9,7 @@ Spark is configured primarily through the `conf/spark-env.sh` script. This scrip
Inside this script, you can set several environment variables:
* `SCALA_HOME` to point to your Scala installation.
-* `MESOS_NATIVE_LIBRARY` if you are [[running on a Mesos cluster|Running Spark on Mesos]].
+* `MESOS_NATIVE_LIBRARY` if you are [running on a Mesos cluster]({{HOME_PATH}}running-on-mesos.html).
* `SPARK_MEM` to set the amount of memory used per node (this should be in the same format as the JVM's -Xmx option, e.g. `300m` or `1g`)
* `SPARK_JAVA_OPTS` to add JVM options. This includes system properties that you'd like to pass with `-D`.
* `SPARK_CLASSPATH` to add elements to Spark's classpath.
@@ -21,4 +21,4 @@ The most important thing to set first will probably be the memory (`SPARK_MEM`).
## Logging Configuration
-Spark uses [[log4j|http://logging.apache.org/log4j/]] for logging. You can configure it by adding a `log4j.properties` file in the `conf` directory. One way to start is to copy the existing `log4j.properties.template` located there.
+Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can configure it by adding a `log4j.properties` file in the `conf` directory. One way to start is to copy the existing `log4j.properties.template` located there.
diff --git a/docs/contributing-to-spark.md b/docs/contributing-to-spark.md
index fc7544887b..3585bda2d3 100644
--- a/docs/contributing-to-spark.md
+++ b/docs/contributing-to-spark.md
@@ -4,23 +4,14 @@ title: How to Contribute to Spark
---
# Contributing to Spark
-The Spark team welcomes contributions in the form of GitHub pull requests.
-Here are a few tips to get your contribution in:
+The Spark team welcomes contributions in the form of GitHub pull requests. Here are a few tips to get your contribution in:
-- Break your work into small, single-purpose patches if possible. It's much harder to merge
- in a large change with a lot of disjoint features.
-- Submit the patch as a GitHub pull request. For a tutorial, see
- the GitHub guides on [[forking a repo|https://help.github.com/articles/fork-a-repo]]
- and [[sending a pull request|https://help.github.com/articles/using-pull-requests]].
-- Follow the style of the existing codebase. Specifically, we use [[standard Scala
- style guide|http://docs.scala-lang.org/style/]], but with the following changes:
+- Break your work into small, single-purpose patches if possible. It's much harder to merge in a large change with a lot of disjoint features.
+- Submit the patch as a GitHub pull request. For a tutorial, see the GitHub guides on [forking a repo](https://help.github.com/articles/fork-a-repo) and [sending a pull request](https://help.github.com/articles/using-pull-requests).
+- Follow the style of the existing codebase. Specifically, we use the [standard Scala style guide](http://docs.scala-lang.org/style/), but with the following changes:
* Maximum line length of 100 characters.
* Always import packages using absolute paths (e.g. `scala.collection.Map` instead of `collection.Map`).
- * No "infix" syntax for methods other than operators. For example, don't write
- `table containsKey myKey`; replace it with `table.containsKey(myKey)`.
-- Add unit tests to your new code. We use [[ScalaTest|http://www.scalatest.org/]] for
- testing. Just add a new Suite in `core/src/test`, or methods to an existing Suite.
+ * No "infix" syntax for methods other than operators. For example, don't write `table containsKey myKey`; replace it with `table.containsKey(myKey)`.
+- Add unit tests to your new code. We use [ScalaTest](http://www.scalatest.org/) for testing. Just add a new Suite in `core/src/test`, or methods to an existing Suite.
-If you'd like to report a bug but don't have time to fix it, you can still post it to
-our [[issues page|https://github.com/mesos/spark/issues]]. Also, feel free to email
-the [[mailing list|http://www.spark-project.org/mailing-lists.html]].
+If you'd like to report a bug but don't have time to fix it, you can still post it to our [issues page](https://github.com/mesos/spark/issues). Also, feel free to email the [mailing list](http://www.spark-project.org/mailing-lists.html).
diff --git a/docs/ec2-scripts.md b/docs/ec2-scripts.md
index 35d28c47d0..6e058ac19b 100644
--- a/docs/ec2-scripts.md
+++ b/docs/ec2-scripts.md
@@ -122,11 +122,11 @@ root partitions and their `persistent-hdfs`. Stopped machines will not
cost you any EC2 cycles, but ***will*** continue to cost money for EBS
storage.
-- To stop one of your clusters, go into the `ec2` directory and run
+- To stop one of your clusters, go into the `ec2` directory and run
`./spark-ec2 stop <cluster-name>`.
-- To restart it later, run
+- To restart it later, run
`./spark-ec2 -i <key-file> start <cluster-name>`.
-- To ultimately destroy the cluster and stop consuming EBS space, run
+- To ultimately destroy the cluster and stop consuming EBS space, run
`./spark-ec2 destroy <cluster-name>` as described in the previous
section.
@@ -137,10 +137,10 @@ Limitations
It should not be hard to make it launch VMs in other zones, but you will need
to create your own AMIs in them.
- Support for "cluster compute" nodes is limited -- there's no way to specify a
- locality group. However, you can launch slave nodes in your `<clusterName>-slaves`
- group manually and then use `spark-ec2 launch --resume` to start a cluster with
- them.
+ locality group. However, you can launch slave nodes in your
+ `<clusterName>-slaves` group manually and then use `spark-ec2 launch
+ --resume` to start a cluster with them.
- Support for spot instances is limited.
If you have a patch or suggestion for one of these limitations, feel free to
-[[contribute|Contributing to Spark]] it!
+[contribute]({{HOME_PATH}}contributing-to-spark.html) it!
diff --git a/docs/index.md b/docs/index.md
index a1fe3b2e56..ac22363d3f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -14,7 +14,7 @@ Get Spark by checking out the master branch of the Git repository, using `git cl
Spark requires [Scala 2.9](http://www.scala-lang.org/).
In addition, to run Spark on a cluster, you will need to install [Mesos](http://incubator.apache.org/mesos/), using the steps in
-[[Running Spark on Mesos]]. However, if you just want to run Spark on a single machine (possibly using multiple cores),
+[Running Spark on Mesos]({{HOME_PATH}}running-on-mesos.html). However, if you just want to run Spark on a single machine (possibly using multiple cores),
you do not need Mesos.
To build and run Spark, you will need to have Scala's `bin` directory in your `PATH`,
@@ -51,12 +51,12 @@ of `project/SparkBuild.scala`, then rebuilding Spark (`sbt/sbt clean compile`).
# Where to Go from Here
-* [Spark Programming Guide](/programming-guide.html): how to get started using Spark, and details on the API
-* [Running Spark on Amazon EC2](/running-on-amazon-ec2.html): scripts that let you launch a cluster on EC2 in about 5 minutes
-* [Running Spark on Mesos](/running-on-mesos.html): instructions on how to deploy to a private cluster
-* [Configuration](/configuration.html)
-* [Bagel Programming Guide](/bagel-programming-guide.html): implementation of Google's Pregel on Spark
-* [Spark Debugger](/spark-debugger.html): experimental work on a debugger for Spark jobs
+* [Spark Programming Guide]({{HOME_PATH}}programming-guide.html): how to get started using Spark, and details on the API
+* [Running Spark on Amazon EC2]({{HOME_PATH}}running-on-amazon-ec2.html): scripts that let you launch a cluster on EC2 in about 5 minutes
+* [Running Spark on Mesos]({{HOME_PATH}}running-on-mesos.html): instructions on how to deploy to a private cluster
+* [Configuration]({{HOME_PATH}}configuration.html)
+* [Bagel Programming Guide]({{HOME_PATH}}bagel-programming-guide.html): implementation of Google's Pregel on Spark
+* [Spark Debugger]({{HOME_PATH}}spark-debugger.html): experimental work on a debugger for Spark jobs
* [Contributing to Spark](contributing-to-spark.html)
# Other Resources
@@ -72,4 +72,4 @@ To keep up with Spark development or get help, sign up for the [spark-users mail
If you're in the San Francisco Bay Area, there's a regular [Spark meetup](http://www.meetup.com/spark-users/) every few weeks. Come by to meet the developers and other users.
-If you'd like to contribute code to Spark, read [how to contribute](Contributing to Spark).
+If you'd like to contribute code to Spark, read [how to contribute]({{HOME_PATH}}contributing-to-spark.html).
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 8106e5bee6..15351bf661 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -24,7 +24,7 @@ This is done through the following constructor:
new SparkContext(master, jobName, [sparkHome], [jars])
-The `master` parameter is a string specifying a [Mesos](Running Spark on Mesos) cluster to connect to, or a special "local" string to run in local mode, as described below. `jobName` is a name for your job, which will be shown in the Mesos web UI when running on a cluster. Finally, the last two parameters are needed to deploy your code to a cluster if running on Mesos, as described later.
+The `master` parameter is a string specifying a [Mesos]({{HOME_PATH}}running-on-mesos.html) cluster to connect to, or a special "local" string to run in local mode, as described below. `jobName` is a name for your job, which will be shown in the Mesos web UI when running on a cluster. Finally, the last two parameters are needed to deploy your code to a cluster if running on Mesos, as described later.
In the Spark interpreter, a special interpreter-aware SparkContext is already created for you, in the variable called `sc`. Making your own SparkContext will not work. You can set which master the context connects to using the `MASTER` environment variable. For example, run `MASTER=local[4] ./spark-shell` to run locally with four cores.
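As a concrete illustration of the constructor above, here is a hedged sketch for local mode (the `spark` package name matches this era of the codebase; the job name is an arbitrary label):

    import spark.SparkContext

    // Local mode with four worker threads; no cluster or Mesos needed.
    val sc = new SparkContext("local[4]", "Example Job")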
@@ -36,7 +36,7 @@ The master name can be in one of three formats:
<tr><th>Master Name</th><th>Meaning</th></tr>
<tr><td> local </td><td> Run Spark locally with one worker thread (i.e. no parallelism at all). </td></tr>
<tr><td> local[K] </td><td> Run Spark locally with K worker threads (which should be set to the number of cores on your machine). </td></tr>
-<tr><td> HOST:PORT </td><td> Connect Spark to the given <a href="https://github.com/mesos/spark/wiki/Running-spark-on-mesos">Mesos</a> master to run on a cluster. The host parameter is the hostname of the Mesos master. The port must be whichever one the master is configured to use, which is 5050 by default.
+<tr><td> HOST:PORT </td><td> Connect Spark to the given <a href="{{HOME_PATH}}running-on-mesos.html">Mesos</a> master to run on a cluster. The host parameter is the hostname of the Mesos master. The port must be whichever one the master is configured to use, which is 5050 by default.
<br /><br />
<strong>NOTE:</strong> In earlier versions of Mesos (the <code>old-mesos</code> branch of Spark), you need to use master@HOST:PORT.
</td></tr>
@@ -49,7 +49,7 @@ If you want to run your job on a cluster, you will need to specify the two optio
* `sparkHome`: The path at which Spark is installed on your worker machines (it should be the same on all of them).
* `jars`: A list of JAR files on the local machine containing your job's code and any dependencies, which Spark will deploy to all the worker nodes. You'll need to package your job into a set of JARs using your build system. For example, if you're using SBT, the [sbt-assembly](https://github.com/sbt/sbt-assembly) plugin is a good way to make a single JAR with your code and dependencies.
-If some classes will be shared across _all_ your jobs, it's also possible to copy them to the workers manually and set the `SPARK_CLASSPATH` environment variable in `conf/spark-env.sh` to point to them; see [[Configuration]] for details.
+If some classes will be shared across _all_ your jobs, it's also possible to copy them to the workers manually and set the `SPARK_CLASSPATH` environment variable in `conf/spark-env.sh` to point to them; see [Configuration]({{HOME_PATH}}configuration.html) for details.
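A hedged sketch of a cluster-mode context that supplies both optional parameters; the master address, install path, and assembly JAR name are placeholders for your own deployment:

    import spark.SparkContext

    val sc = new SparkContext(
      "HOST:5050",                        // Mesos master (hostname:port)
      "My Job Name",                      // label shown in the Mesos web UI
      "/home/user/spark",                 // sparkHome: same path on every worker
      List("target/my-job-assembly.jar")  // jars shipped to the workers
    )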
# Distributed Datasets
@@ -72,7 +72,7 @@ One important parameter for parallel collections is the number of *slices* to cu
## Hadoop Datasets
-Spark can create distributed datasets from any file stored in the Hadoop distributed file system (HDFS) or other storage systems supported by Hadoop (including your local file system, [Amazon S3|http://wiki.apache.org/hadoop/AmazonS3]], Hypertable, HBase, etc). Spark supports text files, [[SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop InputFormat.
+Spark can create distributed datasets from any file stored in the Hadoop distributed file system (HDFS) or other storage systems supported by Hadoop (including your local file system, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), Hypertable, HBase, etc). Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop InputFormat.
Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3n://`, `kfs://`, etc URI). Here is an example invocation:
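As a hedged illustration of such an invocation (the paths below are placeholders, not datasets from the guide):

    // Each call returns an RDD of the file's lines.
    val localLines = sc.textFile("/tmp/README.md")                      // local path
    val hdfsLines  = sc.textFile("hdfs://namenode:9000/data/logs.txt")  // HDFS URI
    println(localLines.count())  // number of lines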
@@ -157,6 +157,7 @@ Accumulators are variables that are only "added" to through an associative opera
An accumulator is created from an initial value `v` by calling `SparkContext.accumulator(v)`. Tasks running on the cluster can then add to it using the `+=` operator. However, they cannot read its value. Only the driver program can read the accumulator's value, using its `value` method.
The interpreter session below shows an accumulator being used to add up the elements of an array:
+
scala> val accum = sc.accumulator(0)
accum: spark.Accumulator[Int] = 0
diff --git a/docs/running-on-amazon-ec2.md b/docs/running-on-amazon-ec2.md
index 26cf9bd767..4e1c191bda 100644
--- a/docs/running-on-amazon-ec2.md
+++ b/docs/running-on-amazon-ec2.md
@@ -6,7 +6,7 @@ This guide describes how to get Spark running on an EC2 cluster. It assumes you
# For Spark 0.5
-Spark now includes some [EC2 Scripts](/ec2-scripts.html) for launching and managing clusters on EC2. You can typically launch a cluster in about five minutes. Follow the instructions at this link for details.
+Spark now includes some [EC2 Scripts]({{HOME_PATH}}ec2-scripts.html) for launching and managing clusters on EC2. You can typically launch a cluster in about five minutes. Follow the instructions at this link for details.
# For older versions of Spark
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index b6bfff9da3..9807228121 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -4,12 +4,12 @@ title: Running Spark on Mesos
---
# Running Spark on Mesos
-To run on a cluster, Spark uses the [[Apache Mesos|http://incubator.apache.org/mesos/]] resource manager. Follow the steps below to install Mesos and Spark:
+To run on a cluster, Spark uses the [Apache Mesos](http://incubator.apache.org/mesos/) resource manager. Follow the steps below to install Mesos and Spark:
### For Spark 0.5:
-1. Download and build Spark using the instructions [[here|Home]].
-2. Download Mesos 0.9.0 from a [[mirror|http://www.apache.org/dyn/closer.cgi/incubator/mesos/mesos-0.9.0-incubating/]].
+1. Download and build Spark using the instructions [here]({{HOME_PATH}}index.html).
+2. Download Mesos 0.9.0 from a [mirror](http://www.apache.org/dyn/closer.cgi/incubator/mesos/mesos-0.9.0-incubating/).
3. Configure Mesos using the `configure` script, passing the location of your `JAVA_HOME` using `--with-java-home`. Mesos comes with "template" configure scripts for different platforms, such as `configure.macosx`, that you can run. See the README file in Mesos for other options. **Note:** If you want to run Mesos without installing it into the default paths on your system (e.g. if you don't have administrative privileges to install it), you should also pass the `--prefix` option to `configure` to tell it where to install. For example, pass `--prefix=/home/user/mesos`. By default the prefix is `/usr/local`.
4. Build Mesos using `make`, and then install it using `make install`.
5. Create a file called `spark-env.sh` in Spark's `conf` directory, by copying `conf/spark-env.sh.template`, and add the following lines in it:
@@ -26,7 +26,7 @@ To run on a cluster, Spark uses the [[Apache Mesos|http://incubator.apache.org/m
### For Spark versions before 0.5:
-1. Download and build Spark using the instructions [[here|Home]].
+1. Download and build Spark using the instructions [here]({{HOME_PATH}}index.html).
2. Download either revision 1205738 of Mesos if you're using the master branch of Spark, or the pre-protobuf branch of Mesos if you're using Spark 0.3 or earlier (note that for new users, _we recommend the master branch instead of 0.3_). For revision 1205738 of Mesos, use:
<pre>
svn checkout -r 1205738 http://svn.apache.org/repos/asf/incubator/mesos/trunk mesos
@@ -35,20 +35,20 @@ For the pre-protobuf branch (for Spark 0.3 and earlier), use:
<pre>git clone git://github.com/mesos/mesos
cd mesos
git checkout --track origin/pre-protobuf</pre>
-3. Configure Mesos using the `configure` script, passing the location of your `JAVA_HOME` using `--with-java-home`. Mesos comes with "template" configure scripts for different platforms, such as `configure.template.macosx`, so you can just run the one on your platform if it exists. See the [[Mesos wiki|https://github.com/mesos/mesos/wiki]] for other configuration options.
+3. Configure Mesos using the `configure` script, passing the location of your `JAVA_HOME` using `--with-java-home`. Mesos comes with "template" configure scripts for different platforms, such as `configure.template.macosx`, so you can just run the one on your platform if it exists. See the [Mesos wiki](https://github.com/mesos/mesos/wiki) for other configuration options.
4. Build Mesos using `make`.
5. In Spark's `conf/spark-env.sh` file, add `export MESOS_HOME=<path to Mesos directory>`. If you don't have a `spark-env.sh`, copy `conf/spark-env.sh.template`. You should also set `SCALA_HOME` there if it's not on your system's default path.
6. Copy Spark and Mesos to the _same_ path on all the nodes in the cluster.
7. Configure Mesos for deployment:
* On your master node, edit `MESOS_HOME/conf/masters` to list your master and `MESOS_HOME/conf/slaves` to list the slaves. Also, edit `MESOS_HOME/conf/mesos.conf` and add the line `failover_timeout=1` to change a timeout parameter that is too high by default.
* Run `MESOS_HOME/deploy/start-mesos` to start it up. If all goes well, you should see Mesos's web UI on port 8080 of the master machine.
- * See Mesos's [[deploy instructions|https://github.com/mesos/mesos/wiki/Deploy-Scripts]] for more information on deploying it.
+ * See Mesos's [deploy instructions](https://github.com/mesos/mesos/wiki/Deploy-Scripts) for more information on deploying it.
8. To run a Spark job against the cluster, when you create your `SparkContext`, pass the string `master@HOST:5050` as the first parameter, where `HOST` is the machine running your Mesos master. In addition, pass the location of Spark on your nodes as the third parameter, and a list of JAR files containing your JAR's code as the fourth (these will automatically get copied to the workers). For example:
<pre>new SparkContext("master@HOST:5050", "My Job Name", "/home/user/spark", List("my-job.jar"))</pre>
## Running on Amazon EC2
-If you want to run Spark on Amazon EC2, there's an easy way to launch a cluster with Mesos, Spark, and HDFS pre-configured: the [[EC2 launch scripts|Running-Spark-on-Amazon-EC2]]. This will get you a cluster in about five minutes without any configuration on your part.
+If you want to run Spark on Amazon EC2, there's an easy way to launch a cluster with Mesos, Spark, and HDFS pre-configured: the [EC2 launch scripts]({{HOME_PATH}}running-on-amazon-ec2.html). This will get you a cluster in about five minutes without any configuration on your part.
## Running Alongside Hadoop