From 5d4a0ac794bff3bc3c579ab6e0658b17bdeeab43 Mon Sep 17 00:00:00 2001
From: Matei Zaharia
Date: Mon, 25 Feb 2013 14:23:03 -0800
Subject: Some tweaks to docs

---
 docs/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'docs/index.md')

diff --git a/docs/index.md b/docs/index.md
index c6ef507cb0..fd74a051e0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -58,9 +58,9 @@ of `project/SparkBuild.scala`, then rebuilding Spark (`sbt/sbt clean compile`).
 * [Quick Start](quick-start.html): a quick introduction to the Spark API; start here!
 * [Spark Programming Guide](scala-programming-guide.html): an overview of Spark concepts, and details on the Scala API
-* [Streaming Programming Guide](streaming-programming-guide.html): an API preview of Spark Streaming
 * [Java Programming Guide](java-programming-guide.html): using Spark from Java
 * [Python Programming Guide](python-programming-guide.html): using Spark from Python
+* [Spark Streaming Guide](streaming-programming-guide.html): using the alpha release of Spark Streaming

 **API Docs:**

--
cgit v1.2.3


From 22334eafd96d0cb2b2206c9ad5b458bd8d91eb97 Mon Sep 17 00:00:00 2001
From: Matei Zaharia
Date: Tue, 26 Feb 2013 22:52:38 -0800
Subject: Some tweaks to docs

---
 docs/configuration.md |  7 +++++++
 docs/ec2-scripts.md   | 17 ++++++++++-------
 docs/index.md         |  8 +-------
 3 files changed, 18 insertions(+), 14 deletions(-)

(limited to 'docs/index.md')

diff --git a/docs/configuration.md b/docs/configuration.md
index 04eb6daaa5..17fdbf04d1 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -133,6 +133,13 @@ Apart from these, the following properties are also available, and may be useful
     it if you configure your own old generation size.
   </td>
 </tr>
+<tr>
+  <td>spark.ui.port</td>
+  <td>(random)</td>
+  <td>
+    Port for your application's dashboard, which shows memory usage of each RDD.
+  </td>
+</tr>
 <tr>
   <td>spark.shuffle.compress</td>
   <td>true</td>
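The new `spark.ui.port` row above is easiest to understand with a concrete setting. The following is an illustrative sketch, separate from the patches themselves: it assumes the Java-system-property configuration mechanism that this era's configuration guide describes, and the object name, master string, and port number 33000 are arbitrary choices.

```scala
import spark.SparkContext

// Illustrative only: pins the dashboard to a fixed port instead of the random default.
object DashboardPortSketch {
  def main(args: Array[String]) {
    // Spark of this era reads configuration from Java system properties,
    // so the property must be set before the SparkContext is created.
    System.setProperty("spark.ui.port", "33000")  // 33000 is an arbitrary free port

    val sc = new SparkContext("local", "DashboardPortSketch")
    val data = sc.parallelize(1 to 100000).cache()
    // The dashboard on spark.ui.port shows the memory usage of cached RDDs like this one.
    println("Sum: " + data.reduce(_ + _))
    sc.stop()
  }
}
```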
diff --git a/docs/ec2-scripts.md b/docs/ec2-scripts.md
index 931b7a66bd..dc57035eba 100644
--- a/docs/ec2-scripts.md
+++ b/docs/ec2-scripts.md
@@ -45,9 +45,9 @@ identify machines belonging to each cluster in the Amazon EC2 Console.
   key pair, `<num-slaves>` is the number of slave nodes to launch (try
   1 at first), and `<cluster-name>` is the name to give to your
   cluster.
-- After everything launches, check that Mesos is up and sees all the
-  slaves by going to the Mesos Web UI link printed at the end of the
-  script (`http://<master-hostname>:8080`).
+- After everything launches, check that the cluster scheduler is up and sees
+  all the slaves by going to its web UI, which will be printed at the end of
+  the script (typically `http://<master-hostname>:8080`).

 You can also run `./spark-ec2 --help` to see more usage options. The
 following options are worth pointing out:
@@ -68,6 +68,9 @@ available.
 - `--ebs-vol-size=GB` will attach an EBS volume with a given amount of space to
   each node so that you can have a persistent HDFS cluster on your nodes across
   cluster restarts (see below).
+- `--spot-price=PRICE` will launch the worker nodes as
+  [Spot Instances](http://aws.amazon.com/ec2/spot-instances/),
+  bidding for the given maximum price (in dollars).
 - If one of your launches fails due to e.g. not having the right
 permissions on your private key file, you can run `launch` with the
 `--resume` option to restart the setup process on an existing cluster.
@@ -80,7 +83,7 @@ permissions on your private key file, you can run `launch` with the
 above. (This is just for convenience; you could also use the EC2 console.)
 - To deploy code or data within your cluster, you can log in and use the
-  provided script `~/mesos-ec2/copy-dir`, which,
+  provided script `~/spark-ec2/copy-dir`, which,
   given a directory path, RSYNCs it to the same location on all the slaves.
 - If your job needs to access large datasets, the fastest way to do that is
   to load them from Amazon S3 or an Amazon EBS device into an
@@ -106,7 +109,7 @@ You can edit `/root/spark/conf/spark-env.sh` on each machine to set Spark
 config as JVM options and, most crucially, the amount of memory to use per machine (`SPARK_MEM`).
 This file needs to be copied to **every machine** to reflect the change. The easiest way to do this
 is to use a script we provide called `copy-dir`. First edit your `spark-env.sh` file on the master,
-then run `~/mesos-ec2/copy-dir /root/spark/conf` to RSYNC it to all the workers.
+then run `~/spark-ec2/copy-dir /root/spark/conf` to RSYNC it to all the workers.

 The [configuration guide](configuration.html) describes the available configuration
 options.
@@ -152,10 +155,10 @@ If you have a patch or suggestion for one of these limitations, feel free to

 # Using a Newer Spark Version

-The Spark EC2 machine images may not come with the latest version of Spark. To use a newer version, you can run `git pull` to pull in `/root/spark` to pull in the latest version of Spark from `git`, and build it using `sbt/sbt compile`. You will also need to copy it to all the other nodes in the cluster using `~/mesos-ec2/copy-dir /root/spark`.
+The Spark EC2 machine images may not come with the latest version of Spark. To use a newer version, you can run `git pull` to pull in `/root/spark` to pull in the latest version of Spark from `git`, and build it using `sbt/sbt compile`. You will also need to copy it to all the other nodes in the cluster using `~/spark-ec2/copy-dir /root/spark`.

 # Accessing Data in S3

-Spark's file interface allows it to process data in Amazon S3 using the same URI formats that are supported for Hadoop. You can specify a path in S3 as input through a URI of the form `s3n://<AWS ACCESS KEY ID>:<AWS SECRET ACCESS KEY>@<bucket>/path`, where `<AWS ACCESS KEY ID>` is your Amazon access key ID and `<AWS SECRET ACCESS KEY>` is your Amazon secret access key. Note that you should escape any `/` characters in the secret key as `%2F`. Full instructions can be found on the [Hadoop S3 page](http://wiki.apache.org/hadoop/AmazonS3).
+Spark's file interface allows it to process data in Amazon S3 using the same URI formats that are supported for Hadoop. You can specify a path in S3 as input through a URI of the form `s3n://<bucket>/path`. You will also need to set your Amazon security credentials, either by setting the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` before your program or through `SparkContext.hadoopConfiguration`. Full instructions on S3 access using the Hadoop input libraries can be found on the [Hadoop S3 page](http://wiki.apache.org/hadoop/AmazonS3).

 In addition to using a single input file, you can also use a directory of files as input by simply giving the path to the directory.
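The revised S3 paragraph above is easier to follow with a concrete program. The sketch below is illustrative rather than part of the patch: it uses the 0.7-era `spark` package, a hypothetical bucket and path, and the standard Hadoop `fs.s3n.*` credential keys; `SparkContext.hadoopConfiguration` is the hook the paragraph itself mentions.

```scala
import spark.SparkContext

// Illustrative sketch of reading an S3 path with the s3n:// scheme.
object S3AccessSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "S3AccessSketch")

    // Credentials can come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY in the
    // environment, or be copied into the Hadoop configuration as shown here.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Hypothetical bucket and key, following the s3n://<bucket>/path form from the docs.
    val lines = sc.textFile("s3n://my-bucket/logs/2013-02-26.log")
    println("Line count: " + lines.count())
    sc.stop()
  }
}
```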
diff --git a/docs/index.md b/docs/index.md
index fd74a051e0..3a775decdb 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -3,15 +3,9 @@ layout: global
 title: Spark Overview
 ---

-{% comment %}
-TODO(andyk): Rewrite to make the Java API a first class part of the story.
-{% endcomment %}
-
 Spark is a MapReduce-like cluster computing framework designed for low-latency
 iterative jobs and interactive use from an interpreter.
 It provides clean, language-integrated APIs in [Scala](scala-programming-guide.html), [Java](java-programming-guide.html), and [Python](python-programming-guide.html), with a rich array of parallel operators.
-Spark can run on top of the [Apache Mesos](http://incubator.apache.org/mesos/) cluster manager,
-[Hadoop YARN](http://hadoop.apache.org/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/YARN.html),
-Amazon EC2, or without an independent resource manager ("standalone mode").
+Spark can run on top of the Apache Mesos cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager ("standalone mode").

 # Downloading
--
cgit v1.2.3


From fadeb1ddeaae734057a8dd518ecdb9422b47b482 Mon Sep 17 00:00:00 2001
From: Matei Zaharia
Date: Tue, 26 Feb 2013 23:20:49 -0800
Subject: More doc tweaks

---
 docs/index.md                       | 5 +++--
 docs/streaming-programming-guide.md | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

(limited to 'docs/index.md')

diff --git a/docs/index.md b/docs/index.md
index 3a775decdb..45facd8e63 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -5,7 +5,7 @@ title: Spark Overview

 Spark is a MapReduce-like cluster computing framework designed for low-latency
 iterative jobs and interactive use from an interpreter.
 It provides clean, language-integrated APIs in [Scala](scala-programming-guide.html), [Java](java-programming-guide.html), and [Python](python-programming-guide.html), with a rich array of parallel operators.
-Spark can run on top of the Apache Mesos cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager ("standalone mode").
+Spark can run on the Apache Mesos cluster manager, Hadoop YARN, Amazon EC2, or without an independent resource manager ("standalone mode").

 # Downloading
@@ -86,7 +86,8 @@ of `project/SparkBuild.scala`, then rebuilding Spark (`sbt/sbt clean compile`).
   [slides](http://ampcamp.berkeley.edu/agenda-2012) and [exercises](http://ampcamp.berkeley.edu/exercises-2012) are
   available online for free.
 * [Code Examples](http://spark-project.org/examples.html): more are also available in the [examples subfolder](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/examples) of Spark
-* [Paper Describing the Spark System](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
+* [Paper Describing Spark](http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)
+* [Paper Describing Spark Streaming](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf)

 # Community

diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 18ad3ed255..b30699cf3d 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -514,3 +514,4 @@ JavaPairDStream<String, Integer> wordCounts = words.map(
 # Where to Go from Here
 * API docs - [Scala](api/streaming/index.html#spark.streaming.package) and [Java](api/streaming/index.html#spark.streaming.api.java.package)
 * More examples - [Scala](https://github.com/mesos/spark/tree/master/examples/src/main/scala/spark/streaming/examples) and [Java](https://github.com/mesos/spark/tree/master/examples/src/main/java/spark/streaming/examples)
+* [Paper describing Spark Streaming](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf)
--
cgit v1.2.3
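The final hunk lands at the tail of the streaming guide's word-count walkthrough (its header quotes the guide's `JavaPairDStream<String, Integer> wordCounts` line). For context, here is a hedged Scala sketch of that style of program written against the 0.7-era `spark.streaming` API; the object name, host, port, and batch interval are placeholder choices, not taken from the guide.

```scala
import spark.streaming.{Seconds, StreamingContext}
import spark.streaming.StreamingContext._  // brings in reduceByKey and other pair operations

// Illustrative streaming word count over a text socket source.
object StreamingWordCountSketch {
  def main(args: Array[String]) {
    // One-second batches; "local[2]" leaves a core free for the receiver.
    val ssc = new StreamingContext("local[2]", "StreamingWordCountSketch", Seconds(1))

    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder host and port
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

    wordCounts.print()  // prints a few counts from each batch
    ssc.start()
  }
}
```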