Fixes typos in Spark Streaming Programming Guide

These typos were reported on the spark-users mailing list, see: https://groups.google.com/d/msg/spark-users/SyLGgJlKCrI/LpeBypOkSMUJ
author: Andy Konwinski <andyk@berkeley.edu> 2013-07-12 11:51:14 -0700
committer: Andy Konwinski <andyk@berkeley.edu> 2013-07-12 11:51:14 -0700
commit: cd7259b4b8d8abbff6db963fd8f84d4bd0b3737b (patch)
tree: 7907ca5526adc54e47a37496d63ebc9cc5ab7657 /docs
parent: 018d04c64e68876f4491fbccb3752e9da0a0c5d3 (diff)
download: spark-cd7259b4b8d8abbff6db963fd8f84d4bd0b3737b.tar.gz
spark-cd7259b4b8d8abbff6db963fd8f84d4bd0b3737b.tar.bz2
spark-cd7259b4b8d8abbff6db963fd8f84d4bd0b3737b.zip
1 files changed, 2 insertions, 2 deletions
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index f5788dc467..8cd1b0cd66 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -7,7 +7,7 @@ title: Spark Streaming Programming Guide
 {:toc}
 
 # Overview
-A Spark Streaming application is very similar to a Spark application; it consists of a *driver program* that runs the user's `main` function and continuous executes various *parallel operations* on input streams of data. The main abstraction Spark Streaming provides is a *discretized stream* (DStream), which is a continuous sequence of RDDs (distributed collections of elements) representing a continuous stream of data. DStreams can be created from live incoming data (such as data from a socket, Kafka, etc.) or can be generated by transformong existing DStreams using parallel operators like `map`, `reduce`, and `window`. The basic processing model is as follows: 
+A Spark Streaming application is very similar to a Spark application; it consists of a *driver program* that runs the user's `main` function and continuous executes various *parallel operations* on input streams of data. The main abstraction Spark Streaming provides is a *discretized stream* (DStream), which is a continuous sequence of RDDs (distributed collections of elements) representing a continuous stream of data. DStreams can be created from live incoming data (such as data from a socket, Kafka, etc.) or can be generated by transforming existing DStreams using parallel operators like `map`, `reduce`, and `window`. The basic processing model is as follows: 
 (i) While a Spark Streaming driver program is running, the system receives data from various sources and and divides it into batches. Each batch of data is treated as an RDD, that is, an immutable parallel collection of data. These input RDDs are saved in memory and replicated to two nodes for fault-tolerance. This sequence of RDDs is collectively called an InputDStream.
 (ii) Data received by InputDStreams are processed using DStream operations. Since all data is represented as RDDs and all DStream operations as RDD operations, data is automatically recovered in the event of node failures.  
 
@@ -20,7 +20,7 @@ The first thing a Spark Streaming program must do is create a `StreamingContext`
 new StreamingContext(master, appName, batchDuration, [sparkHome], [jars])
 {% endhighlight %}
 
-The `master` parameter is a standard [Spark cluster URL](scala-programming-guide.html#master-urls) and can be "local" for local testing. The `appName` is a name of your program, which will be shown on your cluster's web UI. The `batchDuration` is the size of the batches (as explained earlier). This must be set carefully such the cluster can keep up with the processing of the data streams. Start with something conservative like 5 seconds. See the [Performance Tuning](#setting-the-right-batch-size) section for a detailed discussion. Finally, `sparkHome` and `jars` are necessary when running on a cluster to specify the location of your code, as described in the [Spark programming guide](scala-programming-guide.html#deploying-code-on-a-cluster).
+The `master` parameter is a standard [Spark cluster URL](scala-programming-guide.html#master-urls) and can be "local" for local testing. The `appName` is a name of your program, which will be shown on your cluster's web UI. The `batchDuration` is the size of the batches (as explained earlier). This must be set carefully such that the cluster can keep up with the processing of the data streams. Start with something conservative like 5 seconds. See the [Performance Tuning](#setting-the-right-batch-size) section for a detailed discussion. Finally, `sparkHome` and `jars` are necessary when running on a cluster to specify the location of your code, as described in the [Spark programming guide](scala-programming-guide.html#deploying-code-on-a-cluster).
 
 This constructor creates a SparkContext for your job as well, which can be accessed with `streamingContext.sparkContext`.
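The guide text in this patch describes two ideas: incoming data is discretized into batches of length `batchDuration`, and that duration must be chosen so the cluster keeps up with the stream. The sketch below is a toy, pure-Scala model of those two ideas. It is not Spark's implementation and uses no Spark API. The names `DiscretizeSketch`, `Record`, `discretize`, and `keepsUp` are hypothetical, introduced here only for illustration. It assumes non-negative millisecond timestamps, and it takes "keeping up" to mean that each batch finishes processing within one batch interval.

```scala
// Hypothetical toy model of a discretized stream; NOT Spark's implementation.
object DiscretizeSketch {
  // A received data item with its arrival time (assumed non-negative millis).
  final case class Record(timeMillis: Long, value: String)

  // Assign each record to the batch whose interval
  // [k * batchMillis, (k + 1) * batchMillis) contains it, then return the
  // non-empty batches ordered by batch index k.
  def discretize(records: Seq[Record], batchMillis: Long): Seq[(Long, Seq[String])] = {
    require(batchMillis > 0, "batch duration must be positive")
    records
      .groupBy(r => r.timeMillis / batchMillis)
      .toSeq
      .sortBy(_._1)
      .map { case (batchIndex, rs) => (batchIndex, rs.sortBy(_.timeMillis).map(_.value)) }
  }

  // Assumed stability condition: every batch must finish processing
  // before the next batch interval ends (processing time <= batch duration).
  def keepsUp(processingMillis: Seq[Long], batchMillis: Long): Boolean =
    processingMillis.forall(_ <= batchMillis)

  def main(args: Array[String]): Unit = {
    val records = Seq(Record(100L, "a"), Record(4900L, "b"), Record(5000L, "c"), Record(12000L, "d"))
    // 5-second batches, matching the guide's suggested conservative start.
    val batches = discretize(records, batchMillis = 5000L)
    batches.foreach { case (i, vs) => println(s"batch $i: ${vs.mkString(",")}") }
    println(keepsUp(Seq(3000L, 4500L), 5000L))
  }
}
```

With 5-second batches, the records above fall into batch 0 (`a`, `b`), batch 1 (`c`), and batch 2 (`d`). If any batch took longer than 5 seconds to process, `keepsUp` would return `false`, which in this model is the situation the guide's advice on choosing `batchDuration` is meant to avoid.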