author | Aaron Kimball <aaron@magnify.io> | 2014-03-02 23:26:47 -0800 |
---|---|---|
committer | Reynold Xin <rxin@apache.org> | 2014-03-02 23:26:47 -0800 |
commit | 2b53447f325fa7adcfb9c69fd824467bf420af04 (patch) | |
tree | e06c6861a2700f91a7b593b9e4fa5cd98536470e /docs/streaming-programming-guide.md | |
parent | 55a4f11b5064650024bb13c68639665394c03a0c (diff) | |
SPARK-1173. Improve scala streaming docs.
Clarify imports to add implicit conversions to DStream and
fix other small typos in the streaming intro documentation.
Tested by inspecting the rendered output via a local Jekyll server and by copy-pasting the Scala commands into a Spark shell.
Author: Aaron Kimball <aaron@magnify.io>
Closes #64 from kimballa/spark-1173-streaming-docs and squashes the following commits:
6fbff0e [Aaron Kimball] SPARK-1173. Improve scala streaming docs.
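
The import fix at the heart of this patch is small but easy to miss. As a minimal sketch of what the revised guide asks readers to do, here are the two imports the patch adds, with comments on why the second one matters on the Scala API of that era (illustrative, not part of the patch itself):

```scala
// Brings StreamingContext, Seconds, DStream, etc. into scope.
import org.apache.spark.streaming._

// Brings in the implicit conversions that add pair-DStream operations
// (e.g. reduceByKey) to a DStream of (key, value) tuples; without this,
// pairs.reduceByKey(_ + _) in the guide's example would not compile.
import org.apache.spark.streaming.StreamingContext._
```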
Diffstat (limited to 'docs/streaming-programming-guide.md')
-rw-r--r-- | docs/streaming-programming-guide.md | 38 |
1 file changed, 33 insertions(+), 5 deletions(-)
```diff
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 57e8858161..0cc572d1fd 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -58,11 +58,21 @@ do is as follows.
 <div class="codetabs">
 <div data-lang="scala" markdown="1" >
+First, we import the names of the Spark Streaming classes, and some implicit
+conversions from StreamingContext into our environment, to add useful methods to
+other classes we need (like DStream).
 
-First, we create a
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object,
-which is the main entry point for all streaming
-functionality. Besides Spark's configuration, we specify that any DStream will be processed
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) is the
+main entry point for all streaming functionality.
+
+{% highlight scala %}
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.StreamingContext._
+{% endhighlight %}
+
+Then we create a
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object.
+Besides Spark's configuration, we specify that any DStream will be processed
 in 1 second batches.
@@ -98,7 +108,7 @@ val pairs = words.map(word => (word, 1))
 val wordCounts = pairs.reduceByKey(_ + _)
 
 // Print a few of the counts to the console
-wordCount.print()
+wordCounts.print()
 {% endhighlight %}
 
 The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
@@ -262,6 +272,24 @@ Time: 1357008430000 ms
 </td>
 </table>
 
+If you plan to run the Scala code for Spark Streaming-based use cases in the Spark
+shell, you should start the shell with the SparkConfiguration pre-configured to
+discard old batches periodically:
+
+{% highlight bash %}
+$ SPARK_JAVA_OPTS=-Dspark.cleaner.ttl=10000 bin/spark-shell
+{% endhighlight %}
+
+... and create your StreamingContext by wrapping the existing interactive shell
+SparkContext object, `sc`:
+
+{% highlight scala %}
+val ssc = new StreamingContext(sc, Seconds(1))
+{% endhighlight %}
+
+When working with the shell, you may also need to send a `^D` to your netcat session
+to force the pipeline to print the word counts to the console at the sink.
+
 ***************************************************************************************************
 
 # Basics
```
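
For readers skimming the patch, here is a self-contained sketch of the word-count pipeline the revised section walks through, assembled from the snippets above. It is illustrative rather than part of the patch: the application name `NetworkWordCount`, the `local[2]` master, and the `localhost:9999` netcat source are assumptions, and the calls target the streaming API as documented in the guide.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // implicit conversions for pair DStreams

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Two local threads: one to run the socket receiver, one to process batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")

    // Batch interval of 1 second, as in the guide.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Text stream from a socket source, e.g. one fed by `nc -lk 9999`.
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print a few counts from each batch to the console.
    wordCounts.print()

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until the streaming job is stopped
  }
}
```

The same pipeline works in the interactive shell variant shown in the patch: start the shell with `spark.cleaner.ttl` set so old batch RDDs are discarded periodically, wrap the existing `sc` with `new StreamingContext(sc, Seconds(1))`, and feed text into the netcat session.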