author | Aaron Kimball <aaron@magnify.io> | 2014-03-02 23:26:47 -0800 |
---|---|---|
committer | Reynold Xin <rxin@apache.org> | 2014-03-02 23:26:47 -0800 |
commit | 2b53447f325fa7adcfb9c69fd824467bf420af04 (patch) | |
tree | e06c6861a2700f91a7b593b9e4fa5cd98536470e /docs/streaming-programming-guide.md | |
parent | 55a4f11b5064650024bb13c68639665394c03a0c (diff) | |
SPARK-1173. Improve scala streaming docs.
Clarify imports to add implicit conversions to DStream and
fix other small typos in the streaming intro documentation.
Tested by inspecting the rendered output via a local Jekyll server and by copy-pasting the Scala commands into a Spark shell.
Author: Aaron Kimball <aaron@magnify.io>
Closes #64 from kimballa/spark-1173-streaming-docs and squashes the following commits:
6fbff0e [Aaron Kimball] SPARK-1173. Improve scala streaming docs.
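
The import fix at the heart of this patch is small but easy to miss. As a minimal sketch of what the revised guide asks readers to do, here are the two imports the patch adds, with comments on why the second one matters on the Scala API of that era (illustrative, not part of the patch itself):

```scala
// Brings StreamingContext, Seconds, DStream, etc. into scope.
import org.apache.spark.streaming._

// Brings in the implicit conversions that add pair-DStream operations
// (e.g. reduceByKey) to a DStream of (key, value) tuples; without this,
// pairs.reduceByKey(_ + _) in the guide's example would not compile.
import org.apache.spark.streaming.StreamingContext._
```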
Diffstat (limited to 'docs/streaming-programming-guide.md')
-rw-r--r-- | docs/streaming-programming-guide.md | 38 |
1 file changed, 33 insertions(+), 5 deletions(-)
```diff
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 57e8858161..0cc572d1fd 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -58,11 +58,21 @@ do is as follows.
 <div class="codetabs">
 <div data-lang="scala" markdown="1" >
+First, we import the names of the Spark Streaming classes, and some implicit
+conversions from StreamingContext into our environment, to add useful methods to
+other classes we need (like DStream).
 
-First, we create a
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object,
-which is the main entry point for all streaming
-functionality. Besides Spark's configuration, we specify that any DStream will be processed
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) is the
+main entry point for all streaming functionality.
+
+{% highlight scala %}
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.StreamingContext._
+{% endhighlight %}
+
+Then we create a
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object.
+Besides Spark's configuration, we specify that any DStream will be processed
 in 1 second batches.
@@ -98,7 +108,7 @@ val pairs = words.map(word => (word, 1))
 val wordCounts = pairs.reduceByKey(_ + _)
 
 // Print a few of the counts to the console
-wordCount.print()
+wordCounts.print()
 {% endhighlight %}
 
 The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
@@ -262,6 +272,24 @@ Time: 1357008430000 ms
 </td>
 </table>
 
+If you plan to run the Scala code for Spark Streaming-based use cases in the Spark
+shell, you should start the shell with the SparkConfiguration pre-configured to
+discard old batches periodically:
+
+{% highlight bash %}
+$ SPARK_JAVA_OPTS=-Dspark.cleaner.ttl=10000 bin/spark-shell
+{% endhighlight %}
+
+... and create your StreamingContext by wrapping the existing interactive shell
+SparkContext object, `sc`:
+
+{% highlight scala %}
+val ssc = new StreamingContext(sc, Seconds(1))
+{% endhighlight %}
+
+When working with the shell, you may also need to send a `^D` to your netcat session
+to force the pipeline to print the word counts to the console at the sink.
+
 ***************************************************************************************************
 
 # Basics
```
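
For readers skimming the patch, here is a self-contained sketch of the word-count pipeline the revised section walks through, assembled from the snippets above. It is illustrative rather than part of the patch: the application name `NetworkWordCount`, the `local[2]` master, and the `localhost:9999` netcat source are assumptions, and the calls target the streaming API as documented in the guide.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // implicit conversions for pair DStreams

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // Two local threads: one to run the socket receiver, one to process batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")

    // Batch interval of 1 second, as in the guide.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Text stream from a socket source, e.g. one fed by `nc -lk 9999`.
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print a few counts from each batch to the console.
    wordCounts.print()

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // block until the streaming job is stopped
  }
}
```

The same pipeline works in the interactive shell variant shown in the patch: start the shell with `spark.cleaner.ttl` set so old batch RDDs are discarded periodically, wrap the existing `sc` with `new StreamingContext(sc, Seconds(1))`, and feed text into the netcat session.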