path: root/docs/streaming-programming-guide.md
author jay@apache.org <jayunit100> 2014-11-05 15:45:34 -0800
committer Matei Zaharia <matei@databricks.com> 2014-11-05 15:45:34 -0800
commit 868cd4c3ca11e6ecc4425b972d9a20c360b52425 (patch)
tree af724bbb6937eb6bd35785b41a794065ff71a0aa /docs/streaming-programming-guide.md
parent 61a5cced049a8056292ba94f23fa7bd040f50685 (diff)
SPARK-4040. Update documentation to exemplify use of local (n) value, fo...
This is a minor docs update which helps to clarify the way local[n] is used for streaming apps.

Author: jay@apache.org <jayunit100>

Closes #2964 from jayunit100/SPARK-4040 and squashes the following commits:

35b5a5e [jay@apache.org] SPARK-4040: Update documentation to exemplify use of local (n) value.
Diffstat (limited to 'docs/streaming-programming-guide.md')
-rw-r--r-- docs/streaming-programming-guide.md | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 8bbba88b31..44a1f3ad75 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -68,7 +68,9 @@ import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
-// Create a local StreamingContext with two working thread and batch interval of 1 second
+// Create a local StreamingContext with two working threads and a batch interval of 1 second.
+// The master requires 2 cores to prevent a starvation scenario.
+
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
{% endhighlight %}
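For reference, a minimal end-to-end sketch based on the guide's NetworkWordCount example (the hostname and port below are illustrative placeholders); `local[2]` reserves one thread for the socket receiver and leaves one free to process the received data:

{% highlight scala %}
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

// "local[2]": one thread runs the receiver, the other processes batches.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))

// Count words in text received over a TCP socket (placeholder host/port).
val lines = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.print()

ssc.start()             // Start receiving and processing
ssc.awaitTermination()  // Wait for the computation to terminate
{% endhighlight %}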
@@ -586,11 +588,13 @@ Every input DStream (except file stream) is associated with a single [Receiver](
A receiver is run within a Spark worker/executor as a long-running task, so it occupies one of the cores allocated to the Spark Streaming application. It is therefore important to remember that a Spark Streaming application needs to be allocated enough cores to process the received data as well as to run the receiver(s). A few important points to remember:
-##### Points to remember:
+##### Points to remember
{:.no_toc}
-- If the number of cores allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive data, but not be able to process them.
-- When running locally, if you master URL is set to "local", then there is only one core to run tasks. That is insufficient for programs with even one input DStream (file streams are okay) as the receiver will occupy that core and there will be no core left to process the data.
-
+- If the number of threads allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive data but will not be able to process it.
+- When running locally, if your master URL is set to "local", then there is only one core to run tasks. That is insufficient for programs using even one input DStream with a receiver (file streams are okay), so a "local" master URL in a streaming app will generally starve the processing tasks.
+Thus, in any streaming app, you will generally want to allocate more than one thread (i.e. set your master to "local[2]") when testing locally; see the sizing sketch below.
+See [Spark Properties](configuration.html#spark-properties).
+
### Basic Sources
{:.no_toc}
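As a rough sizing rule when testing locally (a sketch, not part of the patched guide; `numReceivers` is a hypothetical value for illustration), give the local master one thread per receiver plus at least one more for processing:

{% highlight scala %}
import org.apache.spark.SparkConf

// Hypothetical: this app has one input DStream backed by a receiver.
val numReceivers = 1

// One thread per receiver, plus at least one to process received batches.
val conf = new SparkConf()
  .setMaster(s"local[${numReceivers + 1}]")
  .setAppName("StreamingApp")
{% endhighlight %}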