-rw-r--r--  docs/configuration.md                13
-rw-r--r--  docs/streaming-programming-guide.md  13
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index a2cc7a37e2..e287591f3f 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1434,6 +1434,19 @@ Apart from these, the following properties are also available, and may be useful
<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr>
+ <td><code>spark.streaming.backpressure.enabled</code></td>
+ <td>false</td>
+ <td>
+ Enables or disables Spark Streaming's internal backpressure mechanism (since 1.5).
+ This enables Spark Streaming to control the receiving rate based on the
+ current batch scheduling delays and processing times, so that the system receives
+ data only as fast as it can process it. Internally, this dynamically sets the
+ maximum receiving rate of receivers. This rate is upper bounded by the values of
+ <code>spark.streaming.receiver.maxRate</code> and <code>spark.streaming.kafka.maxRatePerPartition</code>
+ if they are set (see below).
+ </td>
+</tr>
+<tr>
<td><code>spark.streaming.blockInterval</code></td>
<td>200ms</td>
<td>
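To make the property above concrete, here is a minimal Scala sketch of enabling backpressure programmatically when building a `StreamingContext`; the application name and batch interval are placeholders, and the same key can equally be supplied via `spark-submit --conf` or `spark-defaults.conf`:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Backpressure is disabled by default; turning it on lets Spark Streaming
// adapt the receiving rate to the observed scheduling delay and processing time.
val conf = new SparkConf()
  .setAppName("BackpressureExample") // placeholder application name
  .set("spark.streaming.backpressure.enabled", "true")

val ssc = new StreamingContext(conf, Seconds(2)) // placeholder 2-second batch interval
```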
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index a1acf83f75..c751dbb417 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -1807,7 +1807,7 @@ To run a Spark Streaming application, you need to have the following.
+ *Mesos* - [Marathon](https://github.com/mesosphere/marathon) has been used to achieve this
with Mesos.
-- *[Since Spark 1.2] Configuring write ahead logs* - Since Spark 1.2,
+- *Configuring write ahead logs* - Since Spark 1.2,
we have introduced _write ahead logs_ for achieving strong
fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into
a write ahead log in the configuration checkpoint directory. This prevents data loss on driver
@@ -1822,6 +1822,17 @@ To run a Spark Streaming application, you need to have the following.
stored in a replicated storage system. This can be done by setting the storage level for the
input stream to `StorageLevel.MEMORY_AND_DISK_SER`.
+- *Setting the max receiving rate* - If the cluster resources are not large enough for the streaming
+ application to process data as fast as it is being received, the receivers can be rate limited
+ by setting a maximum rate limit in terms of records / sec.
+ See the [configuration parameters](configuration.html#spark-streaming)
+ `spark.streaming.receiver.maxRate` for receivers and `spark.streaming.kafka.maxRatePerPartition`
+ for the Direct Kafka approach. In Spark 1.5, we have introduced a feature called *backpressure* that
+ eliminates the need to set this rate limit, as Spark Streaming automatically figures out the
+ rate limits and dynamically adjusts them if the processing conditions change. This backpressure
+ can be enabled by setting the [configuration parameter](configuration.html#spark-streaming)
+ `spark.streaming.backpressure.enabled` to `true`.
+
### Upgrading Application Code
{:.no_toc}
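As a rough sketch of the combination described in the rate-limiting bullet above (the application name and rate values are illustrative placeholders, not recommendations), the static caps and backpressure can be set together; when both are present, the dynamically computed rate stays below the configured maximums:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("RateLimitedStreamingApp") // placeholder application name
  // Static caps in records per second (per receiver / per Kafka partition).
  .set("spark.streaming.receiver.maxRate", "1000")
  .set("spark.streaming.kafka.maxRatePerPartition", "500")
  // Let Spark Streaming adapt the actual rate, bounded by the caps above.
  .set("spark.streaming.backpressure.enabled", "true")

val ssc = new StreamingContext(conf, Seconds(1)) // placeholder batch interval
```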