author    Rohan Bhanderi <rohan.bhanderi@sjsu.edu>    2015-10-23 01:10:46 -0700
committer Reynold Xin <rxin@databricks.com>           2015-10-23 01:10:46 -0700
commit    16dc9f344c08deee104090106cb0a537a90e33fc (patch)
tree      dcbf2ceeb08778bff6e2b347d343231d10f1d5a0 /docs/streaming-kafka-integration.md
parent    cdea0174e32a5f4c28fd59899b2e9774994303d5 (diff)
Fix typo "Received" to "Receiver" in streaming-kafka-integration.md
Fixed the typo on line 8 of the markdown: "Received" -> "Receiver"

Author: Rohan Bhanderi <rohan.bhanderi@sjsu.edu>

Closes #9242 from RohanBhanderi/patch-1.
Diffstat (limited to 'docs/streaming-kafka-integration.md')
-rw-r--r--  docs/streaming-kafka-integration.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/streaming-kafka-integration.md b/docs/streaming-kafka-integration.md
index 5db39ae54a..ab7f0117c0 100644
--- a/docs/streaming-kafka-integration.md
+++ b/docs/streaming-kafka-integration.md
@@ -5,7 +5,7 @@ title: Spark Streaming + Kafka Integration Guide
[Apache Kafka](http://kafka.apache.org/) is publish-subscribe messaging rethought as a distributed, partitioned, replicated commit log service. Here we explain how to configure Spark Streaming to receive data from Kafka. There are two approaches to this - the old approach using Receivers and Kafka's high-level API, and a new experimental approach (introduced in Spark 1.3) without using Receivers. They have different programming models, performance characteristics, and semantics guarantees, so read on for more details.
## Approach 1: Receiver-based Approach
-This approach uses a Receiver to receive the data. The Received is implemented using the Kafka high-level consumer API. As with all receivers, the data received from Kafka through a Receiver is stored in Spark executors, and then jobs launched by Spark Streaming processes the data.
+This approach uses a Receiver to receive the data. The Receiver is implemented using the Kafka high-level consumer API. As with all receivers, the data received from Kafka through a Receiver is stored in Spark executors, and then jobs launched by Spark Streaming processes the data.
However, under default configuration, this approach can lose data under failures (see [receiver reliability](streaming-programming-guide.html#receiver-reliability)). To ensure zero-data loss, you have to additionally enable Write Ahead Logs in Spark Streaming (introduced in Spark 1.2). This synchronously saves all the received Kafka data into write ahead logs on a distributed file system (e.g. HDFS), so that all the data can be recovered on failure. See the [Deploying section](streaming-programming-guide.html#deploying-applications) in the streaming programming guide for more details on Write Ahead Logs.
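
For reference, a minimal Scala sketch of the receiver-based approach described in the changed section. `KafkaUtils.createStream` is the API this guide documents for Approach 1; the app name, ZooKeeper quorum, consumer group id, and topic map below are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaReceiverExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaReceiverExample")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Receiver-based stream: connects through ZooKeeper using the Kafka
    // high-level consumer API. The map value is the number of consumer
    // threads per topic, not the parallelism of downstream processing.
    val kafkaStream = KafkaUtils.createStream(
      ssc,
      "zookeeper-host:2181",  // placeholder ZooKeeper quorum
      "my-consumer-group",    // placeholder consumer group id
      Map("my-topic" -> 1))

    // Each record is a (key, message) pair; print the message payloads.
    kafkaStream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```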
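
And a sketch of enabling the Write Ahead Log mentioned in the last paragraph, assuming the documented configuration key `spark.streaming.receiver.writeAheadLog.enable`; the checkpoint path and connection details are again placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("KafkaReceiverWithWAL")
  // Synchronously save received blocks to the write ahead log
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(2))
// The write ahead log lives under the checkpoint directory, so point it
// at a fault-tolerant file system such as HDFS (placeholder path).
ssc.checkpoint("hdfs:///checkpoints/kafka-app")

// With the log enabled, received data is already persisted durably, so
// a storage level without in-memory replication avoids storing it twice.
val kafkaStream = KafkaUtils.createStream(
  ssc, "zookeeper-host:2181", "my-consumer-group",
  Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER)
```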