author: Tathagata Das <tathagata.das1565@gmail.com> 2015-03-11 18:48:21 -0700
committer: Tathagata Das <tathagata.das1565@gmail.com> 2015-03-11 18:48:21 -0700
commit: cd3b68d93a01f11bd3d5a441b341cb33d227e900
tree: a427f6dbdae218857ec6e8de066b76bf0f43f8ed /docs/streaming-flume-integration.md
parent: 51a79a770a8356bd0ed244af5ca7f1c44c9437d2
[SPARK-6128][Streaming][Documentation] Updates to Spark Streaming Programming Guide
Updates to the documentation are as follows:
- Added information on Kafka Direct API and Kafka Python API
- Added joins to the main streaming guide
- Improved details on the fault-tolerance semantics

Generated docs located here
http://people.apache.org/~tdas/spark-1.3.0-temp-docs/streaming-programming-guide.html#fault-tolerance-semantics

More things to add:
- Configuration for Kafka receive rate
- May be add concurrentJobs

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #4956 from tdas/streaming-guide-update-1.3 and squashes the following commits:

819408c [Tathagata Das] Minor fixes.
debe484 [Tathagata Das] Added DataFrames and MLlib
380cf8d [Tathagata Das] Fix link
04167a6 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-guide-update-1.3
0b77486 [Tathagata Das] Updates based on Josh's comments.
86c4c2a [Tathagata Das] Updated streaming guides
82de92a [Tathagata Das] Add Kafka to Python api docs
Diffstat (limited to 'docs/streaming-flume-integration.md')
-rw-r--r-- docs/streaming-flume-integration.md | 2
1 file changed, 2 insertions, 0 deletions
diff --git a/docs/streaming-flume-integration.md b/docs/streaming-flume-integration.md
index 40e17246fe..c8ab146bca 100644
--- a/docs/streaming-flume-integration.md
+++ b/docs/streaming-flume-integration.md
@@ -5,6 +5,8 @@ title: Spark Streaming + Flume Integration Guide
[Apache Flume](https://flume.apache.org/) is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Here we explain how to configure Flume and Spark Streaming to receive data from Flume. There are two approaches to this.
+<span class="badge" style="background-color: grey">Python API</span> Flume is not yet available in the Python API.
+
## Approach 1: Flume-style Push-based Approach
Flume is designed to push data between Flume agents. In this approach, Spark Streaming essentially sets up a receiver that acts an Avro agent for Flume, to which Flume can push the data. Here are the configuration steps.