author    Tathagata Das <tathagata.das1565@gmail.com>  2015-03-11 18:48:21 -0700
committer Tathagata Das <tathagata.das1565@gmail.com>  2015-03-11 18:48:21 -0700
commit    cd3b68d93a01f11bd3d5a441b341cb33d227e900 (patch)
tree      a427f6dbdae218857ec6e8de066b76bf0f43f8ed /docs/streaming-flume-integration.md
parent    51a79a770a8356bd0ed244af5ca7f1c44c9437d2 (diff)
[SPARK-6128][Streaming][Documentation] Updates to Spark Streaming Programming Guide
Updates to the documentation are as follows:
- Added information on Kafka Direct API and Kafka Python API
- Added joins to the main streaming guide
- Improved details on the fault-tolerance semantics
Generated docs are located here:
http://people.apache.org/~tdas/spark-1.3.0-temp-docs/streaming-programming-guide.html#fault-tolerance-semantics
More things to add:
- Configuration for Kafka receive rate
- Maybe add concurrentJobs
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #4956 from tdas/streaming-guide-update-1.3 and squashes the following commits:
819408c [Tathagata Das] Minor fixes.
debe484 [Tathagata Das] Added DataFrames and MLlib
380cf8d [Tathagata Das] Fix link
04167a6 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-guide-update-1.3
0b77486 [Tathagata Das] Updates based on Josh's comments.
86c4c2a [Tathagata Das] Updated streaming guides
82de92a [Tathagata Das] Add Kafka to Python api docs
Diffstat (limited to 'docs/streaming-flume-integration.md')
 docs/streaming-flume-integration.md | 2 ++
 1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/docs/streaming-flume-integration.md b/docs/streaming-flume-integration.md
index 40e17246fe..c8ab146bca 100644
--- a/docs/streaming-flume-integration.md
+++ b/docs/streaming-flume-integration.md
@@ -5,6 +5,8 @@ title: Spark Streaming + Flume Integration Guide

 [Apache Flume](https://flume.apache.org/) is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Here we explain how to configure Flume and Spark Streaming to receive data from Flume. There are two approaches to this.

+<span class="badge" style="background-color: grey">Python API</span> Flume is not yet available in the Python API.
+
 ## Approach 1: Flume-style Push-based Approach
 Flume is designed to push data between Flume agents. In this approach, Spark Streaming essentially sets up a receiver that acts an Avro agent for Flume, to which Flume can push the data. Here are the configuration steps.
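The context lines in the diff describe the push-based approach: the Flume agent is configured with an Avro sink that pushes events to the host and port where the Spark Streaming receiver listens. A minimal sketch of that sink configuration, assuming a hypothetical agent named `agent` and a channel named `memoryChannel` (both placeholder names, not from the diff):

```properties
# Hypothetical Flume agent sink configuration (agent/channel names are placeholders)
agent.sinks = avroSink
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.channel = memoryChannel
# Host and port on which the Spark Streaming Flume receiver will listen
agent.sinks.avroSink.hostname = <receiver-host>
agent.sinks.avroSink.port = <receiver-port>
```

On the Spark side, the matching receiver is created with `FlumeUtils.createStream(streamingContext, [host], [port])` from the `spark-streaming-flume` artifact, using the same host and port as the Avro sink above.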