author     Shixiong Zhu <shixiong@databricks.com>  2016-01-18 15:38:03 -0800
committer  Tathagata Das <tathagata.das1565@gmail.com>  2016-01-18 15:38:03 -0800
commit     a973f483f6b819ed4ecac27ff5c064ea13a8dd71 (patch)
tree       e55be6fce5841d0dd1d197f3276bf3bf3ae2398f /docs/streaming-kafka-integration.md
parent     404190221a788ebc3a0cbf5cb47cf532436ce965 (diff)
[SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc
This PR added instructions for Python users on getting the flume assembly jar to the flume integration page, like the Kafka doc.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10746 from zsxwing/flume-doc.
Diffstat (limited to 'docs/streaming-kafka-integration.md')
-rw-r--r--  docs/streaming-kafka-integration.md  4
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/streaming-kafka-integration.md b/docs/streaming-kafka-integration.md
index 9454714eeb..015a2f1fa0 100644
--- a/docs/streaming-kafka-integration.md
+++ b/docs/streaming-kafka-integration.md
@@ -71,7 +71,7 @@ Next, we discuss how to use this approach in your streaming application.
./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION_SHORT}} ...
Alternatively, you can also download the JAR of the Maven artifact `spark-streaming-kafka-assembly` from the
- [Maven repository](http://search.maven.org/#search|ga|1|a%3A%22spark-streaming-kafka-assembly_2.10%22%20AND%20v%3A%22{{site.SPARK_VERSION_SHORT}}%22) and add it to `spark-submit` with `--jars`.
+ [Maven repository](http://search.maven.org/#search|ga|1|a%3A%22spark-streaming-kafka-assembly_{{site.SCALA_BINARY_VERSION}}%22%20AND%20v%3A%22{{site.SPARK_VERSION_SHORT}}%22) and add it to `spark-submit` with `--jars`.
## Approach 2: Direct Approach (No Receivers)
This new receiver-less "direct" approach was introduced in Spark 1.3 to ensure stronger end-to-end guarantees. Instead of using receivers to receive data, this approach periodically queries Kafka for the latest offsets in each topic+partition, and accordingly defines the offset ranges to process in each batch. When the jobs to process the data are launched, Kafka's simple consumer API is used to read the defined ranges of offsets from Kafka (similar to reading files from a file system). Note that this is an experimental feature, introduced in Spark 1.3 for the Scala and Java APIs and in Spark 1.4 for the Python API.
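To make the mechanics above concrete, here is a minimal Scala sketch of the direct approach using `KafkaUtils.createDirectStream` from the `spark-streaming-kafka` artifact. The broker address `localhost:9092`, the topic `mytopic`, and the batch interval are illustrative assumptions, not values prescribed by this doc.

```scala
import kafka.serializer.StringDecoder

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectKafkaExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirectKafkaExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Broker list and topic are placeholders. No receiver is used:
    // the stream itself queries Kafka for the latest offsets once
    // per batch and reads the resulting offset ranges.
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "localhost:9092")
    val topics = Set("mytopic")

    val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Each record is a (key, value) pair read with Kafka's simple consumer API.
    directStream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because no receiver is involved, rate and reliability for such a stream are governed by the `spark.streaming.kafka.*` settings covered below rather than by `spark.streaming.receiver.*`.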
@@ -207,4 +207,4 @@ Next, we discuss how to use this approach in your streaming application.
Another thing to note is that since this approach does not use receivers, the standard receiver-related [configurations](configuration.html) (that is, those of the form `spark.streaming.receiver.*`) will not apply to the input DStreams created by this approach (they will still apply to other input DStreams). Instead, use the [configurations](configuration.html) of the form `spark.streaming.kafka.*`. An important one is `spark.streaming.kafka.maxRatePerPartition`, which is the maximum rate (in messages per second) at which each Kafka partition will be read by this direct API.
-3. **Deploying:** This is same as the first approach, for Scala, Java and Python.
+3. **Deploying:** This is same as the first approach.
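As a small illustration of the `spark.streaming.kafka.maxRatePerPartition` setting mentioned above, the sketch below caps the direct API's read rate when building the streaming context; the limit of 1000 messages per second and the app name are placeholder assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder app name and rate limit -- tune for your workload.
// With a 2-second batch interval, each Kafka partition contributes
// at most 1000 * 2 = 2000 messages per batch.
val conf = new SparkConf()
  .setAppName("RateLimitedDirectKafka")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")

val ssc = new StreamingContext(conf, Seconds(2))
```

The same setting can equally be passed on the command line at submit time, e.g. `--conf spark.streaming.kafka.maxRatePerPartition=1000` with `spark-submit`.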