diff options
author | Nick Evans <me@nicolasevans.org> | 2015-11-11 13:29:30 -0800 |
---|---|---|
committer | Tathagata Das <tathagata.das1565@gmail.com> | 2015-11-11 13:29:30 -0800 |
commit | dd77e278b99e45c20fdefb1c795f3c5148d577db (patch) | |
tree | 186cef9414da85d54d2f1621c5173fcba2ba0201 | |
parent | a9a6b80c718008aac7c411dfe46355efe58dee2e (diff) | |
download | spark-dd77e278b99e45c20fdefb1c795f3c5148d577db.tar.gz spark-dd77e278b99e45c20fdefb1c795f3c5148d577db.tar.bz2 spark-dd77e278b99e45c20fdefb1c795f3c5148d577db.zip |
[SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD
tdas koeninger
This updates the Spark Streaming + Kafka Integration Guide doc with a working method to access the offsets of a `KafkaRDD` through Python.
Author: Nick Evans <me@nicolasevans.org>
Closes #9289 from manygrams/update_kafka_direct_python_docs.
-rw-r--r-- | docs/streaming-kafka-integration.md | 15 |
1 files changed, 14 insertions, 1 deletions
diff --git a/docs/streaming-kafka-integration.md b/docs/streaming-kafka-integration.md index ab7f0117c0..b00351b2fb 100644 --- a/docs/streaming-kafka-integration.md +++ b/docs/streaming-kafka-integration.md @@ -181,7 +181,20 @@ Next, we discuss how to use this approach in your streaming application. ); </div> <div data-lang="python" markdown="1"> - Not supported yet + offsetRanges = [] + + def storeOffsetRanges(rdd): + global offsetRanges + offsetRanges = rdd.offsetRanges() + return rdd + + def printOffsetRanges(rdd): + for o in offsetRanges: + print "%s %s %s %s" % (o.topic, o.partition, o.fromOffset, o.untilOffset) + + directKafkaStream\ + .transform(storeOffsetRanges)\ + .foreachRDD(printOffsetRanges) </div> </div> |