From dd77e278b99e45c20fdefb1c795f3c5148d577db Mon Sep 17 00:00:00 2001 From: Nick Evans Date: Wed, 11 Nov 2015 13:29:30 -0800 Subject: [SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD tdas koeninger This updates the Spark Streaming + Kafka Integration Guide doc with a working method to access the offsets of a `KafkaRDD` through Python. Author: Nick Evans Closes #9289 from manygrams/update_kafka_direct_python_docs. --- docs/streaming-kafka-integration.md | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'docs/streaming-kafka-integration.md') diff --git a/docs/streaming-kafka-integration.md b/docs/streaming-kafka-integration.md index ab7f0117c0..b00351b2fb 100644 --- a/docs/streaming-kafka-integration.md +++ b/docs/streaming-kafka-integration.md @@ -181,7 +181,20 @@ Next, we discuss how to use this approach in your streaming application. );
- Not supported yet + offsetRanges = [] + + def storeOffsetRanges(rdd): + global offsetRanges + offsetRanges = rdd.offsetRanges() + return rdd + + def printOffsetRanges(rdd): + for o in offsetRanges: + print "%s %s %s %s" % (o.topic, o.partition, o.fromOffset, o.untilOffset) + + directKafkaStream\ + .transform(storeOffsetRanges)\ + .foreachRDD(printOffsetRanges)
-- cgit v1.2.3