| author | Shixiong Zhu <shixiong@databricks.com> | 2016-11-22 14:15:57 -0800 |
|---|---|---|
| committer | Tathagata Das <tathagata.das1565@gmail.com> | 2016-11-22 14:15:57 -0800 |
| commit | 2fd101b2f0028e005fbb0bdd29e59af37aa637da (patch) | |
| tree | 947520d8e9bf350e6990f7ab985461f87d92f013 /external/flume-assembly/pom.xml | |
| parent | bdc8153e8689262708c7fade5c065bd7fc8a84fc (diff) | |
[SPARK-18373][SPARK-18529][SS][KAFKA] Make failOnDataLoss=false work with Spark jobs
## What changes were proposed in this pull request?
This PR adds `CachedKafkaConsumer.getAndIgnoreLostData` to handle the corner cases of `failOnDataLoss=false`.
It also resolves [SPARK-18529](https://issues.apache.org/jira/browse/SPARK-18529) as part of the refactoring: a timeout now throws a `TimeoutException`.
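The idea behind the change can be illustrated with a minimal sketch. This is not the actual Spark code: the in-memory "log", the `earliestAvailable` marker, and the `get` helper below are all hypothetical stand-ins for a Kafka partition whose oldest records have been deleted by retention, used only to show the two `failOnDataLoss` behaviors (fail fast vs. skip past the lost range):

```scala
// Illustrative sketch only: a simplified in-memory "log" standing in for a
// Kafka partition whose offsets 0..4 were already deleted by retention.
object FailOnDataLossSketch {
  final case class Record(offset: Long, value: String)

  class DataLossException(msg: String) extends RuntimeException(msg)

  // Hypothetical surviving data: offsets 5..9 still exist.
  val earliestAvailable = 5L
  val log: Map[Long, Record] =
    (earliestAvailable until 10L).map(o => o -> Record(o, s"v$o")).toMap

  // Mirrors the intent of getAndIgnoreLostData: when the requested offset
  // has been lost, either fail fast (failOnDataLoss=true) or skip forward
  // to the earliest offset that still exists (failOnDataLoss=false).
  // Only the "lost at the beginning" case is modeled here.
  def get(offset: Long, failOnDataLoss: Boolean): Record =
    log.get(offset) match {
      case Some(r) => r
      case None if failOnDataLoss =>
        throw new DataLossException(s"offset $offset was deleted")
      case None =>
        // Data loss tolerated: resume from the earliest surviving offset.
        log(math.max(offset, earliestAvailable))
    }

  def main(args: Array[String]): Unit = {
    assert(get(7L, failOnDataLoss = false).offset == 7L) // normal read
    assert(get(2L, failOnDataLoss = false).offset == 5L) // lost range skipped
    val failedFast =
      try { get(2L, failOnDataLoss = true); false }
      catch { case _: DataLossException => true }
    assert(failedFast) // failOnDataLoss=true aborts the query instead
    println("ok")
  }
}
```

With `failOnDataLoss=false` the job keeps running and silently resumes after the gap, which is why the stress test below matters: the skipping path has many corner cases that only surface when logs are deleted concurrently with reads.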
## How was this patch tested?
Because I cannot find any way to manually control when the Kafka server cleans up its logs, it's impossible to write a unit test for each corner case. Instead, I created `test("stress test for failOnDataLoss=false")`, which should cover most of the corner cases.
I also modified some existing tests to run with both `failOnDataLoss=false` and `failOnDataLoss=true`, to make sure the existing logic is not broken.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #15820 from zsxwing/failOnDataLoss.
Diffstat (limited to 'external/flume-assembly/pom.xml')
0 files changed, 0 insertions, 0 deletions