path: root/python/setup.py
author:    Sital Kedia <skedia@fb.com>  2017-03-17 09:33:45 -0500
committer: Imran Rashid <irashid@cloudera.com>  2017-03-17 09:33:58 -0500
commit:    7b5d873aef672aa0aee41e338bab7428101e1ad3 (patch)
tree:      cf93435eaa5644ddf0c431065f340d9e5fab7414 /python/setup.py
parent:    13538cf3dd089222c7e12a3cd6e72ac836fa51ac (diff)
download:  spark-7b5d873aef672aa0aee41e338bab7428101e1ad3.tar.gz
           spark-7b5d873aef672aa0aee41e338bab7428101e1ad3.tar.bz2
           spark-7b5d873aef672aa0aee41e338bab7428101e1ad3.zip
[SPARK-13369] Add config for number of consecutive fetch failures

The previously hardcoded maximum of 4 retries per stage is not suitable for all cluster configurations. Since Spark retries a stage at the first sign of a fetch failure, it can take many stage retries to discover all the failures. Two scenarios in particular call for changing this value: (1) when there are more than 4 executors per node, it may take more than 4 retries to discover the problem with each executor on the node, and (2) during maintenance on large clusters, when multiple machines are serviced at once but total cluster downtime cannot be afforded. Making this value configurable lets cluster managers tune it to something appropriate for their cluster configuration.

Tested with unit tests.

Author: Sital Kedia <skedia@fb.com>

Closes #17307 from sitalkedia/SPARK-13369.
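A minimal sketch of using the new setting. The config key `spark.stage.maxConsecutiveAttempts` is assumed to be the one introduced by this change (verify the name and default against the documentation for your Spark version); the application jar path and value `8` are illustrative.

```shell
# Raise the per-stage consecutive fetch-failure limit from the default (4) to 8.
# Useful on clusters with many executors per node or during rolling maintenance,
# per the scenarios described in the commit message above.
spark-submit \
  --conf spark.stage.maxConsecutiveAttempts=8 \
  my-application.jar
```

The same key can also be set in `spark-defaults.conf` or via `SparkConf` in application code, so operators can pick a cluster-wide default rather than passing it per job.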
Diffstat (limited to 'python/setup.py')
0 files changed, 0 insertions, 0 deletions