[SPARK-15354][CORE] Topology aware block replication strategies - spark


author	Shubham Chopra <schopra31@bloomberg.net>	2017-03-30 22:21:57 +0800
committer	Wenchen Fan <wenchen@databricks.com>	2017-03-30 22:21:57 +0800
commit	b454d4402e5ee7d1a7385d1fe3737581f84d2c72 (patch)
tree	f4f536b7622918d1fd0096f6ab510dd6cdc3ffed /sql
parent	edc87d76efea7b4d19d9d0c4ddba274a3ccb8752 (diff)
download	spark-b454d4402e5ee7d1a7385d1fe3737581f84d2c72.tar.gz spark-b454d4402e5ee7d1a7385d1fe3737581f84d2c72.tar.bz2 spark-b454d4402e5ee7d1a7385d1fe3737581f84d2c72.zip

[SPARK-15354][CORE] Topology aware block replication strategies

## What changes were proposed in this pull request?

Implementations of strategies for resilient block replication for different resource managers that mirror the 3-replica strategy used by HDFS, where the first replica is on an executor, the second replica is within the same rack as the executor, and a third replica is on a different rack.

The implementation involves providing two pluggable classes: one running in the driver that provides topology information for every host at cluster start, and a second that prioritizes a list of peer BlockManagerIds.

The prioritization itself can be thought of as an optimization problem: finding a minimal set of peers that satisfies certain objectives and replicating to those peers first. The objectives can be used to express richer constraints over and above an HDFS-like 3-replica strategy.

## How was this patch tested?

This patch was tested with unit tests for storage, along with new unit tests to verify prioritization behaviour.

Author: Shubham Chopra <schopra31@bloomberg.net>

Closes #13932 from shubhamchopra/PrioritizerStrategy.
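
For illustration only, the HDFS-like prioritization described above can be sketched roughly as follows. This is not the code from this patch: the names `Peer` and `prioritize` are hypothetical, rack strings stand in for whatever topology information the driver-side class supplies, and the choice within each group is deterministic here so the behaviour is easy to check.

```python
from typing import List, NamedTuple, Optional


class Peer(NamedTuple):
    """Hypothetical stand-in for a peer block manager and its topology info."""
    host: str
    rack: Optional[str]  # None when no topology information is available


def prioritize(me: Peer, peers: List[Peer]) -> List[Peer]:
    """Order peers so the leading entries cover one same-rack peer and one
    off-rack peer; all remaining peers follow in their original order."""
    if me.rack is None:
        # Without topology information there is nothing to prioritize on.
        return list(peers)

    same_rack = [p for p in peers if p.rack == me.rack and p.host != me.host]
    off_rack = [p for p in peers if p.rack is not None and p.rack != me.rack]

    chosen: List[Peer] = []
    if same_rack:
        chosen.append(same_rack[0])  # candidate for the second replica
    if off_rack:
        chosen.append(off_rack[0])   # candidate for the third replica

    rest = [p for p in peers if p not in chosen]
    return chosen + rest


me = Peer("h1", "r1")
peers = [Peer("h4", "r2"), Peer("h2", "r1"), Peer("h5", "r2"), Peer("h3", "r1")]
ordered = prioritize(me, peers)
```

With this input, a caller replicating to the first two entries of `ordered` places one copy in the local rack and one in a remote rack. Richer objectives would change only how `chosen` is built.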

Diffstat (limited to 'sql')

0 files changed, 0 insertions, 0 deletions

