author | Shubham Chopra <schopra31@bloomberg.net> | 2017-03-30 22:21:57 +0800 |
---|---|---|
committer | Wenchen Fan <wenchen@databricks.com> | 2017-03-30 22:21:57 +0800 |
commit | b454d4402e5ee7d1a7385d1fe3737581f84d2c72 (patch) | |
tree | f4f536b7622918d1fd0096f6ab510dd6cdc3ffed /sql | |
parent | edc87d76efea7b4d19d9d0c4ddba274a3ccb8752 (diff) | |
[SPARK-15354][CORE] Topology aware block replication strategies
## What changes were proposed in this pull request?
This patch implements resilient block replication strategies for different resource managers, modelled on the 3-replica strategy used by HDFS: the first replica is on an executor, the second replica is within the same rack as that executor, and the third replica is on a different rack.
The implementation provides two pluggable classes: one runs in the driver and provides topology information for every host at cluster start, and the other prioritizes a list of peer BlockManagerIds.
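As a rough illustration of how these two pieces could fit together, the sketch below pairs a host-to-rack lookup with a prioritizer that puts a same-rack peer first and a different-rack peer second. All names here (`Peer`, `TopologyLookup`, `StaticTopologyLookup`, `RackAwarePrioritizer`) are hypothetical and chosen for this example; they are not the classes or signatures added by the patch.

```scala
// Hypothetical types for illustration only; not Spark's actual classes.
final case class Peer(host: String, executorId: String)

// Driver-side piece: resolves a host name to a topology label such as a rack.
trait TopologyLookup {
  def rackOf(host: String): Option[String]
}

// Simplest possible lookup, backed by a static host -> rack map.
final class StaticTopologyLookup(racks: Map[String, String]) extends TopologyLookup {
  override def rackOf(host: String): Option[String] = racks.get(host)
}

// Second piece: orders candidate peers so that replicating to the first
// entries yields one copy in the local rack and one in a remote rack.
object RackAwarePrioritizer {
  def prioritize(local: Peer, peers: Seq[Peer], topology: TopologyLookup): Seq[Peer] = {
    val localRack = topology.rackOf(local.host)
    val candidates = peers.filterNot(_ == local)

    // A peer on another host whose rack is known and equals the local rack.
    val sameRack = candidates.find { p =>
      p.host != local.host && localRack.isDefined && topology.rackOf(p.host) == localRack
    }

    // A peer whose rack is known and differs from the local rack.
    val otherRack = candidates.find { p =>
      val rack = topology.rackOf(p.host)
      rack.isDefined && rack != localRack
    }

    val preferred = sameRack.toSeq ++ otherRack.toSeq
    // Remaining peers keep their original order after the preferred ones.
    preferred ++ candidates.filterNot(preferred.contains)
  }
}
```

With the local executor on rack `r1`, a caller would replicate to the head of the returned sequence first; if no same-rack or different-rack peer exists, the sketch simply falls back to the original peer order.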
The prioritization itself can be thought of as an optimization problem: find a minimal set of peers that satisfies certain objectives, and replicate to these peers first. The objectives can be used to express richer constraints over and above the HDFS-like 3-replica strategy.
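One way to picture this framing is as a set-cover problem, sketched below with a greedy heuristic. This is an assumption made for illustration, not necessarily the algorithm used in the patch: greedy selection yields a small set of peers but not a provably minimal one. The names `Candidate`, `Objective`, and `GreedyPeerSelection` are hypothetical.

```scala
// Hypothetical model: an objective is a named predicate over a candidate peer.
final case class Candidate(host: String, rack: String)
final case class Objective(name: String, satisfiedBy: Candidate => Boolean)

object GreedyPeerSelection {
  // Repeatedly picks the peer that satisfies the most still-unmet objectives.
  // Greedy set cover is an approximation: the result is small, not guaranteed minimal.
  def select(peers: Seq[Candidate], objectives: Seq[Objective]): Seq[Candidate] = {
    @annotation.tailrec
    def loop(
        remaining: Seq[Candidate],
        unmet: Seq[Objective],
        chosen: Vector[Candidate]): Vector[Candidate] = {
      if (unmet.isEmpty || remaining.isEmpty) chosen
      else {
        val best = remaining.maxBy(p => unmet.count(_.satisfiedBy(p)))
        val stillUnmet = unmet.filterNot(_.satisfiedBy(best))
        // Stop when no remaining peer satisfies any unmet objective.
        if (stillUnmet.size == unmet.size) chosen
        else loop(remaining.filterNot(_ == best), stillUnmet, chosen :+ best)
      }
    }
    loop(peers, objectives, Vector.empty)
  }
}
```

The HDFS-like strategy then corresponds to two objectives, "some replica in the local rack" and "some replica in a different rack"; richer constraints would be expressed as additional `Objective` values.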
## How was this patch tested?
This patch was tested with unit tests for storage, along with new unit tests to verify prioritization behaviour.
Author: Shubham Chopra <schopra31@bloomberg.net>
Closes #13932 from shubhamchopra/PrioritizerStrategy.
Diffstat (limited to 'sql')
0 files changed, 0 insertions, 0 deletions