path: root/sql/core
author Shubham Chopra <schopra31@bloomberg.net> 2017-03-30 22:21:57 +0800
committer Wenchen Fan <wenchen@databricks.com> 2017-03-30 22:21:57 +0800
commit b454d4402e5ee7d1a7385d1fe3737581f84d2c72
tree f4f536b7622918d1fd0096f6ab510dd6cdc3ffed /sql/core
parent edc87d76efea7b4d19d9d0c4ddba274a3ccb8752
[SPARK-15354][CORE] Topology aware block replication strategies
## What changes were proposed in this pull request?

Implementations of strategies for resilient block replication for different resource managers that replicate the 3-replica strategy used by HDFS, where the first replica is on an executor, the second replica is within the same rack as the executor, and the third replica is on a different rack. The implementation involves providing two pluggable classes: one running in the driver that provides topology information for every host at cluster start, and a second that prioritizes a list of peer BlockManagerIds.

The prioritization itself can be thought of as an optimization problem: find a minimal set of peers that satisfy certain objectives, and replicate to these peers first. The objectives can be used to express richer constraints over and above the HDFS-like 3-replica strategy.

## How was this patch tested?

This patch was tested with unit tests for storage, along with new unit tests to verify prioritization behaviour.

Author: Shubham Chopra <schopra31@bloomberg.net>

Closes #13932 from shubhamchopra/PrioritizerStrategy.
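The rack-aware ordering described above can be illustrated with a small sketch. This is not the code from the patch: it is a Python illustration under stated assumptions, and `Peer`, `prioritize`, and the `seed` parameter are hypothetical names introduced here. It assumes each peer carries an optional rack label and that the first replica already lives on the local executor, so the returned list only orders candidates for the remaining replicas.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Peer:
    """Hypothetical stand-in for a BlockManagerId plus its topology info."""
    host: str
    rack: Optional[str]  # None when no topology information is available


def prioritize(local: Peer, peers: List[Peer], seed: int = 0) -> List[Peer]:
    """Order peers so that one same-rack peer and one off-rack peer come first.

    The remaining peers follow in shuffled order. If the local rack is
    unknown, the result is simply a shuffled list of peers.
    """
    rng = random.Random(seed)  # seeded only to keep the sketch reproducible
    # Drop duplicates and the local executor, then randomize the candidates.
    candidates = [p for p in dict.fromkeys(peers) if p != local]
    rng.shuffle(candidates)

    if local.rack is None:
        return candidates

    same_rack = [p for p in candidates if p.rack == local.rack]
    other_rack = [p for p in candidates
                  if p.rack is not None and p.rack != local.rack]

    # Minimal set that meets both objectives: one peer within the local
    # rack and one peer outside it.
    head = same_rack[:1] + other_rack[:1]
    rest = [p for p in candidates if p not in head]
    return head + rest
```

With a local executor on rack `r1`, the first entry of the result is a peer on `r1` and the second is a peer on some other rack, whenever such peers exist. Replicating to the list in order then approximates the HDFS-like placement.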
Diffstat (limited to 'sql/core')
0 files changed, 0 insertions, 0 deletions