author: Colin Patrick Mccabe <cmccabe@cloudera.com> 2014-10-02 00:29:31 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2014-10-02 00:29:31 -0700
commit: 6e27cb630de69fa5acb510b4e2f6b980742b1957
tree: 720a0c40776c9829a761022e0a9a6da502667ebb /bin/pyspark
parent: bbdf1de84ffdd3bd172f17975d2f1422a9bcf2c6
SPARK-1767: Prefer HDFS-cached replicas when scheduling data-local tasks
This change reorders the replicas returned by HadoopRDD#getPreferredLocations so that replicas cached by HDFS are at the start of the list. This requires Hadoop 2.5 or higher; previous versions of Hadoop do not expose the information needed to determine whether a replica is cached.

Author: Colin Patrick Mccabe <cmccabe@cloudera.com>

Closes #1486 from cmccabe/SPARK-1767 and squashes the following commits:

338d4f8 [Colin Patrick Mccabe] SPARK-1767: Prefer HDFS-cached replicas when scheduling data-local tasks
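
The reordering described in the commit message can be illustrated with a minimal Python sketch. It is not the code from this commit: the `Replica` type, its `cached` flag, and the `prefer_cached` function are hypothetical names chosen for illustration, and the sketch assumes the cached/uncached status of each replica is already known. It shows only the idea of moving cached replicas to the front while keeping the original relative order within each group.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    """Hypothetical stand-in for one block replica location."""
    host: str
    cached: bool  # assumed to be known already (e.g. reported by the filesystem)


def prefer_cached(replicas: List[Replica]) -> List[str]:
    """Return host names with cached replicas first.

    sorted() is stable, so replicas keep their original relative
    order within the cached group and within the uncached group.
    """
    # Key is False (sorts first) for cached replicas, True for the rest.
    ordered = sorted(replicas, key=lambda r: not r.cached)
    return [r.host for r in ordered]


if __name__ == "__main__":
    locations = [
        Replica("host1", cached=False),
        Replica("host2", cached=True),
        Replica("host3", cached=False),
    ]
    print(prefer_cached(locations))
```

A stable sort is used here so that any ordering the filesystem already applied to the replica list survives within each group; only the cached/uncached split changes.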
Diffstat (limited to 'bin/pyspark')
0 files changed, 0 insertions, 0 deletions