path: root/python/pyspark/rddsampler.py
author    Xiangrui Meng <meng@databricks.com>  2014-11-18 16:25:44 -0800
committer Xiangrui Meng <meng@databricks.com>  2014-11-18 16:25:44 -0800
commit    bb46046154a438df4db30a0e1fd557bd3399ee7b (patch)
tree      30e2ac8c1785670596cad195676c9c5036945e0e /python/pyspark/rddsampler.py
parent    4a377aff2d36b64a65b54192a987aba44b8f78e0 (diff)
download  spark-bb46046154a438df4db30a0e1fd557bd3399ee7b.tar.gz
          spark-bb46046154a438df4db30a0e1fd557bd3399ee7b.tar.bz2
          spark-bb46046154a438df4db30a0e1fd557bd3399ee7b.zip
[SPARK-4433] Fix a race condition in zipWithIndex
Spark hangs with the following code:

~~~
sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
~~~

This is because ZippedWithIndexRDD triggers a job in getPartitions, which causes a deadlock in DAGScheduler.getPreferredLocs (a synchronized method). The fix is to compute `startIndices` during construction instead.

This should be applied to branch-1.0, branch-1.1, and branch-1.2.

pwendell

Author: Xiangrui Meng <meng@databricks.com>

Closes #3291 from mengxr/SPARK-4433 and squashes the following commits:

c284d9f [Xiangrui Meng] fix a race condition in zipWithIndex
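For context, here is a minimal sketch of the shape of the fix: move the counting job out of `getPartitions` (which the DAGScheduler may invoke while holding its lock) and into the constructor, so it runs eagerly at RDD creation time. This is not the actual ZippedWithIndexRDD source; the class name `SketchZippedWithIndexRDD` and the exact structure are illustrative assumptions built on the public Spark RDD API.

~~~scala
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch (not the real Spark class): eagerly computes the
// global start index of each partition at construction time.
class SketchZippedWithIndexRDD[T: ClassTag](prev: RDD[T])
  extends RDD[(T, Long)](prev) {

  // Computed during construction. Before the fix, this job was run lazily
  // inside getPartitions; since DAGScheduler.getPreferredLocs is
  // synchronized and also calls getPartitions, launching a job from
  // there could deadlock.
  private val startIndices: Array[Long] = {
    val n = prev.partitions.length
    if (n == 0) {
      Array.empty
    } else if (n == 1) {
      Array(0L)
    } else {
      // Count every partition except the last; the prefix sum gives each
      // partition's first global index.
      prev.context.runJob(
        prev,
        (iter: Iterator[T]) => iter.size.toLong,
        0 until n - 1
      ).scanLeft(0L)(_ + _)
    }
  }

  override def getPartitions: Array[Partition] = prev.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[(T, Long)] = {
    val start = startIndices(split.index)
    firstParent[T].iterator(split, context).zipWithIndex.map {
      case (item, i) => (item, start + i)
    }
  }
}
~~~

With `startIndices` resolved before the RDD is ever scheduled, `getPartitions` no longer has side effects, so the repartition-after-zipWithIndex pattern in the snippet above can no longer wedge the scheduler.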
Diffstat (limited to 'python/pyspark/rddsampler.py')
0 files changed, 0 insertions, 0 deletions