author | Brennon York <brennon.york@capitalone.com> | 2015-02-25 14:11:12 -0800 |
---|---|---|
committer | Ankur Dave <ankurdave@gmail.com> | 2015-02-25 14:11:12 -0800 |
commit | 9f603fce78fcc997926e9a72dec44d48cbc396fc (patch) | |
tree | 3f1d1cc53a7c24dbc2b05ee41d66c8dc77bb4466 /python/pyspark/rdd.py | |
parent | a777c65da9bc636e5cf5426e15a2e76d6b21b744 (diff) | |
[SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing
Fixes an issue where `diff`, `innerJoin`, and `leftJoin` on VertexRDDs with different partition sizes fail in the `zipPartitions` method. The fix checks whether the two VertexRDDs are partitioned equally and, if they are not, repartitions the other VertexRDD to match the partitioning of the calling VertexRDD.
Author: Brennon York <brennon.york@capitalone.com>
Closes #4705 from brennonyork/SPARK-1955 and squashes the following commits:
0882590 [Brennon York] updated to properly handle differently-partitioned vertexRDDs
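The check-then-repartition idea described in the commit message can be illustrated with a small, dependency-free model. This is only a sketch under stated assumptions: it is plain Python, not Spark or GraphX code, and it is not the code from this patch. The names `partition_by`, `zip_partitions`, and `inner_join` are hypothetical helpers introduced for illustration. The sketch also assumes that both sides are placed by the same hash function, so that equal partition counts imply that matching vertex ids are co-located.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# A partition is modelled as a dict: vertex id -> attribute (toy assumption).
Partition = Dict[Hashable, object]


def partition_by(pairs: Iterable[Tuple[Hashable, object]],
                 num_partitions: int) -> List[Partition]:
    """Hash-partition (vertex_id, attr) pairs into num_partitions buckets."""
    parts: List[Partition] = [{} for _ in range(num_partitions)]
    for vid, attr in pairs:
        parts[hash(vid) % num_partitions][vid] = attr
    return parts


def zip_partitions(left: List[Partition], right: List[Partition],
                   f: Callable[[Partition, Partition], Partition]) -> List[Partition]:
    """Apply f to partitions pairwise; reject unequal partition counts."""
    if len(left) != len(right):
        raise ValueError("cannot zip collections with unequal numbers of partitions")
    return [f(a, b) for a, b in zip(left, right)]


def inner_join(this: List[Partition], other: List[Partition],
               combine: Callable[[object, object], object]) -> List[Partition]:
    """Join on vertex id, repartitioning `other` first if the layouts differ."""
    if len(other) != len(this):
        # Hypothetical stand-in for the fix: re-bucket `other` so that it
        # matches the partitioning of the calling side before zipping.
        pairs = [(vid, attr) for part in other for vid, attr in part.items()]
        other = partition_by(pairs, len(this))

    def join_part(a: Partition, b: Partition) -> Partition:
        # Keep only vertex ids present on both sides of this partition pair.
        return {vid: combine(a[vid], b[vid]) for vid in a if vid in b}

    return zip_partitions(this, other, join_part)
```

In this model, calling `zip_partitions` directly on inputs built with two and three partitions raises an error, while `inner_join` on the same inputs succeeds because it re-buckets the second input first.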
Diffstat (limited to 'python/pyspark/rdd.py')
0 files changed, 0 insertions, 0 deletions