path: root/ec2/spark_ec2.py
author: Davies Liu <davies.liu@gmail.com> 2014-08-27 13:18:33 -0700
committer: Josh Rosen <joshrosen@apache.org> 2014-08-27 13:18:33 -0700
commit: 4fa2fda88fc7beebb579ba808e400113b512533b (patch)
tree: d872443f0281f52ee6ab6f19e34c5d1437d8e640 /ec2/spark_ec2.py
parent: 48f42781dedecd38ddcb2dcf67dead92bb4318f5 (diff)
[SPARK-2871] [PySpark] add RDD.lookup(key)
RDD.lookup(key)

    Return the list of values in the RDD for key `key`. This operation
    is done efficiently if the RDD has a known partitioner by only
    searching the partition that the key maps to.

    >>> l = range(1000)
    >>> rdd = sc.parallelize(zip(l, l), 10)
    >>> rdd.lookup(42)  # slow
    [42]
    >>> sorted = rdd.sortByKey()
    >>> sorted.lookup(42)  # fast
    [42]

It also cleans up the code in RDD.py and fixes several bugs (related to preservesPartitioning).

Author: Davies Liu <davies.liu@gmail.com>

Closes #2093 from davies/lookup and squashes the following commits:

1789cd4 [Davies Liu] `f` in foreach could be generator or not.
2871b80 [Davies Liu] Merge branch 'master' into lookup
c6390ea [Davies Liu] address all comments
0f1bce8 [Davies Liu] add test case for lookup()
be0e8ba [Davies Liu] fix preservesPartitioning
eb1305d [Davies Liu] add RDD.lookup(key)
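To illustrate why a known partitioner makes lookup cheaper, here is a minimal sketch in plain Python. It is not the PySpark implementation: `ToyPairRDD`, `round_robin`, `partition_by`, and the `scanned` counter are hypothetical names made up for this example, and the sketch assumes a simple hash partitioner rather than whatever partitioning `sortByKey()` produces.

```python
from typing import Any, Callable, List, Optional, Tuple

Pair = Tuple[Any, Any]


class ToyPairRDD:
    """Hypothetical stand-in for a key-value RDD (not the PySpark class)."""

    def __init__(self, partitions: List[List[Pair]],
                 partitioner: Optional[Callable[[Any], int]] = None):
        self.partitions = partitions
        self.partitioner = partitioner  # maps a key to a partition index, if known
        self.scanned = 0                # partitions touched by the last lookup

    @classmethod
    def round_robin(cls, pairs: List[Pair], num_partitions: int) -> "ToyPairRDD":
        # No partitioner: a key may live in any partition.
        parts: List[List[Pair]] = [[] for _ in range(num_partitions)]
        for i, pair in enumerate(pairs):
            parts[i % num_partitions].append(pair)
        return cls(parts)

    @classmethod
    def partition_by(cls, pairs: List[Pair], num_partitions: int) -> "ToyPairRDD":
        # Assumed hash partitioner: every pair with the same key lands
        # in the same partition, and we remember how keys were placed.
        def partitioner(key: Any) -> int:
            return hash(key) % num_partitions

        parts: List[List[Pair]] = [[] for _ in range(num_partitions)]
        for key, value in pairs:
            parts[partitioner(key)].append((key, value))
        return cls(parts, partitioner)

    def lookup(self, key: Any) -> List[Any]:
        if self.partitioner is not None:
            # Fast path: search only the partition the key maps to.
            candidates = [self.partitions[self.partitioner(key)]]
        else:
            # Slow path: the key could be anywhere, so scan every partition.
            candidates = self.partitions
        self.scanned = len(candidates)
        return [v for part in candidates for k, v in part if k == key]


pairs = list(zip(range(1000), range(1000)))
slow = ToyPairRDD.round_robin(pairs, 10)
fast = ToyPairRDD.partition_by(pairs, 10)
print(slow.lookup(42), slow.scanned)
print(fast.lookup(42), fast.scanned)
```

Both calls return the same values; the difference is that the partitioned variant inspects one partition instead of all ten, which is the saving the commit message describes.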
Diffstat (limited to 'ec2/spark_ec2.py')
0 files changed, 0 insertions, 0 deletions