SPARK-1839: PySpark RDD#take() shouldn't always read from driver - spark


author	Aaron Davidson <aaron@databricks.com>	2014-05-31 13:04:57 -0700
committer	Reynold Xin <rxin@apache.org>	2014-05-31 13:04:57 -0700
commit	9909efc10aaa62c47fd7c4c9da73ac8c56a454d5 (patch)
tree	86ab8e6477ab4a631b3a91f0e89007ca69c78d37 /yarn
parent	7d52777effd0ff41aed545f53d2ab8de2364a188 (diff)
download	spark-9909efc10aaa62c47fd7c4c9da73ac8c56a454d5.tar.gz spark-9909efc10aaa62c47fd7c4c9da73ac8c56a454d5.tar.bz2 spark-9909efc10aaa62c47fd7c4c9da73ac8c56a454d5.zip

SPARK-1839: PySpark RDD#take() shouldn't always read from driver

This patch simply ports over the Scala implementation of RDD#take(), which reads the first partition at the driver, then decides how many more partitions it needs to read and will possibly start a real job if it's more than 1. (Note that SparkContext#runJob(allowLocal=true) only runs the job locally if there's 1 partition selected and no parent stages.)

Author: Aaron Davidson <aaron@databricks.com>

Closes #922 from aarondav/take and squashes the following commits:

fa06df9 [Aaron Davidson] SPARK-1839: PySpark RDD#take() shouldn't always read from driver
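
The incremental scan described in the commit message can be sketched in plain Python, without Spark. This is a minimal illustration, not the code from the patch: `partitions` is a list of lists standing in for an RDD's partitions, `run_job` is a hypothetical stand-in for SparkContext#runJob, and the 1.5x growth estimate for later passes is an assumption about the heuristic rather than something this commit message states.

```python
from itertools import islice


def take(partitions, num, run_job=None):
    """Return up to `num` leading items, scanning partitions incrementally."""
    if run_job is None:
        # Hypothetical stand-in for a job runner: applies `func` to an
        # iterator over each selected partition and returns the results.
        def run_job(func, part_ids):
            return [func(iter(partitions[i])) for i in part_ids]

    items = []
    total_parts = len(partitions)
    parts_scanned = 0

    while len(items) < num and parts_scanned < total_parts:
        # The first pass reads exactly one partition.
        num_parts_to_try = 1
        if parts_scanned > 0:
            if not items:
                # Nothing found so far: try all remaining partitions.
                num_parts_to_try = total_parts - parts_scanned
            else:
                # Assumed heuristic: extrapolate from the yield so far,
                # overestimating by 50% to avoid a further pass.
                num_parts_to_try = int(1.5 * num * parts_scanned / len(items))
        num_parts_to_try = max(1, num_parts_to_try)

        left = num - len(items)
        part_ids = range(parts_scanned,
                         min(parts_scanned + num_parts_to_try, total_parts))

        # Each partition returns at most `left` items; trim again when merging.
        for chunk in run_job(lambda it: list(islice(it, left)), part_ids):
            items.extend(chunk[: num - len(items)])

        parts_scanned += len(part_ids)

    return items
```

With `partitions = [[1, 2], [3, 4], [5, 6], [7, 8]]`, calling `take(partitions, 3)` reads partition 0 first, finds it is one item short, and only then selects further partitions. In Spark, that second pass is the point at which a real job may be started instead of reading at the driver.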

Diffstat (limited to 'yarn')

0 files changed, 0 insertions, 0 deletions

