path: root/python/pyspark/heapq3.py
author: Matei Zaharia <matei@databricks.com> 2015-09-14 21:47:40 -0400
committer: Matei Zaharia <matei@databricks.com> 2015-09-14 21:47:40 -0400
commit: 1a0955250bb65cd6f5818ad60efb62ea4b45d18e (patch)
tree: 91afe7fda6170e8e2c85563b5f336418c14ae6cc /python/pyspark/heapq3.py
parent: 7b6c856367b9c36348e80e83959150da9656c4dd (diff)
[SPARK-9851] Support submitting map stages individually in DAGScheduler
This patch adds support for submitting map stages in a DAG individually so that we can make downstream decisions after seeing statistics about their output, as part of SPARK-9850. I also added more comments to many of the key classes in DAGScheduler. By itself, the patch is not super useful except maybe to switch between a shuffle and broadcast join, but with the other subtasks of SPARK-9850 we'll be able to make more interesting decisions.

The main entry point is SparkContext.submitMapStage, which lets you run a map stage and see stats about the map output sizes. Other stats could also be collected through accumulators. See AdaptiveSchedulingSuite for a short example.

Author: Matei Zaharia <matei@databricks.com>

Closes #8180 from mateiz/spark-9851.
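
As a rough illustration of the kind of downstream decision the message mentions (choosing between a shuffle and a broadcast join once map output sizes are known), here is a minimal sketch in plain Python. It does not call Spark and is not code from this patch; the function name `choose_join_strategy` and the 10 MiB threshold are hypothetical, chosen only for the example.

```python
from typing import Sequence

# Hypothetical cutoff for illustration only; not a value taken from the patch.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def choose_join_strategy(
    map_output_bytes: Sequence[int],
    threshold: int = BROADCAST_THRESHOLD_BYTES,
) -> str:
    """Pick a join strategy from per-partition map output sizes.

    Returns "broadcast" when the total map output is at or below the
    threshold, and "shuffle" otherwise.
    """
    total = sum(map_output_bytes)
    return "broadcast" if total <= threshold else "shuffle"
```

In a real job the sizes would come from the statistics reported after the map stage has run; here they are simply passed in as a list of byte counts.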
Diffstat (limited to 'python/pyspark/heapq3.py')
0 files changed, 0 insertions, 0 deletions