[SPARK-9851] Support submitting map stages individually in DAGScheduler - spark

diff options

author	Matei Zaharia <matei@databricks.com>	2015-09-14 21:47:40 -0400
committer	Matei Zaharia <matei@databricks.com>	2015-09-14 21:47:40 -0400
commit	1a0955250bb65cd6f5818ad60efb62ea4b45d18e (patch)
tree	91afe7fda6170e8e2c85563b5f336418c14ae6cc /python/pyspark/streaming/kafka.py
parent	7b6c856367b9c36348e80e83959150da9656c4dd (diff)
download	spark-1a0955250bb65cd6f5818ad60efb62ea4b45d18e.tar.gz spark-1a0955250bb65cd6f5818ad60efb62ea4b45d18e.tar.bz2 spark-1a0955250bb65cd6f5818ad60efb62ea4b45d18e.zip

[SPARK-9851] Support submitting map stages individually in DAGScheduler

This patch adds support for submitting map stages in a DAG individually so that we can make downstream decisions after seeing statistics about their output, as part of SPARK-9850. I also added more comments to many of the key classes in DAGScheduler. By itself, the patch is not super useful except maybe to switch between a shuffle and broadcast join, but with the other subtasks of SPARK-9850 we'll be able to do more interesting decisions. The main entry point is SparkContext.submitMapStage, which lets you run a map stage and see stats about the map output sizes. Other stats could also be collected through accumulators. See AdaptiveSchedulingSuite for a short example. Author: Matei Zaharia <matei@databricks.com> Closes #8180 from mateiz/spark-9851.

Diffstat (limited to 'python/pyspark/streaming/kafka.py')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: