diff options
author | Matei Zaharia <matei@databricks.com> | 2015-09-14 21:47:40 -0400 |
---|---|---|
committer | Matei Zaharia <matei@databricks.com> | 2015-09-14 21:47:40 -0400 |
commit | 1a0955250bb65cd6f5818ad60efb62ea4b45d18e (patch) | |
tree | 91afe7fda6170e8e2c85563b5f336418c14ae6cc /python/pyspark/streaming/kafka.py | |
parent | 7b6c856367b9c36348e80e83959150da9656c4dd (diff) | |
download | spark-1a0955250bb65cd6f5818ad60efb62ea4b45d18e.tar.gz spark-1a0955250bb65cd6f5818ad60efb62ea4b45d18e.tar.bz2 spark-1a0955250bb65cd6f5818ad60efb62ea4b45d18e.zip |
[SPARK-9851] Support submitting map stages individually in DAGScheduler
This patch adds support for submitting map stages in a DAG individually so that we can make downstream decisions after seeing statistics about their output, as part of SPARK-9850. I also added more comments to many of the key classes in DAGScheduler. By itself, the patch is not super useful except maybe to switch between a shuffle and broadcast join, but with the other subtasks of SPARK-9850 we'll be able to do more interesting decisions.
The main entry point is SparkContext.submitMapStage, which lets you run a map stage and see stats about the map output sizes. Other stats could also be collected through accumulators. See AdaptiveSchedulingSuite for a short example.
Author: Matei Zaharia <matei@databricks.com>
Closes #8180 from mateiz/spark-9851.
Diffstat (limited to 'python/pyspark/streaming/kafka.py')
0 files changed, 0 insertions, 0 deletions