author     Zhan Zhang <zhanzhang@fb.com>  2016-10-15 18:45:04 -0700
committer  Mridul Muralidharan <mmuralidharan@HW11853.local>  2016-10-15 18:45:04 -0700
commit     ed1463341455830b8867b721a1b34f291139baf3 (patch)
tree       02752c4bae9e4b1694b96370a8025cf28052832d /docs
parent     36d81c2c68ef4114592b069287743eb5cb078318 (diff)
[SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks across executors
## What changes were proposed in this pull request?

Restructure the code and implement two new task assigners.

PackedAssigner: tries to allocate tasks to the executors with the fewest available cores, so that Spark can release reserved executors when dynamic allocation is enabled.

BalancedAssigner: tries to allocate tasks to the executors with the most available cores, in order to balance the workload across all executors.

By default, the original round-robin assigner is used. We tested a pipeline, and the new PackedAssigner saved around 45% of reserved CPU and memory with dynamic allocation enabled.

## How was this patch tested?

Unit tests in TaskSchedulerImplSuite and manual tests in a production pipeline.

Author: Zhan Zhang <zhanzhang@fb.com>

Closes #15218 from zhzhan/packed-scheduler.
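To illustrate the two strategies described above, here is a minimal Scala sketch (not the patch's actual code: the `TaskAssigner` trait shape is assumed, and `WorkerOffer` is simplified to an executor id plus its free cores):

```scala
// Hypothetical, simplified types for illustration only.
case class WorkerOffer(executorId: String, freeCores: Int)

trait TaskAssigner {
  // Returns the offers in the order tasks should be assigned to them.
  def order(offers: Seq[WorkerOffer]): Seq[WorkerOffer]
}

// Packed: fill the busiest executors (fewest free cores) first, so that
// fully idle executors can be released by dynamic allocation.
class PackedAssigner extends TaskAssigner {
  def order(offers: Seq[WorkerOffer]): Seq[WorkerOffer] =
    offers.sortBy(_.freeCores)
}

// Balanced: prefer executors with the most free cores, spreading the
// workload evenly across all executors.
class BalancedAssigner extends TaskAssigner {
  def order(offers: Seq[WorkerOffer]): Seq[WorkerOffer] =
    offers.sortBy(o => -o.freeCores)
}
```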
Diffstat (limited to 'docs')
-rw-r--r--  docs/configuration.md  11
1 file changed, 11 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 373e22d71a..6f3fbeb76c 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1334,6 +1334,17 @@ Apart from these, the following properties are also available, and may be useful
Should be greater than or equal to 1. Number of allowed retries = this value - 1.
</td>
</tr>
+<tr>
+ <td><code>spark.task.assigner</code></td>
+ <td>org.apache.spark.scheduler.RoundRobinAssigner</td>
+ <td>
+    The strategy for allocating tasks among workers with free cores.
+    By default, round robin with randomness is used.
+    org.apache.spark.scheduler.BalancedAssigner tries to balance tasks across all workers (allocating tasks to
+    workers with the most free cores). org.apache.spark.scheduler.PackedAssigner tries to allocate tasks to workers
+    with the fewest free cores, which may help release resources when dynamic allocation is enabled.
+ </td>
+</tr>
</table>
#### Dynamic Allocation
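For reference, selecting an assigner amounts to setting the new property; a sketch assuming standard `SparkConf` usage (the property name and class names come from the diff above):

```scala
import org.apache.spark.SparkConf

// Pick the packed strategy so tasks fill busy executors first; combined
// with dynamic allocation, fully idle executors can then be released.
val conf = new SparkConf()
  .setAppName("packed-scheduling-example")
  .set("spark.task.assigner", "org.apache.spark.scheduler.PackedAssigner")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation
```

The same can be passed at submit time with `--conf spark.task.assigner=org.apache.spark.scheduler.PackedAssigner`.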