author     Zhan Zhang <zhanzhang@fb.com>  2016-10-15 18:45:04 -0700
committer  Mridul Muralidharan <mmuralidharan@HW11853.local>  2016-10-15 18:45:04 -0700
commit     ed1463341455830b8867b721a1b34f291139baf3 (patch)
tree       02752c4bae9e4b1694b96370a8025cf28052832d /docs
parent     36d81c2c68ef4114592b069287743eb5cb078318 (diff)
[SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks across executors
## What changes were proposed in this pull request?

Restructure the code and implement two new task assigners.

PackedAssigner: tries to allocate tasks to the executors with the fewest available cores, so that Spark can release reserved executors when dynamic allocation is enabled.

BalancedAssigner: tries to allocate tasks to the executors with the most available cores, in order to balance the workload across all executors.

By default, the original round-robin assigner is used. We tested a pipeline, and the new PackedAssigner saved around 45% of reserved CPU and memory with dynamic allocation enabled.

## How was this patch tested?

Unit tests in TaskSchedulerImplSuite and manual tests in a production pipeline.

Author: Zhan Zhang <zhanzhang@fb.com>

Closes #15218 from zhzhan/packed-scheduler.
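To illustrate the two strategies described above, here is a minimal Scala sketch (not the patch's actual code: the `TaskAssigner` trait shape is assumed, and `WorkerOffer` is simplified to an executor id plus its free cores):

```scala
// Hypothetical, simplified types for illustration only.
case class WorkerOffer(executorId: String, freeCores: Int)

trait TaskAssigner {
  // Returns the offers in the order tasks should be assigned to them.
  def order(offers: Seq[WorkerOffer]): Seq[WorkerOffer]
}

// Packed: fill the busiest executors (fewest free cores) first, so that
// fully idle executors can be released by dynamic allocation.
class PackedAssigner extends TaskAssigner {
  def order(offers: Seq[WorkerOffer]): Seq[WorkerOffer] =
    offers.sortBy(_.freeCores)
}

// Balanced: prefer executors with the most free cores, spreading the
// workload evenly across all executors.
class BalancedAssigner extends TaskAssigner {
  def order(offers: Seq[WorkerOffer]): Seq[WorkerOffer] =
    offers.sortBy(o => -o.freeCores)
}
```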
Diffstat (limited to 'docs')
-rw-r--r--  docs/configuration.md  11
1 file changed, 11 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 373e22d71a..6f3fbeb76c 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1334,6 +1334,17 @@ Apart from these, the following properties are also available, and may be useful
Should be greater than or equal to 1. Number of allowed retries = this value - 1.
</td>
</tr>
+<tr>
+ <td><code>spark.task.assigner</code></td>
+ <td>org.apache.spark.scheduler.RoundRobinAssigner</td>
+ <td>
+    The strategy for allocating tasks among workers with free cores.
+    By default, round robin with randomness is used.
+    org.apache.spark.scheduler.BalancedAssigner tries to balance tasks across all workers (allocating tasks to
+    workers with the most free cores). org.apache.spark.scheduler.PackedAssigner tries to allocate tasks to workers
+    with the fewest free cores, which may help release resources when dynamic allocation is enabled.
+ </td>
+</tr>
</table>
#### Dynamic Allocation
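For reference, selecting an assigner amounts to setting the new property; a sketch assuming standard `SparkConf` usage (the property name and class names come from the diff above):

```scala
import org.apache.spark.SparkConf

// Pick the packed strategy so tasks fill busy executors first; combined
// with dynamic allocation, fully idle executors can then be released.
val conf = new SparkConf()
  .setAppName("packed-scheduling-example")
  .set("spark.task.assigner", "org.apache.spark.scheduler.PackedAssigner")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation
```

The same can be passed at submit time with `--conf spark.task.assigner=org.apache.spark.scheduler.PackedAssigner`.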