path: root/project/SparkBuild.scala
authorMatei Zaharia <matei@eecs.berkeley.edu>2013-11-13 16:49:55 -0800
committerMatei Zaharia <matei@eecs.berkeley.edu>2013-11-13 16:49:55 -0800
commit2054c61a18c277c00661b89bbae365470c297031 (patch)
treec3dab6a3cfad9a71e290697f45067fc8f63b3649 /project/SparkBuild.scala
parent9290e5bcd2c8e4d8bbf1d0ce1ac09bbf62ece4e0 (diff)
parente2a43b3dcce81fc99098510d09095e1be4bf3e29 (diff)
Merge pull request #159 from liancheng/dagscheduler-actor-refine
Migrate the daemon thread started by DAGScheduler to an Akka actor

`DAGScheduler` uses an event queue and a daemon thread polling it to process events sent to a `DAGScheduler`. This is a classic actor use case. By migrating this thread to an Akka actor, we may benefit from both cleaner code and better performance (the context-switching cost of an Akka actor is much lower than that of a native thread).

But things become a little complicated when taking existing test code into consideration. Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler`, and directly calls `DAGScheduler.processEvent` instead of posting event messages to `DAGScheduler`. To minimize code changes, I chose to let the actor delegate messages to `processEvent`. Maybe this doesn't follow conventional actor usage, but I tried to keep it obviously correct.

Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` provided by its field `env`, `env` cannot be null. But the `dagScheduler` field created in `DAGSchedulerSuite.before` was given a null `env`. What's more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to determine whether to run the production code or the test code (bad smell here, huh?). I went through all callers of `BlockManager.blockIdsToBlockManagers`, and made sure that if `env != null` holds, then `blockManagerMaster == null` must also hold. That's the logic behind `BlockManager.scala` [line 896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896).

At last, since `DAGScheduler` instances are always `start()`ed after creation, I removed the `start()` method and now start the `eventProcessActor` within the constructor.
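The following is a minimal sketch (not the actual Spark source) of the pattern the PR describes: an Akka actor, created in the scheduler's constructor, that simply delegates incoming events to the existing `processEvent` method. Names such as `DAGSchedulerEvent`, `JobSubmitted`, and the `post` helper are illustrative stand-ins, not the real Spark API.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Hypothetical event types standing in for DAGScheduler's real event classes.
sealed trait DAGSchedulerEvent
case class JobSubmitted(jobId: Int) extends DAGSchedulerEvent

// Actor that only forwards events to the scheduler's processEvent method,
// so test code that calls processEvent directly keeps working unchanged.
class EventProcessActor(scheduler: DAGScheduler) extends Actor {
  def receive = {
    case event: DAGSchedulerEvent => scheduler.processEvent(event)
  }
}

class DAGScheduler(actorSystem: ActorSystem) {
  // Started from the constructor, so no separate start() method is needed.
  private val eventProcessActor: ActorRef =
    actorSystem.actorOf(Props(new EventProcessActor(this)))

  // Callers post events as actor messages instead of pushing to a queue.
  def post(event: DAGSchedulerEvent): Unit = eventProcessActor ! event

  // Pre-existing event handler that the actor delegates to.
  def processEvent(event: DAGSchedulerEvent): Unit = event match {
    case JobSubmitted(jobId) => println(s"handling job $jobId")
  }
}

object Demo extends App {
  val system = ActorSystem("sketch")
  val scheduler = new DAGScheduler(system)
  scheduler.post(JobSubmitted(1))
  system.shutdown()
}
```

Keeping `processEvent` as an ordinary method and treating the actor as a thin dispatch layer is what lets `DAGSchedulerSuite` continue calling the handler synchronously while production code goes through the actor's mailbox.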
Diffstat (limited to 'project/SparkBuild.scala')
0 files changed, 0 insertions, 0 deletions