| author | Josh Rosen <joshrosen@apache.org> | 2014-08-01 19:38:21 -0700 |
|---|---|---|
| committer | Aaron Davidson <aaron@databricks.com> | 2014-08-01 19:38:21 -0700 |
| commit | e8e0fd691a06a2887fdcffb2217b96805ace0cb0 (patch) | |
| tree | e75662f9f8cfd5cb616b7f96482162811c8c9816 /examples | |
| parent | a38d3c9efcc0386b52ac4f041920985ae7300e28 (diff) | |
[SPARK-2764] Simplify daemon.py process structure
Currently, daemon.py forks a pool of numProcessors subprocesses, and those processes fork themselves again to create the actual Python worker processes that handle data.
I think that this extra layer of indirection is unnecessary and adds a lot of complexity. This commit attempts to remove this middle layer of subprocesses by launching the workers directly from daemon.py.
See https://github.com/mesos/spark/pull/563 for the original PR that added daemon.py, where I raise some issues with the current design.
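The change described above can be pictured with a minimal sketch: a daemon that forks each worker directly, with fork failures detected and surfaced rather than ignored. This is an illustration of the pattern, not Spark's actual daemon.py; `launch_worker` and `worker_main` are hypothetical names.

```python
import os

def launch_worker(worker_main):
    """Fork a worker directly from the daemon, with no middle layer.

    Returns the child's pid; a failed fork() is detected and reported
    rather than silently losing a worker.
    """
    try:
        pid = os.fork()
    except OSError as e:
        # fork() can fail (e.g. process limit reached); report it loudly.
        raise RuntimeError("worker launch failed: %s" % e)
    if pid == 0:
        # Child process: run the worker and never return into daemon code.
        try:
            worker_main()
        finally:
            os._exit(0)
    # Parent (the daemon): keep the pid so the worker can be reaped later.
    return pid
```

Launching directly from the daemon means there is exactly one parent to track worker pids and reap exit statuses, instead of an extra tier of intermediate processes doing the same bookkeeping.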
Author: Josh Rosen <joshrosen@apache.org>
Closes #1680 from JoshRosen/pyspark-daemon and squashes the following commits:
5abbcb9 [Josh Rosen] Replace magic number: 4 -> EINTR
5495dff [Josh Rosen] Throw IllegalStateException if worker launch fails.
b79254d [Josh Rosen] Detect failed fork() calls; improve error logging.
282c2c4 [Josh Rosen] Remove daemon.py exit logging, since it caused problems:
8554536 [Josh Rosen] Fix daemon’s shutdown(); log shutdown reason.
4e0fab8 [Josh Rosen] Remove shared-memory exit_flag; don't die on worker death.
e9892b4 [Josh Rosen] [WIP] [SPARK-2764] Simplify daemon.py process structure.
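The "magic number" in commit 5abbcb9 refers to errno 4, which is EINTR on Linux: a system call interrupted by a signal. A hedged sketch of the general retry pattern (illustrative only, not the daemon.py code; `read_interruptible` is a hypothetical name):

```python
import errno
import os

def read_interruptible(fd, n):
    """Read from fd, retrying when a signal interrupts the call.

    Naming errno.EINTR avoids hard-coding the magic number 4.
    (Python 3.5+ retries EINTR automatically per PEP 475, so this
    explicit loop mainly mattered on older interpreters.)
    """
    while True:
        try:
            return os.read(fd, n)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
            # Interrupted by a signal: retry the read.
```

A daemon spends most of its life blocked in reads and accepts, so treating EINTR as a retryable condition rather than a fatal error keeps it alive across routine signal delivery (e.g. SIGCHLD from exiting workers).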
Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions