author | Davies Liu <davies@databricks.com> | 2014-11-02 00:03:51 -0700 |
---|---|---|
committer | Matei Zaharia <matei@databricks.com> | 2014-11-02 00:03:51 -0700 |
commit | 6181577e9935f46b646ba3925b873d031aa3d6ba (patch) | |
tree | 84a704bd54be30393f71e744351e2e74903a311c /network | |
parent | 23f966f47523f85ba440b4080eee665271f53b5e (diff) | |
[SPARK-3466] Limit size of results that a driver collects for each action
Right now, operations like collect() and take() can crash the driver with an OOM if they bring back too much data.
This PR introduces spark.driver.maxResultSize. Once it is set, the driver will abort a job if the job's results are larger than this limit.
By default, it is 1g (for backward compatibility in most cases).
In local mode, the driver and executor share the same JVM, so the default setting cannot protect the JVM from an OOM.
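The check described above can be sketched in plain Python as a running total that is compared against the limit as each task result arrives. This is only an illustration of the idea, not the code in this patch: `parse_size`, `ResultSizeTracker`, `ResultTooLargeError`, and the wording of the error message are all hypothetical names chosen for the example, and treating a limit of 0 as "unlimited" is an assumption of the sketch.

```python
# Illustrative sketch only: every name here is hypothetical, not Spark's internals.

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


class ResultTooLargeError(Exception):
    """Raised when the collected results exceed the configured limit."""


def parse_size(text):
    """Convert a size string such as '1g' or '512m' to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # no suffix: interpret the value as bytes


class ResultSizeTracker:
    """Accumulates serialized result sizes for one job and enforces a cap."""

    def __init__(self, max_result_size="1g"):
        self.limit = parse_size(max_result_size)
        self.total = 0
        self.tasks = 0

    def add(self, result_bytes):
        """Record one task result; abort once the running total passes the limit."""
        self.total += result_bytes
        self.tasks += 1
        # Assumption of this sketch: a limit of 0 disables the check.
        if self.limit > 0 and self.total > self.limit:
            raise ResultTooLargeError(
                "results of %d tasks (%d bytes) exceed the limit (%d bytes)"
                % (self.tasks, self.total, self.limit)
            )


# Usage: two 600 KiB results against a 1 MiB limit; the second one aborts.
tracker = ResultSizeTracker("1m")
tracker.add(600 * 1024)
try:
    tracker.add(600 * 1024)
except ResultTooLargeError as err:
    print("job aborted:", err)
```

Checking the total incrementally, rather than after all results have arrived, is what lets the job be aborted before the oversized data is fully materialized in the driver.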
cc mateiz
Author: Davies Liu <davies@databricks.com>
Closes #3003 from davies/collect and squashes the following commits:
248ed5e [Davies Liu] fix compile
272522e [Davies Liu] address comments
2c35773 [Davies Liu] add sizes in message of abort()
5d62303 [Davies Liu] address comments
bc3c077 [Davies Liu] Merge branch 'master' of github.com:apache/spark into collect
11f97c5 [Davies Liu] address comments
47b144f [Davies Liu] check the size of result before send and fetch
3d81af2 [Davies Liu] address comments
ca8267d [Davies Liu] limit the size of data by collect
Diffstat (limited to 'network')
0 files changed, 0 insertions, 0 deletions