author | mcheah <mcheah@palantir.com> | 2015-02-19 18:09:22 -0800 |
---|---|---|
committer | Andrew Or <andrew@databricks.com> | 2015-02-19 18:09:26 -0800 |
commit | 0382dcc0a94f8e619fd11ec2cc0b18459a690c2b (patch) | |
tree | bafcb45e826fd6acc0d35a958565aaac3612df88 /docs/mllib-migration-guides.md | |
parent | ba941ceb1f78b28ca5cfb18c770f4171b9c74b0a (diff) | |
[SPARK-4808] Removing minimum number of elements read before spill check
In the general case, Spillable's heuristic of checking for memory stress
on every 32nd item after 1000 items are read is good enough. We usually
do not want to run the spill checks until later in the job; checking for
disk spilling too early can cause an unacceptable performance hit in
trivial cases.
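To make the heuristic concrete, here is a minimal sketch in Python rather than Spark's Scala. The constant and function names are hypothetical and chosen for illustration only; they are not taken from the Spillable source. It assumes "after 1000 items are read" means strictly more than 1000.

```python
# Hypothetical constants mirroring the numbers quoted in the commit message.
MIN_ELEMENTS_BEFORE_CHECK = 1000  # no memory check until more than this many items are read
CHECK_INTERVAL = 32               # afterwards, check on every 32nd item


def should_check_memory(elements_read: int) -> bool:
    """Return True if the heuristic would check for memory stress at this item count."""
    return (elements_read > MIN_ELEMENTS_BEFORE_CHECK
            and elements_read % CHECK_INTERVAL == 0)


def first_check(limit: int = 10_000) -> int:
    """Return the first item count (up to limit) at which a check would happen."""
    for n in range(1, limit + 1):
        if should_check_memory(n):
            return n
    raise ValueError("no check occurs within the given limit")
```

Under these assumptions the first check happens at item 1024, the first multiple of 32 above 1000, so nothing read before that point can trigger a spill.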
However, there are non-trivial cases, particularly when each serialized
object is large, where checking for the need to spill too late would
allow memory to overflow. Suppose every item is 1.5 MB in size and the
heap size is 1000 MB. If we only try to spill the in-memory contents to
disk after 1000 items are read, we would already have accumulated
1500 MB in memory and overflowed the heap.
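The arithmetic behind this example can be checked with a short Python sketch. It is a simplification that assumes the items are the only occupants of the heap; the function names are hypothetical.

```python
import math

ITEM_SIZE_MB = 1.5                # size of each item, from the example above
HEAP_SIZE_MB = 1000               # heap size, from the example above
MIN_ELEMENTS_BEFORE_CHECK = 1000  # items read before the first spill check


def items_until_overflow(item_size_mb: float, heap_size_mb: float) -> int:
    """Smallest item count whose combined size exceeds the heap."""
    return math.floor(heap_size_mb / item_size_mb) + 1


def memory_before_first_check(item_size_mb: float, min_elements: int) -> float:
    """Memory (in MB) accumulated by the time the first check could run."""
    return item_size_mb * min_elements


overflow_at = items_until_overflow(ITEM_SIZE_MB, HEAP_SIZE_MB)
accumulated = memory_before_first_check(ITEM_SIZE_MB, MIN_ELEMENTS_BEFORE_CHECK)
```

With these numbers the heap is exceeded at the 667th item, well before the 1000-item mark at which 1500 MB would have been accumulated.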
Patch #3656 attempted to circumvent this by checking the need to spill
on every single item read, but that would cause unacceptable performance
in the general case. However, jobs like the one described above should
not have to be refactored to shrink their data items. Therefore it makes
sense for the memory spilling thresholds to be configurable.
Author: mcheah <mcheah@palantir.com>
Closes #4420 from mingyukim/memory-spill-configurable and squashes the following commits:
6e2509f [mcheah] [SPARK-4808] Removing minimum number of elements read before spill check
(cherry picked from commit 3be92cdac30cf488e09dbdaaa70e5c4cdaa9a099)
Signed-off-by: Andrew Or <andrew@databricks.com>