path: root/R/pkg
author    Josh Rosen <joshrosen@databricks.com>  2016-03-24 17:33:21 -0700
committer Josh Rosen <joshrosen@databricks.com>  2016-03-24 17:33:21 -0700
commit    fdd460f5f47e4023d81d5a3d918bd4a16ecbb580 (patch)
tree      0b54ced5251783827a533860160b28e1c12fa251 /R/pkg
parent    2cf46d5a96897d5f97b364db357d30566183c6e7 (diff)
[SPARK-13980] Incrementally serialize blocks while unrolling them in MemoryStore
When a block is persisted in the MemoryStore at a serialized storage level, the current MemoryStore.putIterator() code will unroll the entire iterator as Java objects in memory, then turn around and serialize an iterator obtained from the unrolled array. This is inefficient and doubles our peak memory requirements. Instead, I think that we should incrementally serialize blocks while unrolling them.

A downside to incremental serialization is that we will need to deserialize the partially-unrolled data if there is not enough space to unroll the block and the block cannot be dropped to disk. However, I'm hoping that the memory efficiency improvements will outweigh any performance losses as a result of extra serialization in that hopefully-rare case.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #11791 from JoshRosen/serialize-incrementally.
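To illustrate the approach, here is a minimal standalone Scala sketch of incremental serialization during unrolling. The names (ToySerializationStream, putIteratorAsBytes, hasEnoughMemory) are simplified stand-ins for illustration, not Spark's actual internals: each element is serialized as it is pulled from the iterator, the "memory manager" is periodically asked whether the bytes written so far still fit, and on failure the remaining iterator is handed back so the caller can recover (by deserializing the partial bytes and chaining them with the rest).

```scala
import java.io.ByteArrayOutputStream

object IncrementalUnrollSketch {
  // Toy stand-in for a real SerializationStream: appends each object's
  // string bytes to a growing buffer. Hypothetical, for illustration only.
  class ToySerializationStream {
    private val out = new ByteArrayOutputStream()
    def writeObject[T](value: T): Unit = out.write(value.toString.getBytes("UTF-8"))
    def bytesWritten: Long = out.size().toLong
    def toByteArray: Array[Byte] = out.toByteArray
  }

  // Serialize values one at a time while unrolling the iterator,
  // periodically asking a (hypothetical) memory check whether the bytes
  // written so far still fit. Returns the serialized bytes on success,
  // or the remaining iterator on failure so the caller can fall back to
  // deserializing the partial data and chaining it with the rest.
  def putIteratorAsBytes[T](
      values: Iterator[T],
      hasEnoughMemory: Long => Boolean,
      checkPeriod: Int = 16): Either[Iterator[T], Array[Byte]] = {
    val stream = new ToySerializationStream
    var written = 0
    var keepUnrolling = true
    while (values.hasNext && keepUnrolling) {
      stream.writeObject(values.next())
      written += 1
      if (written % checkPeriod == 0) {
        keepUnrolling = hasEnoughMemory(stream.bytesWritten)
      }
    }
    if (keepUnrolling) Right(stream.toByteArray)
    else Left(values) // ran out of memory; partial bytes would need deserializing
  }

  def main(args: Array[String]): Unit = {
    // Example: cap "memory" at 64 bytes; unrolling 100 ints fails partway through.
    val result = putIteratorAsBytes((1 to 100).iterator, _ <= 64)
    println(result.fold(
      rest => s"gave up, ${rest.length} elements left to unroll",
      bytes => s"stored ${bytes.length} serialized bytes"))
  }
}
```

Note that peak memory here is bounded by the serialized bytes written so far, rather than by the serialized bytes plus a full array of unrolled Java objects, which is the doubling the commit message describes.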
Diffstat (limited to 'R/pkg')
0 files changed, 0 insertions, 0 deletions