[SPARK-1912] fix compress memory issue during reduce - spark

diff options

author	Wenchen Fan(Cloud) <cloud0fan@gmail.com>	2014-06-03 13:18:20 -0700
committer	Matei Zaharia <matei@databricks.com>	2014-06-03 13:18:20 -0700
commit	45e9bc85db231e84a23b8d757136023eabcec13e (patch)
tree	e802fd86c7331d30048d9e701f7f46bea29a8485 /pom.xml
parent	6c044ed100081f1fa7c8df4bd4b58b65a61c3360 (diff)
download	spark-45e9bc85db231e84a23b8d757136023eabcec13e.tar.gz spark-45e9bc85db231e84a23b8d757136023eabcec13e.tar.bz2 spark-45e9bc85db231e84a23b8d757136023eabcec13e.zip

[SPARK-1912] fix compress memory issue during reduce

When we need to read a compressed block, we will first create a compress stream instance(LZF or Snappy) and use it to wrap that block. Let's say a reducer task need to read 1000 local shuffle blocks, it will first prepare to read that 1000 blocks, which means create 1000 compression stream instance to wrap them. But the initialization of compression instance will allocate some memory and when we have many compression instance at the same time, it is a problem. Actually reducer reads the shuffle blocks one by one, so we can do the compression instance initialization lazily. Author: Wenchen Fan(Cloud) <cloud0fan@gmail.com> Closes #860 from cloud-fan/fix-compress and squashes the following commits: 0924a6b [Wenchen Fan(Cloud)] rename 'doWork' into 'getIterator' 07f32c2 [Wenchen Fan(Cloud)] move the LazyProxyIterator to dataDeserialize d80c426 [Wenchen Fan(Cloud)] remove empty lines in short class 2c8adb2 [Wenchen Fan(Cloud)] add inline comment 8ebff77 [Wenchen Fan(Cloud)] fix compress memory issue during reduce

Diffstat (limited to 'pom.xml')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: