[SPARK-3886] [PySpark] use AutoBatchedSerializer by default - spark

author	Davies Liu <davies.liu@gmail.com>	2014-10-10 14:14:05 -0700
committer	Josh Rosen <joshrosen@apache.org>	2014-10-10 14:14:05 -0700
commit	72f36ee571ad27c7c7c70bb9aecc7e6ef51dfd44 (patch)
tree	091ca732b2b48875c478e416807e28a23f0916d7 /pom.xml
parent	90f73fcc47c7bf881f808653d46a9936f37c3c31 (diff)
download	spark-72f36ee571ad27c7c7c70bb9aecc7e6ef51dfd44.tar.gz spark-72f36ee571ad27c7c7c70bb9aecc7e6ef51dfd44.tar.bz2 spark-72f36ee571ad27c7c7c70bb9aecc7e6ef51dfd44.zip

[SPARK-3886] [PySpark] use AutoBatchedSerializer by default

Use AutoBatchedSerializer by default. It chooses the batch size based on the size of the serialized objects, so that the size of each serialized batch falls within the range [64k - 640k].

In the JVM, the serializer also tracks the objects in a batch to detect duplicated objects, so a larger batch may cause an OOM in the JVM.

Author: Davies Liu <davies.liu@gmail.com>

Closes #2740 from davies/batchsize and squashes the following commits:

52cdb88 [Davies Liu] update docs
185f2b9 [Davies Liu] use AutoBatchedSerializer by default
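
The batch-size selection described in the commit message can be illustrated with a minimal sketch. This is not the code from the patch: the function name `auto_batched_dumps`, the `best_size` parameter, and the doubling/halving policy are assumptions chosen to keep each pickled batch roughly between 64k and 640k, and plain `pickle` stands in for whatever serializer PySpark wraps.

```python
import itertools
import pickle


def auto_batched_dumps(items, best_size=1 << 16):
    """Yield pickled batches, adapting the batch length to the serialized size.

    Hypothetical helper for illustration only. best_size (64k) is the assumed
    lower target; 10 * best_size (640k) is the assumed upper target.
    """
    batch = 1  # start with a single object per batch
    it = iter(items)
    while True:
        chunk = list(itertools.islice(it, batch))
        if not chunk:
            break
        data = pickle.dumps(chunk, protocol=2)
        yield data
        size = len(data)
        if size < best_size:
            # Batch is too small: take twice as many objects next time.
            batch *= 2
        elif size > best_size * 10 and batch > 1:
            # Batch is too large: take half as many objects next time.
            batch //= 2


# Usage: serialize 100,000 integers, then restore them batch by batch.
items = list(range(100000))
batches = list(auto_batched_dumps(items))
restored = [obj for data in batches for obj in pickle.loads(data)]
```

Because the batch length is adjusted only after a batch has been written, the first few batches are smaller than the target, and a batch of unusually large objects can overshoot the upper target once before the length is halved.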

Diffstat (limited to 'pom.xml')

0 files changed, 0 insertions, 0 deletions

