| author | Hossein <hossein@databricks.com> | 2014-07-26 01:04:56 -0700 |
|---|---|---|
| committer | Matei Zaharia <matei@databricks.com> | 2014-07-26 01:04:56 -0700 |
| commit | 66f26a4610aede57322cb7e193a50aecb6c57d22 (patch) | |
| tree | e45cb6dbf2a6970f0b8b341a0384352b2106122d /python/pyspark/shuffle.py | |
| parent | cf3e9fd84dc64f8a57ecbcfdd6b22f5492d41bd7 (diff) | |
[SPARK-2696] Reduce default value of spark.serializer.objectStreamReset
The current default value of spark.serializer.objectStreamReset is 10,000.
When re-partitioning a large file (e.g., 500 MB) containing 1 MB records into many partitions (e.g., 64), the serializer can cache 10,000 x 1 MB x 64 ≈ 640 GB of object references, which causes out-of-memory errors.
This patch lowers the default to a more reasonable value (100).
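The 640 GB figure above is a worst-case upper bound: the Java object stream holds references to every record written since its last reset, so the bound scales with the reset interval, the record size, and the number of partitions written concurrently. A minimal sketch of that arithmetic (the helper name is illustrative, not part of Spark; 1 GB is taken loosely as 1000 MB, matching the commit's "~=" estimate):

```python
def serializer_cache_estimate_gb(reset_interval, record_size_mb, partitions):
    """Upper bound (in GB) on memory held by cached object references:
    one reference cache of `reset_interval` records per partition stream."""
    return reset_interval * record_size_mb * partitions / 1000

# Old default (10,000) vs. new default (100), with 1 MB records, 64 partitions:
old_bound = serializer_cache_estimate_gb(10_000, 1, 64)  # 640.0 GB
new_bound = serializer_cache_estimate_gb(100, 1, 64)     # 6.4 GB
```

Lowering the interval trades a small amount of extra serialization overhead (reset markers are written more often) for a much smaller reference cache.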
Author: Hossein <hossein@databricks.com>
Closes #1595 from falaki/objectStreamReset and squashes the following commits:
650a935 [Hossein] Updated documentation
1aa0df8 [Hossein] Reduce default value of spark.serializer.objectStreamReset
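For workloads where neither default fits, the setting can still be overridden explicitly. A configuration fragment (illustrative placement in spark-defaults.conf; the property name is from the commit, the value shown is the new default):

```
# conf/spark-defaults.conf
# Reset the serialization stream's reference cache every N records.
spark.serializer.objectStreamReset  100
```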
Diffstat (limited to 'python/pyspark/shuffle.py')
0 files changed, 0 insertions, 0 deletions