author    Sameer Agarwal <sameerag@cs.berkeley.edu>  2016-07-28 13:04:19 -0700
committer Reynold Xin <rxin@databricks.com>          2016-07-28 13:04:19 -0700
commit    3fd39b87bda77f3c3a4622d854f23d4234683571 (patch)
tree      b8cd04695535ac24e1e644e02d8e5fafa687273a /python
parent    1178d61ede816bf1c8d5bb3dbb3b965c9b944407 (diff)
[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on OutOfMemoryError
## What changes were proposed in this pull request?

We currently don't bound or manage the data array size used by column vectors in the vectorized reader (they are bounded only by INT.MAX), which may lead to OOMs while reading data. As a short-term fix, this patch intercepts the OutOfMemoryError exception and suggests that the user disable the vectorized parquet reader.

## How was this patch tested?

Existing tests.

Author: Sameer Agarwal <sameerag@cs.berkeley.edu>

Closes #14387 from sameeragarwal/oom.
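To make the idea concrete, here is a minimal sketch (not the actual patch code) of how a columnar batch read might be wrapped so that an OutOfMemoryError surfaces a hint to disable the vectorized reader. The object and helper names below are hypothetical; `spark.sql.parquet.enableVectorizedReader` is the real Spark SQL configuration key the fix points users to.

```scala
// Minimal sketch, assuming a hypothetical wrapper around a batch-reading closure.
// The names here are illustrative; only the config key is the real Spark SQL setting.
object VectorizedReadHint {
  def readBatchWithHint[T](readBatch: () => T): T = {
    try {
      readBatch()
    } catch {
      case oom: OutOfMemoryError =>
        // Re-throw with a hint; OutOfMemoryError(String) has no cause parameter,
        // so the original error is chained via initCause instead.
        val hinted = new OutOfMemoryError(
          "Failed while reading Parquet data with the vectorized reader. " +
            "Consider disabling it by setting " +
            "spark.sql.parquet.enableVectorizedReader to false.")
        hinted.initCause(oom)
        throw hinted
    }
  }
}
```

On the user side, the recommended mitigation amounts to turning the reader off for the session, e.g. `spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")`, at the cost of falling back to the slower non-vectorized Parquet read path.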
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions