author     Eric Liang <ekl@databricks.com>  2016-06-09 18:05:16 -0700
committer  Josh Rosen <joshrosen@databricks.com>  2016-06-09 18:05:16 -0700
commit     b914e1930fd5c5f2808f92d4958ec6fbeddf2e30
tree       4503a4aeaf068a7e8e5565f8a9a3e1bd7732cb23 /examples
parent     83070cd1d459101e1189f3b07ea59e22f98e84ce
[SPARK-15794] Should truncate toString() of very wide plans
## What changes were proposed in this pull request?

With very wide tables (e.g. thousands of fields), the plan output is unreadable and often causes OOMs due to inefficient string processing. This patch truncates all struct and operator field lists to a user-configurable threshold to limit the performance impact.

It would also be nice to optimize string generation to avoid these sorts of O(n^2) slowdowns entirely (i.e. use StringBuilder everywhere, including in expressions), but this is probably too large a change for 2.0 at this point, and truncation has other usability benefits.

## How was this patch tested?

Added a microbenchmark that covers this case particularly well. I also ran the microbenchmark while varying the truncation threshold (`numFields` below):

```
numFields = 5
wide shallowly nested struct field r/w:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
2000 wide x 50 rows (write in-mem)              2336 / 2558          0.0       23364.4       0.1X

numFields = 25
wide shallowly nested struct field r/w:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
2000 wide x 50 rows (write in-mem)              4237 / 4465          0.0       42367.9       0.1X

numFields = 100
wide shallowly nested struct field r/w:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
2000 wide x 50 rows (write in-mem)            10458 / 11223          0.0      104582.0       0.0X

numFields = Infinity
wide shallowly nested struct field r/w:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
[info] java.lang.OutOfMemoryError: Java heap space
```

Author: Eric Liang <ekl@databricks.com>
Author: Eric Liang <ekhliang@gmail.com>

Closes #13537 from ericl/truncated-string.
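The truncation rule described above can be sketched as a small Java helper. This is a minimal illustration under stated assumptions, not the code from this patch: the class `TruncatedString`, the method `truncatedString`, its parameters, and the `"... N more fields"` summary format are all hypothetical, and the threshold is passed as an argument instead of being read from a user configuration. The sketch also builds its output with a single `StringBuilder`, the linear-time approach the message mentions as an alternative to repeated string concatenation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Spark's implementation.
class TruncatedString {
    /**
     * Joins items between start and end, separated by sep.
     * If there are more than maxFields items, only the first (maxFields - 1)
     * are rendered and the rest are summarized as "... N more fields".
     */
    static String truncatedString(List<?> items, String start, String sep,
                                  String end, int maxFields) {
        // One StringBuilder keeps the cost linear in the output length,
        // instead of the O(n^2) copying caused by repeated String concatenation.
        StringBuilder sb = new StringBuilder(start);
        if (items.size() <= maxFields) {
            for (int i = 0; i < items.size(); i++) {
                if (i > 0) {
                    sb.append(sep);
                }
                sb.append(items.get(i));
            }
        } else {
            // Reserve one slot of the budget for the summary marker.
            int shown = Math.max(0, maxFields - 1);
            for (int i = 0; i < shown; i++) {
                sb.append(items.get(i)).append(sep);
            }
            sb.append("... ").append(items.size() - shown).append(" more fields");
        }
        return sb.append(end).toString();
    }

    public static void main(String[] args) {
        // Simulate a 2000-field struct, as in the benchmark above.
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            fields.add("col" + i + ": int");
        }
        // prints: struct<col0: int, col1: int, col2: int, col3: int, ... 1996 more fields>
        System.out.println(truncatedString(fields, "struct<", ", ", ">", 5));
        // prints: struct<col0: int, col1: int, col2: int>
        System.out.println(truncatedString(fields.subList(0, 3), "struct<", ", ", ">", 5));
    }
}
```

Reserving one slot for the summary marker means a rendered list never exceeds `maxFields` entries, so the size of a plan string stays bounded however wide the schema is.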
Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions