diff options
author | Josh Rosen <joshrosen@databricks.com> | 2016-06-06 11:44:51 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2016-06-06 11:44:51 -0700 |
commit | 0b8d694999b43ada4833388cad6c285c7757cbf7 (patch) | |
tree | 136fd92aeee3b91daf1c0c4423b0bcba11c6659b /core | |
parent | 4c74ee8d8e1c3139d3d322ae68977f2ab53295df (diff) | |
download | spark-0b8d694999b43ada4833388cad6c285c7757cbf7.tar.gz spark-0b8d694999b43ada4833388cad6c285c7757cbf7.tar.bz2 spark-0b8d694999b43ada4833388cad6c285c7757cbf7.zip |
[SPARK-15764][SQL] Replace N^2 loop in BindReferences
BindReferences contains a n^2 loop which causes performance issues when operating over large schemas: to determine the ordinal of an attribute reference, we perform a linear scan over the `input` array. Because input can sometimes be a `List`, the call to `input(ordinal).nullable` can also be O(n).
Instead of performing a linear scan, we can convert the input into an array and build a hash map to map from expression ids to ordinals. The greater up-front cost of the map construction is offset by the fact that an expression can contain multiple attribute references, so the cost of the map construction is amortized across a number of lookups.
Perf. benchmarks to follow. /cc ericl
Author: Josh Rosen <joshrosen@databricks.com>
Closes #13505 from JoshRosen/bind-references-improvement.
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions