aboutsummaryrefslogtreecommitdiff
path: root/core
diff options
context:
space:
mode:
authorJosh Rosen <joshrosen@databricks.com>2016-06-06 11:44:51 -0700
committerJosh Rosen <joshrosen@databricks.com>2016-06-06 11:44:51 -0700
commit0b8d694999b43ada4833388cad6c285c7757cbf7 (patch)
tree136fd92aeee3b91daf1c0c4423b0bcba11c6659b /core
parent4c74ee8d8e1c3139d3d322ae68977f2ab53295df (diff)
downloadspark-0b8d694999b43ada4833388cad6c285c7757cbf7.tar.gz
spark-0b8d694999b43ada4833388cad6c285c7757cbf7.tar.bz2
spark-0b8d694999b43ada4833388cad6c285c7757cbf7.zip
[SPARK-15764][SQL] Replace N^2 loop in BindReferences
BindReferences contains a n^2 loop which causes performance issues when operating over large schemas: to determine the ordinal of an attribute reference, we perform a linear scan over the `input` array. Because input can sometimes be a `List`, the call to `input(ordinal).nullable` can also be O(n). Instead of performing a linear scan, we can convert the input into an array and build a hash map to map from expression ids to ordinals. The greater up-front cost of the map construction is offset by the fact that an expression can contain multiple attribute references, so the cost of the map construction is amortized across a number of lookups. Perf. benchmarks to follow. /cc ericl Author: Josh Rosen <joshrosen@databricks.com> Closes #13505 from JoshRosen/bind-references-improvement.
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions