[SPARK-8077] [SQL] Optimization for TreeNodes with large numbers of children - spark

author	Michael Davies <Michael.BellDavies@gmail.com>	2015-06-17 12:56:55 -0700
committer	Michael Armbrust <michael@databricks.com>	2015-06-17 12:56:55 -0700
commit	0c1b2df043fde9ac9f28a5f348ee96ce124f2c6b (patch)
tree	903bf5f70107631f8f32a848c676d1eb6fc10c5a /docs/spark-standalone.md
parent	50a0496a43f09d70593419efc38587c8441843bf (diff)

[SPARK-8077] [SQL] Optimization for TreeNodes with large numbers of children

For example, large IN clauses.

Large IN clauses are parsed very slowly. For example, the SQL below (10K items in the IN list) takes 45-50s to parse:

    s"""SELECT * FROM Person WHERE ForeName IN ('${(1 to 10000).map("n" + _).mkString("','")}')"""

This is principally due to TreeNode, which repeatedly calls contains on children, where children in this case is a List that is 10K long. In effect, parsing large IN clauses is O(N squared).

A lazily initialised Set based on children, used for contains, reduces parse time to around 2.5s.

Author: Michael Davies <Michael.BellDavies@gmail.com>

Closes #6673 from MickDavies/SPARK-8077 and squashes the following commits:

38cd425 [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children
d80103b [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children
e6be8be [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children
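
The idea in the commit message (replace a linear scan over children with a lazily built Set) can be sketched roughly as follows. This is a simplified illustration, not the actual patch: `TreeNodeSketch`, `Node`, `Leaf`, `InList`, `childSet`, `containsSlow`, `containsFast`, and `countKnown` are hypothetical names chosen for this example and are not taken from Spark's TreeNode.

```scala
object TreeNodeSketch {
  // Simplified stand-in for a tree node; NOT Spark's TreeNode.
  sealed trait Node {
    def children: Seq[Node]

    // Built once, on first access. Later membership checks are hash
    // lookups instead of linear scans over `children`.
    lazy val childSet: Set[Node] = children.toSet

    // Linear scan: O(N) per call, so N calls cost O(N squared) overall.
    def containsSlow(n: Node): Boolean = children.contains(n)

    // Hash lookup: roughly O(1) per call once the set has been built.
    def containsFast(n: Node): Boolean = childSet(n)
  }

  final case class Leaf(name: String) extends Node {
    val children: Seq[Node] = Nil
  }

  // Stands in for a node with many children, such as a large IN list.
  final case class InList(children: Seq[Node]) extends Node

  // Mimics repeated membership checks: one lookup per candidate.
  def countKnown(parent: Node, candidates: Seq[Node]): Int =
    candidates.count(parent.containsFast)
}
```

With 10K children, `countKnown` performs 10K hash lookups rather than 10K scans of a 10K-element list. Both variants rely on `==` for equality, so they should agree on membership as long as nodes have consistent `equals` and `hashCode`, which case classes provide.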

Diffstat (limited to 'docs/spark-standalone.md')

0 files changed, 0 insertions, 0 deletions

