author Michael Davies <Michael.BellDavies@gmail.com> 2015-06-17 12:56:55 -0700
committer Michael Armbrust <michael@databricks.com> 2015-06-17 12:56:55 -0700
commit 0c1b2df043fde9ac9f28a5f348ee96ce124f2c6b
tree 903bf5f70107631f8f32a848c676d1eb6fc10c5a /docs/spark-standalone.md
parent 50a0496a43f09d70593419efc38587c8441843bf
[SPARK-8077] [SQL] Optimization for TreeNodes with large numbers of children
For example, large IN clauses.

Large IN clauses are parsed very slowly. For example, the SQL below (10K items in the IN list) takes 45-50s.

    s"""SELECT * FROM Person WHERE ForeName IN ('${(1 to 10000).map("n" + _).mkString("','")}')"""

This is principally due to TreeNode, which repeatedly calls contains on children, where children in this case is a List that is 10K long. Each contains call on a List is a linear scan, so parsing large IN clauses is in effect O(N squared).

Using a lazily initialised Set based on children for the contains checks reduces parse time to around 2.5s.

Author: Michael Davies <Michael.BellDavies@gmail.com>

Closes #6673 from MickDavies/SPARK-8077 and squashes the following commits:

38cd425 [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children
d80103b [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children
e6be8be [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children
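
The idea in the commit message, replacing repeated List.contains scans with lookups in a lazily built Set, can be sketched roughly as below. This is only an illustration under stated assumptions: `Node`, `containsChild`, `containsChildSlow`, and `InClauseDemo` are hypothetical names for a simplified stand-in, not the actual TreeNode code touched by this commit.

```scala
// Hypothetical, simplified stand-in for a tree node with many children.
// It is NOT the real Catalyst TreeNode; names here are illustrative only.
final case class Node(name: String, children: Seq[Node] = Nil) {

  // Built once, on first access. Nodes that never need a membership
  // check never pay the cost of constructing the Set.
  lazy val containsChild: Set[Node] = children.toSet

  // The slow path described in the commit message: a linear scan of
  // children on every call, so N lookups cost O(N squared) overall.
  def containsChildSlow(n: Node): Boolean = children.contains(n)
}

object InClauseDemo {
  // Builds a parent with `n` distinct leaf children, mimicking an IN
  // clause with `n` literals.
  def build(n: Int): Node =
    Node("In", (1 to n).map(i => Node("n" + i)))

  // Counts how many children are found via the Set-based lookup.
  // A Set[Node] is also a function Node => Boolean, so it can be
  // passed directly as the predicate.
  def countHits(parent: Node): Int =
    parent.children.count(parent.containsChild)

  def main(args: Array[String]): Unit = {
    val parent = build(10000)
    println(countHits(parent))
  }
}
```

With the Set, each membership check is an expected constant-time hash lookup, so checking all N children is roughly linear rather than quadratic. The `lazy val` keeps the change cheap for the common case of nodes with few children, since the Set is only built when a lookup is actually made.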
Diffstat (limited to 'docs/spark-standalone.md')
0 files changed, 0 insertions, 0 deletions