aboutsummaryrefslogtreecommitdiff
path: root/docs/sql-programming-guide.md
diff options
context:
space:
mode:
authorgatorsmile <gatorsmile@gmail.com>2016-02-01 11:57:13 -0800
committerMichael Armbrust <michael@databricks.com>2016-02-01 11:57:13 -0800
commit8f26eb5ef6853a6666d7d9481b333de70bc501ed (patch)
tree887eb1d86baf1e8d7fbef56e31e91ff0d253d1f0 /docs/sql-programming-guide.md
parent33c8a490f7f64320c53530a57bd8d34916e3607c (diff)
downloadspark-8f26eb5ef6853a6666d7d9481b333de70bc501ed.tar.gz
spark-8f26eb5ef6853a6666d7d9481b333de70bc501ed.tar.bz2
spark-8f26eb5ef6853a6666d7d9481b333de70bc501ed.zip
[SPARK-12705][SPARK-10777][SQL] Analyzer Rule ResolveSortReferences
JIRA: https://issues.apache.org/jira/browse/SPARK-12705 **Scope:** This PR is a general fix for sorting reference resolution when the child's `outputSet` does not have the order-by attributes (called, *missing attributes*): - UnaryNode support is limited to `Project`, `Window`, `Aggregate`, `Distinct`, `Filter`, `RepartitionByExpression`. - We will not try to resolve the missing references inside a subquery, unless the outputSet of this subquery contains it. **General Reference Resolution Rules:** - Jump over the nodes with the following types: `Distinct`, `Filter`, `RepartitionByExpression`. Do not need to add missing attributes. The reason is their `outputSet` is decided by their `inputSet`, which is the `outputSet` of their children. - Group-by expressions in `Aggregate`: missing order-by attributes are not allowed to be added into group-by expressions since it will change the query result. Thus, in RDBMS, it is not allowed. - Aggregate expressions in `Aggregate`: if the group-by expressions in `Aggregate` contains the missing attributes but aggregate expressions do not have it, just add them into the aggregate expressions. This can resolve the analysisExceptions thrown by the three TCPDS queries. - `Project` and `Window` are special. We just need to add the missing attributes to their `projectList`. **Implementation:** 1. Traverse the whole tree in a pre-order manner to find all the resolvable missing order-by attributes. 2. Traverse the whole tree in a post-order manner to add the found missing order-by attributes to the node if their `inputSet` contains the attributes. 3. If the origins of the missing order-by attributes are different nodes, each pass only resolves the missing attributes that are from the same node. **Risk:** Low. This rule will be trigger iff ```!s.resolved && child.resolved``` is true. Thus, very few cases are affected. Author: gatorsmile <gatorsmile@gmail.com> Closes #10678 from gatorsmile/sortWindows.
Diffstat (limited to 'docs/sql-programming-guide.md')
0 files changed, 0 insertions, 0 deletions