author | gatorsmile <gatorsmile@gmail.com> | 2016-02-01 11:57:13 -0800 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2016-02-01 11:57:13 -0800 |
commit | 8f26eb5ef6853a6666d7d9481b333de70bc501ed (patch) | |
tree | 887eb1d86baf1e8d7fbef56e31e91ff0d253d1f0 /docs | |
parent | 33c8a490f7f64320c53530a57bd8d34916e3607c (diff) | |
download | spark-8f26eb5ef6853a6666d7d9481b333de70bc501ed.tar.gz spark-8f26eb5ef6853a6666d7d9481b333de70bc501ed.tar.bz2 spark-8f26eb5ef6853a6666d7d9481b333de70bc501ed.zip |
[SPARK-12705][SPARK-10777][SQL] Analyzer Rule ResolveSortReferences
JIRA: https://issues.apache.org/jira/browse/SPARK-12705
**Scope:**
This PR is a general fix for sort reference resolution when the child's `outputSet` does not contain the order-by attributes (called *missing attributes*):
- UnaryNode support is limited to `Project`, `Window`, `Aggregate`, `Distinct`, `Filter`, `RepartitionByExpression`.
- We do not try to resolve missing references inside a subquery unless the `outputSet` of that subquery contains them.
**General Reference Resolution Rules:**
- Skip over nodes of the following types: `Distinct`, `Filter`, `RepartitionByExpression`. There is no need to add missing attributes to them, because their `outputSet` is determined by their `inputSet`, which is the `outputSet` of their children.
- Group-by expressions in `Aggregate`: missing order-by attributes must not be added to the group-by expressions, since doing so would change the query result. RDBMSs disallow this for the same reason.
- Aggregate expressions in `Aggregate`: if the group-by expressions in `Aggregate` contain the missing attributes but the aggregate expressions do not, just add them to the aggregate expressions. This resolves the `AnalysisException`s thrown by the three TPC-DS queries.
- `Project` and `Window` are special. We just need to add the missing attributes to their `projectList`.
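As a rough illustration of the per-node rules above, here is a minimal sketch on a toy model. The `Op` and `Action` types and the `handle` function are hypothetical names invented for this example; attributes are plain strings, and none of this is the Catalyst API or the code in this PR. `Window` is left out for brevity, on the assumption that it is handled like `Project`.

```scala
object MissingAttrRules {
  // Hypothetical toy operators; attributes are plain strings, not Catalyst expressions.
  sealed trait Op
  case object Distinct extends Op
  case object Filter extends Op
  case object RepartitionByExpression extends Op
  case class Project(projectList: Seq[String]) extends Op
  case class Aggregate(groupingExprs: Seq[String], aggregateExprs: Seq[String]) extends Op

  // What to do with a node that sits between the Sort and the missing attributes.
  sealed trait Action
  case object JumpOver extends Action
  case class Rewritten(op: Op) extends Action

  def handle(op: Op, missing: Seq[String]): Action = op match {
    // Output is decided by the input, so nothing has to be added here.
    case Distinct | Filter | RepartitionByExpression => JumpOver

    // Append the missing attributes that are not already projected.
    case Project(list) =>
      Rewritten(Project(list ++ missing.filterNot(a => list.contains(a))))

    // Only attributes that are already group-by expressions may be exposed;
    // the group-by expressions themselves are never changed.
    case Aggregate(grouping, aggs) =>
      val addable = missing.filter(a => grouping.contains(a) && !aggs.contains(a))
      Rewritten(Aggregate(grouping, aggs ++ addable))
  }
}
```

In this sketch, an attribute that is missing from an `Aggregate` and is not one of its group-by expressions is simply left unresolved, which mirrors the rule that group-by expressions must not grow.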
**Implementation:**
1. Traverse the whole tree in a pre-order manner to find all the resolvable missing order-by attributes.
2. Traverse the whole tree in a post-order manner to add the found missing order-by attributes to each node whose `inputSet` contains them.
3. If the origins of the missing order-by attributes are different nodes, each pass only resolves the missing attributes that are from the same node.
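The first two steps can be sketched on a toy plan tree as follows. This is an illustrative sketch only: `Node`, `Leaf`, `findResolvable`, `addFound`, and `resolve` are hypothetical names, attributes are plain strings, and step 3 (resolving attributes from different origin nodes in separate passes) is not modeled.

```scala
object TwoPassSketch {
  // Hypothetical toy plan nodes; attributes are plain strings.
  sealed trait Node {
    def children: Seq[Node]
    def output: Set[String]
    // The input of a node is whatever its children output.
    def inputSet: Set[String] = children.flatMap(_.output).toSet
  }
  case class Leaf(output: Set[String]) extends Node {
    def children: Seq[Node] = Seq.empty
  }
  case class Filter(child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
    def output: Set[String] = child.output
  }
  case class Project(projectList: Seq[String], child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
    def output: Set[String] = projectList.toSet
  }

  // Pass 1 (pre-order): inspect a node before its children and collect
  // the missing attributes that some node's input can supply.
  def findResolvable(node: Node, missing: Set[String]): Set[String] = {
    val here = missing.intersect(node.inputSet)
    here ++ node.children.flatMap(c => findResolvable(c, missing -- here))
  }

  // Pass 2 (post-order): rewrite the child first, then add the found
  // attributes to a Project whose (rewritten) input contains them.
  def addFound(node: Node, found: Set[String]): Node = node match {
    case Project(list, child) =>
      val newChild = addFound(child, found)
      val extra = found.filter(a => newChild.output.contains(a) && !list.contains(a))
      Project(list ++ extra.toSeq.sorted, newChild)
    case Filter(child) => Filter(addFound(child, found))
    case leaf: Leaf => leaf
  }

  // Entry point: order-by attributes absent from the plan's output are "missing".
  def resolve(plan: Node, orderBy: Set[String]): Node = {
    val missing = orderBy -- plan.output
    addFound(plan, findResolvable(plan, missing))
  }
}
```

Splitting the work into a search pass and a rewrite pass means an attribute that no node can supply leaves the plan untouched, rather than being half-added on the way down.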
**Risk:**
Low. This rule is triggered iff ```!s.resolved && child.resolved``` is true, so very few cases are affected.
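To make the trigger condition concrete, here is a small sketch with stand-in types. `Child`, `Sort`, `orderResolved`, and `ruleApplies` are hypothetical names for illustration; the assumption that a sort is resolved only when both its ordering and its child are resolved is part of the toy model, not a statement about Catalyst.

```scala
object TriggerSketch {
  // Hypothetical stand-ins for a child plan and a Sort node.
  case class Child(resolved: Boolean)
  case class Sort(orderResolved: Boolean, child: Child) {
    // Toy assumption: resolved only if the ordering and the child both are.
    def resolved: Boolean = orderResolved && child.resolved
  }

  // The rule fires only for an unresolved Sort sitting on a resolved child.
  def ruleApplies(s: Sort): Boolean = !s.resolved && s.child.resolved
}
```

Under this model the rule is skipped both for sorts that already resolve and for sorts whose child still needs other analyzer rules to run first.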
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10678 from gatorsmile/sortWindows.