author: Michael Armbrust <michael@databricks.com>, 2014-08-26 16:29:14 -0700
committer: Reynold Xin <rxin@apache.org>, 2014-08-26 16:29:14 -0700
commit: c4787a3690a9ed3b8b2c6c294fc4a6915436b6f7
tree: 15b185728ed6e46fd93f795780a6266fc42ffd76
parent: 1208f72ac78960fe5060187761479b2a9a417c1b
[SPARK-3194][SQL] Add AttributeSet to fix bugs with invalid comparisons of AttributeReferences
It is common to want to describe sets of attributes that are in various parts of a query plan. However, the semantics of putting `AttributeReference` objects into a standard Scala `Set` result in subtle bugs when references differ cosmetically. For example, with case-insensitive resolution it is possible to have two references to the same attribute whose names are not equal.

In this PR I introduce a new abstraction, an `AttributeSet`, which performs all comparisons using the globally unique `ExpressionId` instead of case class equality. (There is already a related class, [`AttributeMap`](https://github.com/marmbrus/spark/blob/inMemStats/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala#L32).) This new type of set is used to fix a bug in the optimizer where needed attributes were getting projected away underneath join operators.

I also took this opportunity to refactor the expression and query plan base classes. In all but one instance, the logic for computing the `references` of an `Expression` was the same, so I moved this logic into the base class. For query plans, the semantics of the `references` method were ill-defined (is it the references output? or is it those used by expression evaluation? or what?). As a result, this method wasn't really used very much, so I removed it.

TODO:
- [x] Finish Scaladoc for `AttributeSet`
- [x] Scan the code for other instances of `Set[Attribute]` and refactor them.
- [x] Finish removing `references` from `QueryPlan`

Author: Michael Armbrust <michael@databricks.com>

Closes #2109 from marmbrus/attributeSets and squashes the following commits:

1c0dae5 [Michael Armbrust] work on serialization bug.
9ba868d [Michael Armbrust] Merge remote-tracking branch 'origin/master' into attributeSets
3ae5288 [Michael Armbrust] review comments
40ce7f6 [Michael Armbrust] style
d577cc7 [Michael Armbrust] Scaladoc
cae5d22 [Michael Armbrust] remove more references implementations
d6e16be [Michael Armbrust] Remove more instances of "def references" and normal sets of attributes.
fc26b49 [Michael Armbrust] Add AttributeSet class, remove references from Expression.
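
The problem described in the message above can be reproduced in a few lines. The following is a minimal sketch, not the code from this commit: `ExprId`, `AttributeReference`, and `AttributeSet` here are simplified, hypothetical stand-ins for the Catalyst classes, reduced to the fields needed to show why a standard `Set` (case class equality) and an id-keyed set disagree when two references to the same attribute differ only in name casing.

```scala
// Simplified stand-ins for Catalyst's types (assumptions, not the real definitions).
final case class ExprId(id: Long)
final case class AttributeReference(name: String, exprId: ExprId)

// Hypothetical minimal AttributeSet: membership is decided by exprId only,
// so cosmetic differences such as name casing are ignored.
final class AttributeSet private (private val byId: Map[ExprId, AttributeReference]) {
  def contains(a: AttributeReference): Boolean = byId.contains(a.exprId)

  // Adding a reference whose exprId is already present leaves the set unchanged.
  def +(a: AttributeReference): AttributeSet =
    if (byId.contains(a.exprId)) this else new AttributeSet(byId + (a.exprId -> a))

  def subsetOf(other: AttributeSet): Boolean =
    byId.keySet.subsetOf(other.byId.keySet)

  def size: Int = byId.size
}

object AttributeSet {
  def apply(attrs: AttributeReference*): AttributeSet =
    new AttributeSet(attrs.map(a => a.exprId -> a).toMap)
}

object AttributeSetDemo {
  def main(args: Array[String]): Unit = {
    // Two references to the same attribute that differ only in name casing.
    val lower = AttributeReference("key", ExprId(1L))
    val upper = AttributeReference("KEY", ExprId(1L))

    // A standard Set uses case class equality, so the names must match too.
    println(Set(lower).contains(upper))          // false

    // The id-based set treats both references as the same attribute.
    println(AttributeSet(lower).contains(upper)) // true
  }
}
```

Keying the underlying map by `ExprId` is one simple way to get id-based comparisons; the class added in this commit may be structured differently.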