author: Tejas Patil <tejasp@fb.com> 2016-08-28 19:14:58 +0200
committer: Herman van Hovell <hvanhovell@databricks.com> 2016-08-28 19:14:58 +0200
commit: 095862a3cff73fd88db9ed37a63e7629e664ff64
tree: bf2ce9b2b93a6e8fd459a01f787402975540c7ca /sql/catalyst
parent: e07baf14120bc94b783649dabf5fffea58bff0de
[SPARK-17271][SQL] Planner adds un-necessary Sort even if child ordering is semantically same as required ordering
## What changes were proposed in this pull request?

Jira: https://issues.apache.org/jira/browse/SPARK-17271

The planner adds an unnecessary SORT operation because of a bug in the way `SortOrder` is compared at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253

`SortOrder` needs to be compared semantically because the `Expression`s within two `SortOrder`s can be "semantically equal" without being literally equal objects.

For example, in the case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`:

Expression in required SortOrder:
```
AttributeReference(
  name = "col1",
  dataType = LongType,
  nullable = false
) (exprId = exprId,
  qualifier = Some("a")
)
```

Expression in child SortOrder:
```
AttributeReference(
  name = "col1",
  dataType = LongType,
  nullable = false
) (exprId = exprId)
```

Notice that the output column has a qualifier but the child attribute does not. The underlying expression is the same, so in this case the child satisfies the required sort order.

This PR includes the following changes:
- Added a `semanticEquals` method to `SortOrder` so that it can compare the underlying child expressions semantically (rather than with the default `Object.equals`)
- Fixed `EnsureRequirements` to use semantic comparison of `SortOrder`

## How was this patch tested?

- Added a test case to `PlannerSuite`. Ran the rest of the tests in `PlannerSuite`.

Author: Tejas Patil <tejasp@fb.com>

Closes #14841 from tejasapatil/SPARK-17271_sort_order_equals_bug.
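
To make the difference between literal and semantic comparison concrete, here is a minimal, self-contained sketch. It does not use Spark: `Attr`, `SortOrderSketch`, and `orderingSatisfied` are hypothetical stand-ins, and the rule that two attributes are semantically equal when their `exprId`s match is a simplifying assumption, not Catalyst's actual canonicalization. Only the body of `SortOrder.semanticEquals` is taken from this patch.

```scala
// Simplified stand-ins for Catalyst classes; NOT the real Spark API.
sealed trait SortDirection
case object Ascending extends SortDirection
case object Descending extends SortDirection

// Toy attribute: exprId identifies the column, the qualifier is cosmetic.
final case class Attr(name: String, exprId: Long, qualifier: Option[String] = None) {
  // Assumption for this sketch: attributes are semantically equal
  // when their exprIds match, regardless of qualifier.
  def semanticEquals(other: Attr): Boolean = exprId == other.exprId
}

final case class SortOrder(child: Attr, direction: SortDirection) {
  // Same shape as the method added by this patch.
  def semanticEquals(other: SortOrder): Boolean =
    (direction == other.direction) && child.semanticEquals(other.child)
}

object SortOrderSketch {
  // Hypothetical helper: the child's ordering satisfies the requirement when
  // the required ordering is, element by element, a semantic prefix of it.
  def orderingSatisfied(required: Seq[SortOrder], childOrdering: Seq[SortOrder]): Boolean =
    required.length <= childOrdering.length &&
      required.zip(childOrdering).forall { case (r, c) => r.semanticEquals(c) }

  // Required ordering carries the qualifier "a"; the child's ordering does not.
  val required: Seq[SortOrder] = Seq(SortOrder(Attr("col1", 1L, Some("a")), Ascending))
  val child: Seq[SortOrder] = Seq(SortOrder(Attr("col1", 1L), Ascending))
}
```

In this sketch, `SortOrderSketch.required == SortOrderSketch.child` is false because case-class equality also compares the qualifier, which is the kind of mismatch that would lead a planner to add a redundant sort. `SortOrderSketch.orderingSatisfied(required, child)` is true because only the direction and the underlying expression are compared.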
Diffstat (limited to 'sql/catalyst')
-rw-r--r-- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala | 3
1 files changed, 3 insertions, 0 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala
index de779ed370..f498f35792 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala
@@ -61,6 +61,9 @@ case class SortOrder(child: Expression, direction: SortDirection)
   override def sql: String = child.sql + " " + direction.sql
 
   def isAscending: Boolean = direction == Ascending
+
+  def semanticEquals(other: SortOrder): Boolean =
+    (direction == other.direction) && child.semanticEquals(other.child)
 }
 
 /**