path: root/streaming
author     Josh Rosen <joshrosen@databricks.com>  2015-10-15 17:36:55 -0700
committer  Michael Armbrust <michael@databricks.com>  2015-10-15 17:36:55 -0700
commit     eb0b4d6e2ddfb765f082d0d88472626336ad2609 (patch)
tree       18f542a940c01c8d924f48cbfec89244c5ab01ad /streaming
parent     6a2359ff1f7ad2233af2c530313d6ec2ecf70d19 (diff)
[SPARK-11135] [SQL] Exchange incorrectly skips sorts when existing ordering is non-empty subset of required ordering
In Spark SQL, the Exchange planner tries to avoid unnecessary sorts when the data is already sorted by a superset of the requested sorting columns. For instance, suppose a query calls for an operator's input to be sorted by `a.asc` and the input happens to already be sorted by `[a.asc, b.asc]`. In this case, we do not need to re-sort the input. The converse, however, is not true: if the query calls for `[a.asc, b.asc]`, then `a.asc` alone will not satisfy the ordering requirements, and Exchange must plan an additional sort. The current Exchange code gets this wrong and incorrectly skips sorting when the existing output ordering is a subset of the required ordering.

Fortunately, the fix is simple. This bug was introduced in https://github.com/apache/spark/pull/7458, so it affects 1.5.0+. This patch fixes the bug and significantly improves the unit test coverage of Exchange's sort-planning logic.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9140 from JoshRosen/SPARK-11135.
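To make the prefix semantics concrete, here is a minimal Scala sketch, not the actual Exchange source: the `SortOrder` case class and both function names are hypothetical stand-ins for Catalyst's real expressions. The key point is directional: an existing ordering satisfies a requirement only when the required ordering is a prefix of the existing one, so a check that compares only the common prefix (or treats a non-empty subset as sufficient) skips sorts it must not.

```scala
// Hypothetical simplified model; real Exchange code works with Catalyst's
// SortOrder expressions rather than plain column names.
case class SortOrder(column: String, ascending: Boolean)

object OrderingDemo extends App {
  // Correct check: the requirement is satisfied only if it is a prefix of
  // the existing ordering. Sorted by [a.asc, b.asc] implies sorted by
  // [a.asc], but not the other way around.
  def orderingSatisfied(required: Seq[SortOrder], existing: Seq[SortOrder]): Boolean =
    required.isEmpty || existing.take(required.length) == required

  // Paraphrase of the pre-fix logic (an assumption, reconstructed from the
  // bug description): comparing only the common prefix treats an existing
  // ordering that is a non-empty subset of the requirement as sufficient.
  def buggyOrderingSatisfied(required: Seq[SortOrder], existing: Seq[SortOrder]): Boolean = {
    val minSize = math.min(required.length, existing.length)
    minSize > 0 && required.take(minSize) == existing.take(minSize)
  }

  val aAsc = SortOrder("a", ascending = true)
  val bAsc = SortOrder("b", ascending = true)

  println(orderingSatisfied(Seq(aAsc), Seq(aAsc, bAsc)))      // true: no re-sort needed
  println(orderingSatisfied(Seq(aAsc, bAsc), Seq(aAsc)))      // false: must plan a sort
  println(buggyOrderingSatisfied(Seq(aAsc, bAsc), Seq(aAsc))) // true: the bug, sort skipped
}
```

The last call shows exactly the reported failure mode: with the required ordering `[a.asc, b.asc]` and an existing ordering of only `[a.asc]`, the prefix-of-min-length comparison succeeds and the needed sort is never planned.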
Diffstat (limited to 'streaming')
0 files changed, 0 insertions, 0 deletions