author Davies Liu <davies@databricks.com> 2016-03-12 00:48:36 -0800
committer Davies Liu <davies.liu@gmail.com> 2016-03-12 00:48:36 -0800
commit ba8c86d06f5968c1af4db8dd9a458005bc5f214c
tree fa6a7479cef0ba8c2f6b4574b0bbd180502bed85 /sql/catalyst
parent 2ef4c5963bff3574fe17e669d703b25ddd064e5d
[SPARK-13671] [SPARK-13311] [SQL] Use different physical plans for RDD and data sources
## What changes were proposed in this pull request?

This PR splits PhysicalRDD into two classes, PhysicalRDD and PhysicalScan. PhysicalRDD is used for DataFrames that are created from an existing RDD. PhysicalScan is used for DataFrames that are created from data sources. This enables us to apply different optimizations to each of them.

It also fixes the problem with sameResult() on two DataSourceScan nodes, and fixes the equality check and toString for `In`. It would be better to use Seq there, but we can't break this public API (sad).

## How was this patch tested?

Existing tests. Manually tested with TPCDS queries Q59 and Q64: all of the duplicated exchanges can now be reused, and there is a 40+% performance improvement (saving half of the scan).

Author: Davies Liu <davies@databricks.com>

Closes #11514 from davies/existing_rdd.
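The exchange reuse described above depends on sameResult() recognizing two plans as equivalent even when they differ in per-instance details. The following is a minimal, self-contained sketch of that idea, not Spark's implementation: `Plan`, `Scan`, `Filter`, `args`, and `exprId` are hypothetical stand-ins (with `args` playing the role of `cleanArgs`), and canonicalization is assumed to do nothing more than reset an identifier.

```scala
// Hypothetical, simplified plan tree; these are NOT Spark's classes.
// `args` stands in for cleanArgs, and `exprId` for a per-instance
// identifier that canonicalization is assumed to erase.
sealed trait Plan {
  def children: Seq[Plan]
  def args: Seq[Any]
  def canonicalized: Plan

  // Same shape as the check in the diff: compare the canonicalized nodes'
  // class, child count, and arguments, then recurse into the children.
  def sameResult(other: Plan): Boolean = {
    val left = this.canonicalized
    val right = other.canonicalized
    left.getClass == right.getClass &&
      left.children.size == right.children.size &&
      left.args == right.args &&
      left.children.zip(right.children).forall { case (l, r) => l.sameResult(r) }
  }
}

final case class Scan(table: String, exprId: Long) extends Plan {
  val children: Seq[Plan] = Nil
  def args: Seq[Any] = Seq(table, exprId)
  // Erase the identifier so that two scans of the same table compare equal.
  def canonicalized: Plan = copy(exprId = 0L)
}

final case class Filter(condition: String, child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
  def args: Seq[Any] = Seq(condition)
  def canonicalized: Plan = this
}

object SameResultSketch {
  def main(unused: Array[String]): Unit = {
    val p1 = Filter("x > 1", Scan("t", exprId = 1L))
    val p2 = Filter("x > 1", Scan("t", exprId = 2L))
    val p3 = Filter("x > 2", Scan("t", exprId = 3L))
    println(p1.sameResult(p2)) // true: only the erased identifier differs
    println(p1.sameResult(p3)) // false: the filter conditions differ
  }
}
```

In this sketch, two subtrees for which sameResult() returns true are the ones a planner could compute once and share, which is the kind of reuse the commit message reports for Q59 and Q64.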
Diffstat (limited to 'sql/catalyst')
-rw-r--r-- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala | 12
1 files changed, 6 insertions, 6 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
index c222571a34..920e989d05 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
@@ -280,12 +280,12 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
    * can do better should override this function.
    */
   def sameResult(plan: PlanType): Boolean = {
-    val canonicalizedLeft = this.canonicalized
-    val canonicalizedRight = plan.canonicalized
-    canonicalizedLeft.getClass == canonicalizedRight.getClass &&
-      canonicalizedLeft.children.size == canonicalizedRight.children.size &&
-      canonicalizedLeft.cleanArgs == canonicalizedRight.cleanArgs &&
-      (canonicalizedLeft.children, canonicalizedRight.children).zipped.forall(_ sameResult _)
+    val left = this.canonicalized
+    val right = plan.canonicalized
+    left.getClass == right.getClass &&
+      left.children.size == right.children.size &&
+      left.cleanArgs == right.cleanArgs &&
+      (left.children, right.children).zipped.forall(_ sameResult _)
   }
   /**