diff options
author | Takuya UESHIN <ueshin@happy-camper.st> | 2016-06-14 10:52:13 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2016-06-14 10:52:13 -0700 |
commit | c5b735581922c52a1b1cc6cd8c7b5878d3cf8f20 (patch) | |
tree | f9f39e06bec1c65ff416befe9bdd38eb978330ee /sql/hive/src/main | |
parent | bc02d011294fcd1ab07b9baf1011c3f2bdf749d9 (diff) | |
download | spark-c5b735581922c52a1b1cc6cd8c7b5878d3cf8f20.tar.gz spark-c5b735581922c52a1b1cc6cd8c7b5878d3cf8f20.tar.bz2 spark-c5b735581922c52a1b1cc6cd8c7b5878d3cf8f20.zip |
[SPARK-15915][SQL] Logical plans should use canonicalized plan when override sameResult.
## What changes were proposed in this pull request?
`DataFrame` with plan overriding `sameResult` but not using canonicalized plan to compare can't cacheTable.
The example is like:
```
val localRelation = Seq(1, 2, 3).toDF()
localRelation.createOrReplaceTempView("localRelation")
spark.catalog.cacheTable("localRelation")
assert(
localRelation.queryExecution.withCachedData.collect {
case i: InMemoryRelation => i
}.size == 1)
```
and this will fail as:
```
ArrayBuffer() had size 0 instead of expected size 1
```
The reason is that when do `spark.catalog.cacheTable("localRelation")`, `CacheManager` tries to cache for the plan wrapped by `SubqueryAlias` but when planning for the DataFrame `localRelation`, `CacheManager` tries to find cached table for the not-wrapped plan because the plan for DataFrame `localRelation` is not wrapped.
Some plans like `LocalRelation`, `LogicalRDD`, etc. override `sameResult` method, but not use canonicalized plan to compare so the `CacheManager` can't detect the plans are the same.
This pr modifies them to use canonicalized plan when override `sameResult` method.
## How was this patch tested?
Added a test to check if DataFrame with plan overriding sameResult but not using canonicalized plan to compare can cacheTable.
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #13638 from ueshin/issues/SPARK-15915.
Diffstat (limited to 'sql/hive/src/main')
-rw-r--r-- | sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala index 5596a4470f..58bca2059c 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala @@ -185,7 +185,7 @@ private[hive] case class MetastoreRelation( /** Only compare database and tablename, not alias. */ override def sameResult(plan: LogicalPlan): Boolean = { - plan match { + plan.canonicalized match { case mr: MetastoreRelation => mr.databaseName == databaseName && mr.tableName == tableName case _ => false |