author     gatorsmile <gatorsmile@gmail.com>     2016-09-14 23:10:20 +0800
committer  Wenchen Fan <wenchen@databricks.com>  2016-09-14 23:10:20 +0800
commit     52738d4e099a19466ef909b77c24cab109548706
tree       600b0753c5129b8f7588dd3dc011ae187081d910 /sql/catalyst/src/test
parent     dc0a4c916151c795dc41b5714e9d23b4937f4636
[SPARK-17409][SQL] Do Not Optimize Query in CTAS More Than Once
### What changes were proposed in this pull request?

As explained in https://github.com/apache/spark/pull/14797:

> Some analyzer rules make assumptions about logical plans, and the optimizer may break those assumptions; we should not pass an optimized query plan into QueryExecution (where it will be analyzed again), otherwise we may hit some weird bugs. For example, we have a rule for decimal calculation that promotes the precision before binary operations and uses PromotePrecision as a placeholder to indicate that the rule should not be applied twice. But an optimizer rule removes this placeholder, which breaks the assumption; the rule is then applied twice and produces a wrong result.

We should not optimize the query in CTAS more than once. For example:

```scala
spark.range(99, 101).createOrReplaceTempView("tab1")
val sqlStmt = "SELECT id, cast(id as long) * cast('1.0' as decimal(38, 18)) as num FROM tab1"
sql(s"CREATE TABLE tab2 USING PARQUET AS $sqlStmt")
checkAnswer(spark.table("tab2"), sql(sqlStmt))
```

Before this PR, the results do not match:

```
== Results ==
!== Correct Answer - 2 ==     == Spark Answer - 2 ==
![100,100.000000000000000000] [100,null]
 [99,99.000000000000000000]   [99,99.000000000000000000]
```

After this PR, the results match:

```
+---+----------------------+
|id |num                   |
+---+----------------------+
|99 |99.000000000000000000 |
|100|100.000000000000000000|
+---+----------------------+
```

In this PR, we no longer treat the `query` in CTAS as a child of the plan. Thus, the `query` is not optimized again when the CTAS statement itself is optimized. However, we still need to analyze it so that the Analyzer can normalize and verify the CTAS. We do this in the analyzer rule `PreprocessDDL`, because so far only that rule needs the analyzed plan of the `query`.

### How was this patch tested?

Added a test.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #15048 from gatorsmile/ctasOptimized.
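To make the mechanism concrete, here is a minimal self-contained sketch of why hiding the query from `children` prevents re-optimization. The `Plan`, `Scan`, `Project`, and `CreateTableAs` types below are hypothetical stand-ins, not Spark's actual Catalyst classes; only `transformUp` mirrors a real Catalyst method by name. Tree rewrites recurse through `children`, so a plan held in an ordinary field is invisible to them:

```scala
// Toy tree-rewrite model in the spirit of Catalyst's transformUp.
sealed trait Plan {
  def children: Seq[Plan]
  def mapChildren(f: Plan => Plan): Plan

  // Post-order rewrite: transform the children first, then this node.
  def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
    val rewritten = mapChildren(_.transformUp(rule))
    rule.applyOrElse(rewritten, identity[Plan])
  }
}

case class Scan(table: String) extends Plan {
  def children: Seq[Plan] = Nil
  def mapChildren(f: Plan => Plan): Plan = this
}

case class Project(child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def mapChildren(f: Plan => Plan): Plan = copy(child = f(child))
}

// The CTAS-like command: `query` is stored as a field, NOT as a child,
// so transformUp never descends into it.
case class CreateTableAs(table: String, query: Plan) extends Plan {
  def children: Seq[Plan] = Nil
  def mapChildren(f: Plan => Plan): Plan = this
}

object Demo extends App {
  val ctas = CreateTableAs("tab2", Project(Scan("tab1")))
  // An "optimizer" rule that rewrites Scan nodes wherever it can reach them.
  val rewritten = ctas.transformUp { case Scan(t) => Scan(t + "_optimized") }
  // Prints CreateTableAs(tab2,Project(Scan(tab1))): the inner query is untouched.
  println(rewritten)
}
```

With this shape, an optimizer pass over the command rewrites nothing inside `query`; per the commit message, the analyzer rule `PreprocessDDL` instead reaches into the field explicitly so the query is analyzed exactly once.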
Diffstat (limited to 'sql/catalyst/src/test')
-rw-r--r--  sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala | 5
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
index 6df47acaba..ff1bb126f4 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
@@ -31,10 +31,7 @@ import org.apache.spark.sql.streaming.OutputMode
import org.apache.spark.sql.types.IntegerType
/** A dummy command for testing unsupported operations. */
-case class DummyCommand() extends LogicalPlan with Command {
- override def output: Seq[Attribute] = Nil
- override def children: Seq[LogicalPlan] = Nil
-}
+case class DummyCommand() extends Command
class UnsupportedOperationsSuite extends SparkFunSuite {
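The three deleted overrides become unnecessary because this PR changes the `Command` trait itself to supply them. Here is a hedged sketch of the post-change trait; the `LeafNode` base and empty `output` are an assumption inferred from the diff, not copied from the commit, so check Catalyst's logical-plan sources at this revision for the real definition:

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LeafNode

// Assumed post-PR shape of the trait: a leaf node (no children, so the
// optimizer never recurses into any plan a command holds) that produces
// no output attributes. With these defaults, DummyCommand needs no body.
trait Command extends LeafNode {
  override def output: Seq[Attribute] = Seq.empty
}

case class DummyCommand() extends Command
```

Making commands leaf nodes is the same design choice the commit message describes for CTAS: any query a command carries lives in a field, outside the `children` the optimizer traverses.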