author	Wenchen Fan <wenchen@databricks.com>	2015-11-08 21:01:53 -0800
committer	Yin Huai <yhuai@databricks.com>	2015-11-08 21:01:53 -0800
commit	d8b50f70298dbf45e91074ee2d751fee7eecb119 (patch)
tree	ad2b1418e3684630bd0ac18349e9c559bbf4782c /sql/hive
parent	97b7080cf2d2846c7257f8926f775f27d457fe7d (diff)
[SPARK-11453][SQL] appending data to a partitioned table messes up the result
The reason is that:

1. For a partitioned Hive table, the partition columns are moved after the data columns (e.g. `<a: Int, b: Int>` partitioned by `a` becomes `<b: Int, a: Int>`).
2. When appending data to a table, input columns are matched to the table's columns by position. So when we append data to a partitioned table, the input columns are matched to the wrong table columns.

A solution is to reorder the input columns before matching by position, like what we did for [`InsertIntoHadoopFsRelation`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L101-L105); a minimal sketch of this reordering idea is shown below.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #9408 from cloud-fan/append.
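The sketch below illustrates the reordering idea using only the public DataFrame API; `reorderForAppend` and `tableColumns` are hypothetical names introduced for illustration, not the helper actually used inside Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper (not Spark's actual fix): project the input DataFrame
// into the table's column order (data columns first, partition columns last)
// so that the subsequent match-by-position lines up correctly.
// For a table <i, j> partitioned by i, tableColumns would be Seq("j", "i").
def reorderForAppend(input: DataFrame, tableColumns: Seq[String]): DataFrame =
  input.select(tableColumns.map(col): _*)
```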
Diffstat (limited to 'sql/hive')
-rw-r--r--	sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala	| 20
1 file changed, 20 insertions(+), 0 deletions(-)
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index af48d47895..9a425d7f6b 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -1428,4 +1428,24 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
checkAnswer(sql("SELECT val FROM tbl10562 WHERE Year == 2012"), Row("a"))
}
}
+
+ test("SPARK-11453: append data to partitioned table") {
+ withTable("tbl11453") {
+ Seq("1" -> "10", "2" -> "20").toDF("i", "j")
+ .write.partitionBy("i").saveAsTable("tbl11453")
+
+ Seq("3" -> "30").toDF("i", "j")
+ .write.mode(SaveMode.Append).partitionBy("i").saveAsTable("tbl11453")
+ checkAnswer(
+ sqlContext.read.table("tbl11453").select("i", "j").orderBy("i"),
+ Row("1", "10") :: Row("2", "20") :: Row("3", "30") :: Nil)
+
+ // make sure case sensitivity is correct.
+ Seq("4" -> "40").toDF("i", "j")
+ .write.mode(SaveMode.Append).partitionBy("I").saveAsTable("tbl11453")
+ checkAnswer(
+ sqlContext.read.table("tbl11453").select("i", "j").orderBy("i"),
+ Row("1", "10") :: Row("2", "20") :: Row("3", "30") :: Row("4", "40") :: Nil)
+ }
+ }
}