author    Marcelo Vanzin <vanzin@cloudera.com>    2016-12-12 14:19:42 -0800
committer gatorsmile <gatorsmile@gmail.com>       2016-12-12 14:19:42 -0800
commit    476b34c23a1ece1d52654482a393003756957ad2 (patch)
tree      fe86b2301f21d92ccb96c08c3182749d2d0ef3cb /sql/core
parent    bf42c2db57b9a2ca642ad3d499c30be8d9ff221a (diff)
[SPARK-18752][HIVE] "isSrcLocal" value should be set from user query.
The value of the "isSrcLocal" parameter passed to Hive's loadTable and
loadPartition methods needs to be set according to the user query (e.g.
"LOAD DATA LOCAL"), and not the current code that tries to guess what
it should be.
For existing versions of Hive the current behavior is probably ok, but
some recent changes in the Hive code changed the semantics slightly,
so that code which incorrectly sets "isSrcLocal" to "true" now does the
wrong thing: it ends up moving the parent directory of the files into
the final location, instead of the files themselves, resulting in a
table that cannot be read.
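The essence of the fix is to derive the flag from the user's statement rather than guessing from the source path. A minimal sketch of that idea (the helper below is illustrative only, not Spark's actual parser, which sets LoadDataCommand's isLocal flag in the SQL grammar):

```scala
// Illustrative sketch: derive isSrcLocal from the query text, as the
// commit message describes, instead of guessing from the path layout.
object IsSrcLocalSketch {
  // "LOAD DATA LOCAL INPATH ..." reads from the client's local
  // filesystem; plain "LOAD DATA INPATH ..." reads from HDFS.
  def isSrcLocal(query: String): Boolean =
    query.trim.toUpperCase.startsWith("LOAD DATA LOCAL")

  def main(args: Array[String]): Unit = {
    assert(isSrcLocal("LOAD DATA LOCAL INPATH '/tmp/data' INTO TABLE t"))
    assert(!isSrcLocal("LOAD DATA INPATH 'hdfs:///data' INTO TABLE t"))
  }
}
```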
I modified HiveCommandSuite so that existing "LOAD DATA" tests are run
both in local and non-local mode, since the semantics are slightly different.
The tests include a few new checks to make sure the semantics follow
what Hive describes in its documentation.
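The two-mode arrangement described above can be sketched as follows (a hedged illustration; the helper names are made up and are not HiveCommandSuite's actual API):

```scala
// Illustrative sketch: run the same "LOAD DATA" checks once per mode,
// since local and non-local loads have slightly different semantics.
object LoadDataModes {
  def loadStatement(local: Boolean, path: String, table: String): String = {
    val keyword = if (local) "LOAD DATA LOCAL" else "LOAD DATA"
    s"$keyword INPATH '$path' INTO TABLE $table"
  }

  def main(args: Array[String]): Unit = {
    for (local <- Seq(true, false)) {
      val sql = loadStatement(local, "/tmp/data", "t")
      // The real suite would execute each statement and verify the
      // resulting table contents against Hive's documented semantics;
      // here we only check the statement's shape.
      assert(sql.endsWith("INTO TABLE t"))
    }
  }
}
```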
Tested with existing unit tests and also ran some Hive integration tests
with a version of Hive containing the changes that surfaced the problem.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #16179 from vanzin/SPARK-18752.
Diffstat (limited to 'sql/core')
-rw-r--r-- | sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala | 8 |
1 file changed, 5 insertions, 3 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index 32e2f75737..d2a7556476 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -203,7 +203,7 @@ case class LoadDataCommand(
         throw new AnalysisException(s"LOAD DATA target table $tableIdentwithDB is partitioned, " +
           s"but number of columns in provided partition spec (${partition.get.size}) " +
           s"do not match number of partitioned columns in table " +
-          s"(s${targetTable.partitionColumnNames.size})")
+          s"(${targetTable.partitionColumnNames.size})")
       }
       partition.get.keys.foreach { colName =>
         if (!targetTable.partitionColumnNames.contains(colName)) {
@@ -297,13 +297,15 @@ case class LoadDataCommand(
         partition.get,
         isOverwrite,
         holdDDLTime = false,
-        inheritTableSpecs = true)
+        inheritTableSpecs = true,
+        isSrcLocal = isLocal)
     } else {
       catalog.loadTable(
         targetTable.identifier,
         loadPath.toString,
         isOverwrite,
-        holdDDLTime = false)
+        holdDDLTime = false,
+        isSrcLocal = isLocal)
     }
     Seq.empty[Row]
   }