path: root/sql/hive/compatibility
author     Takuya UESHIN <ueshin@happy-camper.st>    2017-01-26 11:51:05 +0100
committer  Herman van Hovell <hvanhovell@databricks.com>    2017-01-26 11:51:05 +0100
commit     2969fb4370120a39dae98be716b24dcc0ada2cef (patch)
tree       abb577f50071ed97d2934db8a896e58e30907309 /sql/hive/compatibility
parent     7045b8b3554459fe61c4b32868560e66444a2876 (diff)
download   spark-2969fb4370120a39dae98be716b24dcc0ada2cef.tar.gz
           spark-2969fb4370120a39dae98be716b24dcc0ada2cef.tar.bz2
           spark-2969fb4370120a39dae98be716b24dcc0ada2cef.zip
[SPARK-18936][SQL] Infrastructure for session local timezone support.
## What changes were proposed in this pull request?

As of Spark 2.1, Spark SQL assumes the machine timezone for datetime manipulation, which is bad if users are not in the same timezone as the machines, or if different users have different timezones. We should introduce a session local timezone setting that is used for execution. An explicit non-goal is locale handling.

### Semantics

Setting the session local timezone means that the timezone-aware expressions listed below should use that timezone to evaluate values, and it should also be used to convert (cast) between string and timestamp or between timestamp and date:

- `CurrentDate`
- `CurrentBatchTimestamp`
- `Hour`
- `Minute`
- `Second`
- `DateFormatClass`
- `ToUnixTimestamp`
- `UnixTimestamp`
- `FromUnixTime`

The following are implicitly timezone-aware through a cast from timestamp to date:

- `DayOfYear`
- `Year`
- `Quarter`
- `Month`
- `DayOfMonth`
- `WeekOfYear`
- `LastDay`
- `NextDay`
- `TruncDate`

For example, given the timestamp `"2016-01-01 00:00:00"` in `GMT`, the values evaluated by some of the timezone-aware expressions are:

```scala
scala> val df = Seq(new java.sql.Timestamp(1451606400000L)).toDF("ts")
df: org.apache.spark.sql.DataFrame = [ts: timestamp]

scala> df.selectExpr("cast(ts as string)", "year(ts)", "month(ts)", "dayofmonth(ts)", "hour(ts)", "minute(ts)", "second(ts)").show(truncate = false)
+-------------------+----------------------+-----------------------+----------------------------+--------+----------+----------+
|ts                 |year(CAST(ts AS DATE))|month(CAST(ts AS DATE))|dayofmonth(CAST(ts AS DATE))|hour(ts)|minute(ts)|second(ts)|
+-------------------+----------------------+-----------------------+----------------------------+--------+----------+----------+
|2016-01-01 00:00:00|2016                  |1                      |1                           |0       |0         |0         |
+-------------------+----------------------+-----------------------+----------------------------+--------+----------+----------+
```

whereas after setting the session local timezone to `"PST"`, they are:

```scala
scala> spark.conf.set("spark.sql.session.timeZone", "PST")

scala> df.selectExpr("cast(ts as string)", "year(ts)", "month(ts)", "dayofmonth(ts)", "hour(ts)", "minute(ts)", "second(ts)").show(truncate = false)
+-------------------+----------------------+-----------------------+----------------------------+--------+----------+----------+
|ts                 |year(CAST(ts AS DATE))|month(CAST(ts AS DATE))|dayofmonth(CAST(ts AS DATE))|hour(ts)|minute(ts)|second(ts)|
+-------------------+----------------------+-----------------------+----------------------------+--------+----------+----------+
|2015-12-31 16:00:00|2015                  |12                     |31                          |16      |0         |0         |
+-------------------+----------------------+-----------------------+----------------------------+--------+----------+----------+
```

Note that setting the session local timezone affects only `DataFrame` operations; it does not apply to `Dataset` operations, `RDD` operations, or `ScalaUDF`s, where you need to handle timezones yourself.

### Design of the fix

I introduced an analyzer rule to pass the session local timezone to timezone-aware expressions, and modified `DateTimeUtils` to take a timezone argument.

## How was this patch tested?

Existing tests, plus new tests for the timezone-aware expressions.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #16308 from ueshin/issues/SPARK-18350.
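To make the semantics above concrete outside of Spark, here is a minimal plain-Scala sketch using only `java.time` (the object name `TimeZoneDemo` is made up for illustration). It renders the same instant under the two zones used in the tables above:

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object TimeZoneDemo extends App {
  val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  // 1451606400000 ms since the epoch == 2016-01-01 00:00:00 UTC,
  // the same instant used in the DataFrame examples above.
  val instant = Instant.ofEpochMilli(1451606400000L)

  // The same instant, rendered under two different session timezones.
  Seq("GMT", "America/Los_Angeles").foreach { tz =>
    println(s"$tz -> ${fmt.format(instant.atZone(ZoneId.of(tz)))}")
  }
  // GMT                 -> 2016-01-01 00:00:00
  // America/Los_Angeles -> 2015-12-31 16:00:00
}
```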
Diffstat (limited to 'sql/hive/compatibility')
-rw-r--r--  sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala  5
1 file changed, 5 insertions, 0 deletions
diff --git a/sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala b/sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
index d217e9b4fe..f78660f7c1 100644
--- a/sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
+++ b/sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
@@ -41,6 +41,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
private val originalInMemoryPartitionPruning = TestHive.conf.inMemoryPartitionPruning
private val originalConvertMetastoreOrc = TestHive.sessionState.convertMetastoreOrc
private val originalCrossJoinEnabled = TestHive.conf.crossJoinEnabled
+ private val originalSessionLocalTimeZone = TestHive.conf.sessionLocalTimeZone
def testCases: Seq[(String, File)] = {
hiveQueryDir.listFiles.map(f => f.getName.stripSuffix(".q") -> f)
@@ -62,6 +63,9 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, false)
// Ensures that cross joins are enabled so that we can test them
TestHive.setConf(SQLConf.CROSS_JOINS_ENABLED, true)
+ // Fix the session local timezone to America/Los_Angeles for the timezone-sensitive tests
+ // (timestamp_*)
+ TestHive.setConf(SQLConf.SESSION_LOCAL_TIMEZONE, "America/Los_Angeles")
RuleExecutor.resetTime()
}
@@ -74,6 +78,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
TestHive.setConf(SQLConf.IN_MEMORY_PARTITION_PRUNING, originalInMemoryPartitionPruning)
TestHive.setConf(HiveUtils.CONVERT_METASTORE_ORC, originalConvertMetastoreOrc)
TestHive.setConf(SQLConf.CROSS_JOINS_ENABLED, originalCrossJoinEnabled)
+ TestHive.setConf(SQLConf.SESSION_LOCAL_TIMEZONE, originalSessionLocalTimeZone)
// For debugging dump some statistics about how much time was spent in various optimizer rules
logWarning(RuleExecutor.dumpTimeSpent())
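Below is a minimal sketch of the save/set/restore pattern the hunks above follow, with made-up names (`Conf`, `ExampleSuite`): capture the original setting before the suite runs, pin it for the duration of the tests, and restore it afterwards so later suites still see the machine default. This is plain ScalaTest, not the actual `TestHive` machinery:

```scala
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

// Stand-in for a mutable session configuration such as TestHive.conf.
object Conf {
  var sessionLocalTimeZone: String = java.util.TimeZone.getDefault.getID
}

class ExampleSuite extends AnyFunSuite with BeforeAndAfterAll {
  // Save the original value when the suite is constructed.
  private val originalSessionLocalTimeZone = Conf.sessionLocalTimeZone

  override def beforeAll(): Unit = {
    super.beforeAll()
    // Pin the timezone so timezone-sensitive tests are deterministic.
    Conf.sessionLocalTimeZone = "America/Los_Angeles"
  }

  override def afterAll(): Unit = {
    // Restore the original value so other suites are unaffected.
    Conf.sessionLocalTimeZone = originalSessionLocalTimeZone
    super.afterAll()
  }

  test("timezone-sensitive behavior") {
    assert(Conf.sessionLocalTimeZone == "America/Los_Angeles")
  }
}
```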