author | Carson Wang <carson.wang@intel.com> | 2016-02-14 16:00:20 -0800 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-02-14 16:00:20 -0800 |
commit | 7cb4d74c98c2f1765b48a549f62e47b53ed29b38 (patch) | |
tree | 919f67c5b5a3053551173e2573ef3661c2160b8e /sql/catalyst | |
parent | 22e9723d6208f2cd2dfa26487ea1c041cb9d7dcd (diff) | |
[SPARK-13185][SQL] Reuse Calendar object in DateTimeUtils.stringToDate method to improve performance
The Java `Calendar` object is expensive to create. I have a subquery like this: `SELECT a, b, c FROM table UV WHERE (datediff(UV.visitDate, '1997-01-01')>=0 AND datediff(UV.visitDate, '2015-01-01')<=0)`
The table stores `visitDate` as a String and has 3 billion records, so `DateTimeUtils.stringToDate` is called for every row, and each call creates a new `Calendar` object. With the `Calendar` object reused per thread, this stage ran about 20 seconds faster.
Author: Carson Wang <carson.wang@intel.com>
Closes #11090 from carsonwang/SPARK-13185.
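The reuse pattern described above can be sketched on its own, outside Spark. This is a minimal illustration, not the code from the patch: the object name `CalendarReuseSketch` and the helper `daysSinceEpoch` are hypothetical, and `TimeZoneGMT` and `MILLIS_PER_DAY` are redefined locally on the assumption that they match the constants of the same name in `DateTimeUtils`.

```scala
import java.util.{Calendar, TimeZone}

object CalendarReuseSketch {
  // Assumed stand-ins for the constants referenced in the patch.
  private val TimeZoneGMT: TimeZone = TimeZone.getTimeZone("GMT")
  private val MILLIS_PER_DAY: Long = 24L * 60 * 60 * 1000

  // One Calendar per thread: Calendar is not thread-safe, so the instance
  // is never shared across threads, only reused within a thread.
  private val threadLocalGmtCalendar = new ThreadLocal[Calendar] {
    override protected def initialValue(): Calendar =
      Calendar.getInstance(TimeZoneGMT)
  }

  /** Hypothetical helper: days since 1970-01-01 (GMT); `month` is 1-based. */
  def daysSinceEpoch(year: Int, month: Int, day: Int): Int = {
    val c = threadLocalGmtCalendar.get()
    // Reset every field so state left over from the previous call
    // on this thread cannot leak into the result.
    c.clear()
    c.set(year, month - 1, day, 0, 0, 0)
    (c.getTimeInMillis / MILLIS_PER_DAY).toInt
  }
}
```

Repeated calls on one thread reuse a single `Calendar`, and the `clear()` before each `set` is what makes that reuse safe: without it, fields set by an earlier call could carry over into a later one.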
Diffstat (limited to 'sql/catalyst')
-rw-r--r-- | sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index a159bc6a61..f184d72285 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@@ -59,6 +59,13 @@ object DateTimeUtils {
 
   @transient lazy val defaultTimeZone = TimeZone.getDefault
 
+  // Reuse the Calendar object in each thread as it is expensive to create in each method call.
+  private val threadLocalGmtCalendar = new ThreadLocal[Calendar] {
+    override protected def initialValue: Calendar = {
+      Calendar.getInstance(TimeZoneGMT)
+    }
+  }
+
   // Java TimeZone has no mention of thread safety. Use thread local instance to be safe.
   private val threadLocalLocalTimeZone = new ThreadLocal[TimeZone] {
     override protected def initialValue: TimeZone = {
@@ -408,7 +415,8 @@ object DateTimeUtils {
         segments(2) < 1 || segments(2) > 31) {
       return None
     }
-    val c = Calendar.getInstance(TimeZoneGMT)
+    val c = threadLocalGmtCalendar.get()
+    c.clear()
     c.set(segments(0), segments(1) - 1, segments(2), 0, 0, 0)
     c.set(Calendar.MILLISECOND, 0)
     Some((c.getTimeInMillis / MILLIS_PER_DAY).toInt)