author | Cheng Hao <hao.cheng@intel.com> | 2014-05-07 03:37:12 -0400 |
---|---|---|
committer | Reynold Xin <rxin@apache.org> | 2014-05-07 03:37:12 -0400 |
commit | 3eb53bd59e828275471d41730e6de601a887416d (patch) | |
tree | f728e59cb7eecf5e61e5bfb9d5e4672c6b6f147a /sql/hive/src/main | |
parent | 913a0a9c0a87e164723ebf9616b883b6329bac71 (diff) | |
download | spark-3eb53bd59e828275471d41730e6de601a887416d.tar.gz spark-3eb53bd59e828275471d41730e6de601a887416d.tar.bz2 spark-3eb53bd59e828275471d41730e6de601a887416d.zip |
[WIP][Spark-SQL] Optimize the Constant Folding for Expression
Currently, expressions do not handle "constant null" well during constant folding.
e.g. Sum(a, 0) actually always produces Literal(0, NumericType) at runtime.
For example:
```
explain select isnull(key+null) from src;
== Logical Plan ==
Project [HiveGenericUdf#isnull((key#30 + CAST(null, IntegerType))) AS c_0#28]
 MetastoreRelation default, src, None
== Optimized Logical Plan ==
Project [true AS c_0#28]
 MetastoreRelation default, src, None
== Physical Plan ==
Project [true AS c_0#28]
 HiveTableScan [], (MetastoreRelation default, src, None), None
```
I've created a new optimization rule called NullPropagation to handle this kind of constant folding.
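The idea behind null propagation can be illustrated with a small, self-contained sketch. It assumes a hypothetical toy expression tree (`Lit`, `Col`, `Add`, `IsNull` and `nullPropagate` are illustrative names, not Catalyst's classes) and covers only two cases: arithmetic with a NULL operand becomes NULL, and `IsNull` over a literal is decided at planning time. It is not the committed NullPropagation rule, only the core idea behind it.

```scala
// Hypothetical toy expression tree; names are illustrative, not Catalyst's classes.
sealed trait Expr
case class Lit(value: Any) extends Expr          // value == null models a SQL NULL literal
case class Col(name: String) extends Expr        // column reference, unknown until runtime
case class Add(left: Expr, right: Expr) extends Expr
case class IsNull(child: Expr) extends Expr

// Bottom-up null propagation: simplify the children first, then the node itself.
def nullPropagate(e: Expr): Expr = e match {
  case Add(l, r) =>
    (nullPropagate(l), nullPropagate(r)) match {
      // Adding anything to NULL yields NULL, whatever the other operand is.
      case (Lit(null), _) | (_, Lit(null)) => Lit(null)
      case (nl, nr)                        => Add(nl, nr)
    }
  case IsNull(c) =>
    nullPropagate(c) match {
      // The null-ness of a literal is known statically.
      case Lit(v) => Lit(v == null)
      case other  => IsNull(other)
    }
  case other => other
}

// isnull(key + null): the sum is NULL for every row, so the predicate is always true.
println(nullPropagate(IsNull(Add(Col("key"), Lit(null))))) // prints Lit(true)
```

This mirrors the plan above, where `isnull(key+null)` is reduced to `true` before any row is read. An expression such as `isnull(key)` is left untouched, because its result depends on the data.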
Author: Cheng Hao <hao.cheng@intel.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #482 from chenghao-intel/optimize_constant_folding and squashes the following commits:
2f14b50 [Cheng Hao] Fix code style issues
68b9fad [Cheng Hao] Remove the Literal pattern matching for NullPropagation
29c8166 [Cheng Hao] Update the code for feedback of code review
50444cc [Cheng Hao] Remove the unnecessary null checking
80f9f18 [Cheng Hao] Update the UnitTest for aggregation constant folding
27ea3d7 [Cheng Hao] Fix Constant Folding Bugs & Add More Unittests
b28e03a [Cheng Hao] Merge pull request #1 from marmbrus/pr/482
9ccefdb [Michael Armbrust] Add tests for optimized expression evaluation.
543ef9d [Cheng Hao] fix code style issues
9cf0396 [Cheng Hao] update code according to the code review comment
536c005 [Cheng Hao] Add Exceptional case for constant folding
3c045c7 [Cheng Hao] Optimize the Constant Folding by adding more rules
2645d4f [Cheng Hao] Constant Folding(null propagation)
Diffstat (limited to 'sql/hive/src/main')
-rw-r--r-- | sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala | 11 |
1 files changed, 11 insertions, 0 deletions
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala
index c7de4ab6d3..d50e2c65b7 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala
@@ -22,6 +22,7 @@ import scala.collection.mutable.ArrayBuffer
 import org.apache.hadoop.hive.common.`type`.HiveDecimal
 import org.apache.hadoop.hive.ql.exec.UDF
 import org.apache.hadoop.hive.ql.exec.{FunctionInfo, FunctionRegistry}
+import org.apache.hadoop.hive.ql.udf.{UDFType => HiveUDFType}
 import org.apache.hadoop.hive.ql.udf.generic._
 import org.apache.hadoop.hive.serde2.objectinspector._
 import org.apache.hadoop.hive.serde2.objectinspector.primitive._
@@ -237,6 +238,16 @@ private[hive] case class HiveGenericUdf(name: String, children: Seq[Expression])
   @transient
   protected lazy val returnInspector = function.initialize(argumentInspectors.toArray)
 
+  @transient
+  protected lazy val isUDFDeterministic = {
+    val udfType = function.getClass().getAnnotation(classOf[HiveUDFType])
+    (udfType != null && udfType.deterministic())
+  }
+
+  override def foldable = {
+    isUDFDeterministic && children.foldLeft(true)((prev, n) => prev && n.foldable)
+  }
+
   val dataType: DataType = inspectorToDataType(returnInspector)
 
   override def eval(input: Row): Any = {
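The patch makes a Hive generic UDF foldable only when its class carries a `UDFType` annotation whose `deterministic()` is true and all of its children are foldable. The sketch below shows how a constant-folding pass can use such a flag. It assumes a hypothetical toy model: a plain `deterministic` Boolean stands in for the reflective annotation lookup, and `Lit`, `Col`, `Udf`, `constantFold` and `plus` are illustrative names, not Spark's API. The pass works top-down, replacing a foldable subtree with a literal holding its evaluated value and otherwise recursing into the children.

```scala
// Hypothetical toy model; these names are illustrative, not Spark's classes.
sealed trait Expr {
  def children: Seq[Expr]
  def foldable: Boolean
  def eval(): Any
}

// A constant value.
case class Lit(value: Any) extends Expr {
  def children: Seq[Expr] = Nil
  def foldable: Boolean = true
  def eval(): Any = value
}

// A column reference: its value is only known per row at runtime.
case class Col(name: String) extends Expr {
  def children: Seq[Expr] = Nil
  def foldable: Boolean = false
  def eval(): Any =
    throw new UnsupportedOperationException(s"column $name has no static value")
}

// Stand-in for HiveGenericUdf; `deterministic` plays the role of the
// @UDFType(deterministic = ...) annotation that the patch reads via reflection.
case class Udf(name: String, deterministic: Boolean, children: Seq[Expr], f: Seq[Any] => Any)
    extends Expr {
  // Same condition as the patch; forall is equivalent to the foldLeft(true) form.
  def foldable: Boolean = deterministic && children.forall(_.foldable)
  def eval(): Any = f(children.map(_.eval()))
}

// Top-down folding: replace a foldable subtree by a literal of its value,
// otherwise keep the node and try to fold its children.
def constantFold(e: Expr): Expr = e match {
  case lit: Lit        => lit
  case _ if e.foldable => Lit(e.eval())
  case u: Udf          => u.copy(children = u.children.map(constantFold))
  case other           => other
}

// Example: plus(key, plus(1, 2)) can only fold its inner call.
val plus: Seq[Any] => Any = args => args.map(_.asInstanceOf[Int]).sum
val expr = Udf("plus", deterministic = true,
  Seq(Col("key"), Udf("plus", deterministic = true, Seq(Lit(1), Lit(2)), plus)), plus)
println(constantFold(expr))
```

The determinism check matters because folding evaluates the UDF once at planning time. For a non-deterministic function such as a random-number generator, that would give every row the same value, so such calls are left for runtime even when all of their arguments are constants. Using `forall` instead of `foldLeft(true)(...)` is a stylistic choice that also stops at the first non-foldable child.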