author Davies Liu <davies@databricks.com> 2016-03-16 21:46:04 -0700
committer Davies Liu <davies.liu@gmail.com> 2016-03-16 21:46:04 -0700
commit c100d31ddc6db9c03b7a65a20a7dd56dcdc18baf
tree 056ca296a4922ae696ef4b6126329ef77963944e /sql/catalyst
parent 917f4000b418ee0997551d4dfabb369c4e9243a9
[SPARK-13873] [SQL] Avoid copy of UnsafeRow when there is no join in whole stage codegen
## What changes were proposed in this pull request?

We need to copy the UnsafeRow because a Join could produce multiple rows from a single input row. We can avoid that copy if there is no join (or the join will not produce multiple rows) inside WholeStageCodegen.

Updated the benchmark for `collect`; we see a 20-30% speedup.

## How was this patch tested?

Existing unit tests.

Author: Davies Liu <davies@databricks.com>

Closes #11740 from davies/avoid_copy2.
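
The reasoning above can be illustrated with a small, self-contained sketch. It is not Spark code: `RowBuffer`, `CopyResultSketch`, `run`, and `fanOut` are hypothetical names, and the queue-then-drain loop is only an assumed simplification of how a pipeline hands rows downstream. It shows why one reused buffer is safe when each input yields a single row, and why a copy is needed once an input can yield several.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a reused row buffer; this is not Spark's UnsafeRow.
final class RowBuffer(var value: Int) {
  def copy(): RowBuffer = new RowBuffer(value)
}

object CopyResultSketch {
  // Toy pipeline (an assumption for illustration): each input expands to
  // `fanOut(input)` output values. Every output is written into one shared
  // buffer and queued; the queue is drained only after the whole input
  // has been processed.
  def run(inputs: Seq[Int], copyResult: Boolean)(fanOut: Int => Seq[Int]): Seq[Int] = {
    val shared = new RowBuffer(0)
    val consumed = ArrayBuffer.empty[Int]
    inputs.foreach { in =>
      val pending = ArrayBuffer.empty[RowBuffer]
      fanOut(in).foreach { out =>
        shared.value = out // overwrites whatever the buffer held before
        pending += (if (copyResult) shared.copy() else shared)
      }
      // The downstream consumer reads the queued rows only now.
      pending.foreach(row => consumed += row.value)
    }
    consumed.toSeq
  }
}
```

With a one-to-one operator such as `v => Seq(v * 10)`, `run(Seq(1, 2, 3), copyResult = false)` yields `Seq(10, 20, 30)`, so skipping the copy is safe. With a join-like operator such as `v => Seq(v, v + 100)`, the same call without copying returns the last value written for each input twice, and only `copyResult = true` preserves every row.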
Diffstat (limited to 'sql/catalyst')
-rw-r--r-- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala 10
1 files changed, 10 insertions, 0 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
index 3dbe634898..dd899d0140 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
@@ -78,6 +78,16 @@ class CodegenContext {
var currentVars: Seq[ExprCode] = null
/**
+ * Whether should we copy the result rows or not.
+ *
+ * If any operator inside WholeStageCodegen generate multiple rows from a single row (for
+ * example, Join), this should be true.
+ *
+ * If an operator starts a new pipeline, this should be reset to false before calling `consume()`.
+ */
+ var copyResult: Boolean = false
+
+ /**
* Holding expressions' mutable states like `MonotonicallyIncreasingID.count` as a
* 3-tuple: java type, variable name, code to init it.
* As an example, ("int", "count", "count = 0;") will produce code: