author Wenchen Fan <wenchen@databricks.com> 2016-06-13 22:02:23 -0700
committer Reynold Xin <rxin@databricks.com> 2016-06-13 22:02:23 -0700
commit 688b6ef9dc0943d268fab7279ef50bfac1617f04
tree afe3fe43b05a6a305ef8afed6296328fec5de1a7
parent 1842cdd4ee9f30b0a5f579e26ff5194e81e3634c
[SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions
## What changes were proposed in this pull request?

In our encoder framework, we implicitly require serializer expressions to use `BoundReference` to refer to the input object, and a lot of code depends on this contract (e.g. `ExpressionEncoder.tuple`). This PR adds documentation and assertions in `ExpressionEncoder` to make the contract explicit.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #13648 from cloud-fan/comment.
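The contract can be illustrated with a small, self-contained Scala sketch that does not depend on Spark. Everything in it is hypothetical: `Expr`, `BoundRef`, `AttrRef`, `GetField`, `MakeStruct`, `ToyEncoder` and `SerializerContract` are toy stand-ins for Catalyst's `BoundReference`, `AttributeReference`, field access and struct creation, not the real classes. `check` mirrors the two assertions added by this patch, and `tuple` only approximates the idea behind `ExpressionEncoder.tuple`: because each component serializer reads its input solely through `BoundRef(0)`, swapping that one node for "field `_i` of the tuple" retargets the whole serializer without knowing anything else about the component.

```scala
// Toy expression tree: hypothetical stand-ins, NOT Spark's Catalyst classes.
sealed trait Expr {
  def children: Seq[Expr]

  // Pre-order collection of every node matched by `pf`.
  def collect[B](pf: PartialFunction[Expr, B]): Seq[B] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))

  // Rewrite children first, then apply `rule` to this node if it is defined there.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val withNewChildren: Expr = this match {
      case GetField(child, name) => GetField(child.transformUp(rule), name)
      case MakeStruct(fields)    => MakeStruct(fields.map(_.transformUp(rule)))
      case leaf                  => leaf
    }
    rule.applyOrElse(withNewChildren, (e: Expr) => e)
  }
}

// The input object, addressed by position (plays the role of BoundReference).
final case class BoundRef(ordinal: Int) extends Expr { def children: Seq[Expr] = Nil }
// A named column of a child operator's output (plays the role of AttributeReference).
final case class AttrRef(name: String) extends Expr { def children: Seq[Expr] = Nil }
// Reads field `name` of the object produced by `child`.
final case class GetField(child: Expr, name: String) extends Expr {
  def children: Seq[Expr] = Seq(child)
}
// Packs several values into one struct column.
final case class MakeStruct(fields: Seq[Expr]) extends Expr { def children: Seq[Expr] = fields }

// A hypothetical encoder: only the serializer side and the `flat` flag.
final case class ToyEncoder(serializer: Seq[Expr], flat: Boolean)

object SerializerContract {
  // Same two checks as the assertions this patch adds to ExpressionEncoder.
  def check(serializer: Seq[Expr]): Unit = {
    assert(serializer.forall(_.collect { case a: AttrRef => a }.isEmpty),
      "serializer cannot reference any attributes.")
    assert(serializer.flatMap(_.collect { case b: BoundRef => b }).distinct.length <= 1,
      "all serializer expressions must use the same BoundReference.")
  }

  // Tuple-style composition: each component reads its input only via BoundRef(0),
  // so replacing that node with "field _i of the tuple" retargets the whole serializer.
  def tuple(encoders: Seq[ToyEncoder]): Seq[Expr] =
    encoders.zipWithIndex.map { case (enc, i) =>
      val combined = if (enc.flat) enc.serializer.head else MakeStruct(enc.serializer)
      combined.transformUp { case BoundRef(0) => GetField(BoundRef(0), s"_${i + 1}") }
    }
}

// Usage: a flat String-like encoder and a two-field Point-like encoder.
val strEnc   = ToyEncoder(Seq(BoundRef(0)), flat = true)
val pointEnc = ToyEncoder(Seq(GetField(BoundRef(0), "x"), GetField(BoundRef(0), "y")), flat = false)
Seq(strEnc, pointEnc).foreach(e => SerializerContract.check(e.serializer))

val tupleSer = SerializerContract.tuple(Seq(strEnc, pointEnc))
SerializerContract.check(tupleSer) // the combined serializer still obeys the contract
tupleSer.foreach(println)
```

If a component serializer mixed two different `BoundRef`s or read an `AttrRef`, a single substitution like the one in `tuple` could no longer redirect every read of the input object, which is the situation the two assertions are meant to rule out.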
-rw-r--r-- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala 9
1 files changed, 9 insertions, 0 deletions
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
index 688082dcce..0023ce64aa 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
@@ -197,6 +197,15 @@ case class ExpressionEncoder[T](
if (flat) require(serializer.size == 1)
+ // Serializer expressions are used to encode an object to a row, and the object is usually an
+ // intermediate value produced inside an operator, not the output of the child operator. This
+ // is quite different from normal expressions, and `AttributeReference` doesn't work here
+ // (the intermediate value is not an attribute). We assume that all serializer expressions use
+ // the same `BoundReference` to refer to the object, and throw an exception if they don't.
+ assert(serializer.forall(_.references.isEmpty), "serializer cannot reference any attributes.")
+ assert(serializer.flatMap(_.collect { case b: BoundReference => b}).distinct.length <= 1,
+ "all serializer expressions must use the same BoundReference.")
+
/**
* Returns a new copy of this encoder, where the `deserializer` is resolved and bound to the
* given schema.