author | Wenchen Fan <wenchen@databricks.com> | 2016-06-13 22:02:23 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-06-13 22:02:23 -0700 |
commit | 688b6ef9dc0943d268fab7279ef50bfac1617f04 (patch) | |
tree | afe3fe43b05a6a305ef8afed6296328fec5de1a7 /sql/catalyst | |
parent | 1842cdd4ee9f30b0a5f579e26ff5194e81e3634c (diff) | |
[SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions
## What changes were proposed in this pull request?
In our encoder framework, we implicitly assume that serializer expressions use `BoundReference` to refer to the input object, and a lot of code depends on this contract (e.g. `ExpressionEncoder.tuple`). This PR adds documentation and assertions to `ExpressionEncoder` to make the contract explicit.
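To illustrate why code such as `ExpressionEncoder.tuple` depends on this contract, here is a minimal sketch on a hypothetical, heavily simplified expression tree (`Expr`, `BoundRef`, `GetField`, `TupleSketch.tupleSerializer` are illustrative names, not Spark classes). Because each element encoder's serializer reads its input object only through one shared bound reference, composing encoders into a tuple encoder can be done by rewriting that single reference into "field `_N` of the tuple object". This mirrors the idea, not Spark's actual code; the sketch also requires exactly one reference because it needs something to substitute.

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions (not Spark's API).
sealed trait Expr {
  def children: Seq[Expr]
  // Rebuild this node with new children.
  def withChildren(newChildren: Seq[Expr]): Expr

  // Bottom-up rewrite, in the spirit of Catalyst's transformUp.
  def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
    val rebuilt = withChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(rebuilt, (e: Expr) => e)
  }

  // Collect every node (this one and all descendants) matched by `pf`.
  def collect[B](pf: PartialFunction[Expr, B]): Seq[B] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
}

// The input object, referred to by position rather than by attribute name.
final case class BoundRef(ordinal: Int) extends Expr {
  def children: Seq[Expr] = Nil
  def withChildren(newChildren: Seq[Expr]): Expr = this
}

// Reads a field of an object-valued child (a stand-in for a method invocation).
final case class GetField(obj: Expr, name: String) extends Expr {
  def children: Seq[Expr] = Seq(obj)
  def withChildren(newChildren: Seq[Expr]): Expr = copy(obj = newChildren.head)
}

object TupleSketch {
  // Re-target each element serializer at field `_i` of a tuple input object.
  def tupleSerializer(serializers: Seq[Seq[Expr]]): Seq[Seq[Expr]] =
    serializers.zipWithIndex.map { case (ser, i) =>
      // Relies on the contract: all expressions share a single BoundRef.
      val inputs = ser.flatMap(_.collect { case b: BoundRef => b }).distinct
      require(inputs.size == 1, "serializer must use exactly one BoundRef")
      val tupleField = GetField(BoundRef(0), s"_${i + 1}")
      ser.map(_.transformUp { case b: BoundRef if b == inputs.head => tupleField })
    }
}
```

For example, a serializer for a `(name, age)` record, `Seq(GetField(BoundRef(0), "name"), GetField(BoundRef(0), "age"))`, becomes one that reads `name` and `age` from `_1` of the tuple. If the element serializer referenced an attribute of a child operator instead, there would be no single node to rewrite.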
## How was this patch tested?
existing tests
Author: Wenchen Fan <wenchen@databricks.com>
Closes #13648 from cloud-fan/comment.
Diffstat (limited to 'sql/catalyst')
-rw-r--r-- | sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala | 9 |
1 files changed, 9 insertions, 0 deletions
```diff
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
index 688082dcce..0023ce64aa 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
@@ -197,6 +197,15 @@ case class ExpressionEncoder[T](
 
   if (flat) require(serializer.size == 1)
 
+  // serializer expressions are used to encode an object to a row, while the object is usually an
+  // intermediate value produced inside an operator, not from the output of the child operator. This
+  // is quite different from normal expressions, and `AttributeReference` doesn't work here
+  // (intermediate value is not an attribute). We assume that all serializer expressions use a same
+  // `BoundReference` to refer to the object, and throw exception if they don't.
+  assert(serializer.forall(_.references.isEmpty), "serializer cannot reference to any attributes.")
+  assert(serializer.flatMap(_.collect { case b: BoundReference => b}).distinct.length <= 1,
+    "all serializer expressions must use the same BoundReference.")
+
   /**
    * Returns a new copy of this encoder, where the `deserializer` is resolved and bound to the
    * given schema.
```
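The two assertions in this diff can be exercised outside Spark with a minimal sketch. It assumes a hypothetical, simplified expression model (`Expr`, `BoundRef`, `AttrRef`, `GetField`, and `SerializerContract.violation` are illustrative names, not Catalyst classes) and drops the data type and nullability that Spark's `BoundReference` carries. As in the patch, zero bound references is accepted; only attribute references or more than one distinct bound reference are rejected.

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions (not Spark's API).
sealed trait Expr {
  def children: Seq[Expr]
  // Attributes read from a child operator's output (empty for leaves by default).
  def references: Set[AttrRef] = children.flatMap(_.references).toSet
  // Collect every node (this one and all descendants) matched by `pf`.
  def collect[B](pf: PartialFunction[Expr, B]): Seq[B] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
}

// The input object, referred to by position.
final case class BoundRef(ordinal: Int) extends Expr {
  def children: Seq[Expr] = Nil
}

// A named attribute from a child operator's output.
final case class AttrRef(name: String) extends Expr {
  def children: Seq[Expr] = Nil
  override def references: Set[AttrRef] = Set(this)
}

// Reads a field of an object-valued child.
final case class GetField(obj: Expr, name: String) extends Expr {
  def children: Seq[Expr] = Seq(obj)
}

object SerializerContract {
  // None if the serializer honours the contract, Some(reason) otherwise.
  def violation(serializer: Seq[Expr]): Option[String] =
    if (!serializer.forall(_.references.isEmpty))
      Some("serializer cannot reference any attributes")
    else if (serializer.flatMap(_.collect { case b: BoundRef => b }).distinct.length > 1)
      Some("all serializer expressions must use the same BoundReference")
    else None
}
```

Returning a reason instead of throwing keeps the sketch easy to test; the patch itself uses `assert`, so a violating encoder fails at construction time.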