From 688b6ef9dc0943d268fab7279ef50bfac1617f04 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Mon, 13 Jun 2016 22:02:23 -0700 Subject: [SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions ## What changes were proposed in this pull request? In our encoder framework, we imply that serializer expressions should use `BoundReference` to refer to the input object, and a lot of codes depend on this contract(e.g. ExpressionEncoder.tuple). This PR adds some document and assert in `ExpressionEncoder` to make it clearer. ## How was this patch tested? existing tests Author: Wenchen Fan Closes #13648 from cloud-fan/comment. --- .../apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala index 688082dcce..0023ce64aa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala @@ -197,6 +197,15 @@ case class ExpressionEncoder[T]( if (flat) require(serializer.size == 1) + // serializer expressions are used to encode an object to a row, while the object is usually an + // intermediate value produced inside an operator, not from the output of the child operator. This + // is quite different from normal expressions, and `AttributeReference` doesn't work here + // (intermediate value is not an attribute). We assume that all serializer expressions use a same + // `BoundReference` to refer to the object, and throw exception if they don't. + assert(serializer.forall(_.references.isEmpty), "serializer cannot reference to any attributes.") + assert(serializer.flatMap(_.collect { case b: BoundReference => b}).distinct.length <= 1, + "all serializer expressions must use the same BoundReference.") + /** * Returns a new copy of this encoder, where the `deserializer` is resolved and bound to the * given schema. -- cgit v1.2.3