author | Wenchen Fan <wenchen@databricks.com> | 2016-06-03 00:43:02 -0700 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2016-06-03 00:43:02 -0700 |
commit | 190ff274fd71662023a804cf98400c71f9f7da4f (patch) | |
tree | 9b3f79aebf252d3c27f53d9593000c5fd58e1509 /mllib | |
parent | b9fcfb3bd14592ac9f1a8e5c2bb31412b9603b60 (diff) | |
[SPARK-15494][SQL] encoder code cleanup
## What changes were proposed in this pull request?
Our encoder framework has evolved a lot. This PR cleans up the code to make it more readable and to emphasise the concept that an encoder should be used as a container of serde expressions.
1. Move validation logic to the analyzer instead of the encoder.
2. Keep only a `resolveAndBind` method in the encoder instead of separate `resolve` and `bind` methods, as we don't have the encoder life cycle concept anymore.
3. `Dataset` doesn't need to keep a resolved encoder, as there is no such concept anymore. A bound encoder is still needed to do serialization outside of the query framework.
4. Using `BoundReference` to represent an unresolved field in a deserializer expression is kind of weird, so this PR adds a `GetColumnByOrdinal` expression for this purpose. (Serializer expressions still use `BoundReference`; we can replace it with `GetColumnByOrdinal` in follow-ups.)
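As a rough illustration of points 2 and 3, the sketch below round-trips an object through a bound encoder outside of any query. It assumes the Spark 2.0-era `ExpressionEncoder` API, in which a bound encoder exposes `toRow` and `fromRow`; other Spark releases may expose different methods. `Point` and `EncoderRoundTrip` are hypothetical names introduced only for this example and are not part of the patch.

```scala
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Hypothetical case class, used only for illustration.
// It is declared at the top level so the encoder can be derived without an outer scope.
case class Point(x: Int, y: Int)

// Hypothetical helper object; not part of Spark or of this patch.
object EncoderRoundTrip {

  def roundTrip(p: Point): Point = {
    // The unresolved encoder is just a container of serializer/deserializer expressions.
    val encoder = ExpressionEncoder[Point]()

    // Single step after this patch: resolve the deserializer against the
    // encoder's own schema and bind it to ordinals.
    val bound = encoder.resolveAndBind()

    // Assumption: toRow/fromRow are the Spark 2.0-era serialization entry points.
    val row = bound.toRow(p)
    bound.fromRow(row)
  }

  def main(args: Array[String]): Unit = {
    val original = Point(1, 2)
    val restored = roundTrip(original)
    // The round trip should preserve the value.
    assert(restored == original)
  }
}
```

The row returned by `toRow` is consumed immediately here; code that needs to hold on to it would typically copy it first, since serializers of this kind tend to reuse their output buffer.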
## How was this patch tested?
Existing tests.
Author: Wenchen Fan <wenchen@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes #13269 from cloud-fan/clean-encoder.
Diffstat (limited to 'mllib')
-rw-r--r-- | mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala b/mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala
index be7110ad6b..8b439e6b7a 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala
@@ -29,7 +29,7 @@ object UDTSerializationBenchmark {
     val iters = 1e2.toInt
     val numRows = 1e3.toInt
 
-    val encoder = ExpressionEncoder[Vector].defaultBinding
+    val encoder = ExpressionEncoder[Vector].resolveAndBind()
 
     val vectors = (1 to numRows).map { i =>
       Vectors.dense(Array.fill(1e5.toInt)(1.0 * i))