author: Michal Senkyr <mike.senkyr@gmail.com>, 2017-01-06 15:05:20 +0800
committer: Wenchen Fan <wenchen@databricks.com>, 2017-01-06 15:05:20 +0800
commit: 903bb8e8a2b84b9ea82acbb8ae9d58754862be3a
tree: 1df577fa49e4fd3400920234cc79865f40fbebdc /docs/running-on-mesos.md
parent: bcc510b021391035abe6d07c5b82bb0f0be31167
[SPARK-16792][SQL] Dataset containing a Case Class with a List type causes a CompileException (converting sequence to list)
## What changes were proposed in this pull request?
Added a `to` call at the end of the code generated by `ScalaReflection.deserializerFor` whenever the requested type is not a supertype of `WrappedArray[_]`. The call uses `CanBuildFrom[_, _, _]` to convert the result into an arbitrary subtype of `Seq[_]`.
Care was taken to preserve the original deserialization path where possible, so the conversion overhead is avoided in cases where it is not needed.
`ScalaReflection.serializerFor` could already serialize any `Seq[_]`, so it was not altered.
`SQLImplicits` had to be altered and new implicit encoders added to permit serialization of other sequence types.
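The conversion step can be sketched in plain Scala without Spark. The original patch targets the Scala 2.11/2.12 collections, where `to` is backed by an implicit `CanBuildFrom`; the sketch below uses the equivalent factory-based `to` so it also compiles on Scala 2.13+. The values are illustrative only:

```scala
import scala.collection.immutable.Queue

// Catalyst's deserializer hands back a generic Seq for array columns; the
// fix appends a `to` call so the Seq subtype the caller requested (List,
// Queue, ...) is actually built instead of leaking a WrappedArray.
val fromCatalyst: Seq[String] = Seq("D", "S", "H")

// `to` builds the target collection through its companion's factory
// (CanBuildFrom in the 2.11/2.12 collections the patch was written against)
val asList: List[String] = fromCatalyst.to(List)
val asQueue: Queue[String] = fromCatalyst.to(Queue)
```

The key point is that the element traversal happens once either way; only the builder used to assemble the result changes.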
Also fixes [SPARK-16815]: `Dataset[List[T]]` leads to an `ArrayStoreException`.
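The SPARK-16815 symptom can be reproduced in miniature without Spark: `collect()` allocates a `List[T][]` on the JVM, and storing anything that is not a `List` into it throws `ArrayStoreException`. A hypothetical plain-Scala sketch of that failure mode (not Spark's actual code path):

```scala
// JVM arrays are runtime-checked: an Array[List[Int]] is a List[] on the
// JVM, so storing a non-List element is rejected at runtime. This is what
// happened when the deserializer produced a WrappedArray instead of a List.
val slots: Array[List[Int]] = new Array[List[Int]](1)
val erased: Array[AnyRef] = slots.asInstanceOf[Array[AnyRef]]

val storeRejected: Boolean =
  try {
    erased(0) = Vector(1) // a Vector is not a List: ArrayStoreException
    false
  } catch {
    case _: ArrayStoreException => true
  }
```

With the conversion in place, the deserializer hands back a genuine `List`, so the array store succeeds.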
## How was this patch tested?
```bash
./build/mvn -DskipTests clean package && ./dev/run-tests
```
The patch was also tested by manually executing the following sets of commands in the Spark shell:
```scala
case class TestCC(key: Int, letters: List[String])
val ds1 = sc.makeRDD(Seq(
  List("D"),
  List("S", "H"),
  List("F", "H"),
  List("D", "L", "L")
)).map(x => (x.length, x)).toDF("key", "letters").as[TestCC]
val test1 = ds1.map(_.key)
test1.show
```
```scala
case class X(l: List[String])
spark.createDataset(Seq(List("A"))).map(X).show
```
```scala
spark.sqlContext.createDataset(sc.parallelize(List(1) :: Nil)).collect
```
After adding arbitrary sequence support, the following commands were also tested:
```scala
case class QueueClass(q: scala.collection.immutable.Queue[Int])
spark.createDataset(Seq(List(1,2,3))).map(x => QueueClass(scala.collection.immutable.Queue(x: _*))).map(_.q.dequeue).collect
```
Author: Michal Senkyr <mike.senkyr@gmail.com>
Closes #16240 from michalsenkyr/sql-caseclass-list-fix.