author | Xusen Yin <yinxusen@gmail.com> | 2016-01-25 22:41:52 -0800 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2016-01-25 22:41:52 -0800 |
commit | ae47ba718a280fc12720a71b981c38dbe647f35b (patch) | |
tree | dce29f474ab43e90cb7a46e509bab4c77958fee7 /mllib/src | |
parent | b66afdeb5253913d916dcf159aaed4ffdc15fd4b (diff) | |
[SPARK-12834] Change ser/de of JavaArray and JavaList
https://issues.apache.org/jira/browse/SPARK-12834
We use `SerDe.dumps()` to serialize `JavaArray` and `JavaList` in `PythonMLLibAPI`, then deserialize them with `PickleSerializer` on the Python side. This round trip is unnecessary and inefficient. Instead, we can convert them directly with type conversion, e.g. `list(JavaArray)` or `list(JavaList)`. In addition, there is an issue with serializing and deserializing a Scala `Array`, as described in https://issues.apache.org/jira/browse/SPARK-12780
Author: Xusen Yin <yinxusen@gmail.com>
Closes #10772 from yinxusen/SPARK-12834.
Diffstat (limited to 'mllib/src')
-rw-r--r-- | mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala | 6 |
1 files changed, 5 insertions, 1 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
index 05f9a76d32..088ec6a0c0 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
@@ -1490,7 +1490,11 @@ private[spark] object SerDe extends Serializable {
   initialize()
 
   def dumps(obj: AnyRef): Array[Byte] = {
-    new Pickler().dumps(obj)
+    obj match {
+      // Pickler in Python side cannot deserialize Scala Array normally. See SPARK-12834.
+      case array: Array[_] => new Pickler().dumps(array.toSeq.asJava)
+      case _ => new Pickler().dumps(obj)
+    }
   }
 
   def loads(bytes: Array[Byte]): AnyRef = {
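
The dispatch that the patched `dumps` performs can be sketched in isolation, without the Pickler dependency. This is only an illustration: `DumpsSketch` and `toPicklable` are hypothetical names that do not appear in the commit, and the sketch returns the object that would be handed to `new Pickler().dumps(...)` rather than pickling it. It assumes `scala.collection.JavaConverters`, which provides `asJava` in the Scala versions Spark used at the time.

```scala
import scala.collection.JavaConverters._

// Hypothetical stand-in for the branch added to SerDe.dumps.
object DumpsSketch {
  // Returns the value that would be passed to the Pickler:
  // a Scala Array is first converted to a java.util.List,
  // any other object is passed through unchanged.
  def toPicklable(obj: AnyRef): AnyRef = obj match {
    case array: Array[_] => array.toSeq.asJava
    case other => other
  }
}
```

For example, `DumpsSketch.toPicklable(Array(1, 2, 3))` yields a `java.util.List` with three elements, while a `String` argument comes back as the same object.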