author    Davies Liu <davies@databricks.com>    2015-01-13 12:50:31 -0800
committer Xiangrui Meng <meng@databricks.com>  2015-01-13 12:50:39 -0800
commit    1b6596ebee1624dea0acbd23148ac00dfd74d1fb (patch)
tree      882db4b71c40f73aee402fdbbdc38bd5f1c18d90 /python
parent    78096837c85ca41ce4ffa1aca2663b6d0f14d20d (diff)
[SPARK-5223] [MLlib] [PySpark] fix MapConverter and ListConverter in MLlib
It will introduce problems if the objects in a dict/list/tuple are not supported by py4j, such as Vector. Also, pickling may have better performance for larger objects (fewer RPCs). In cases where the objects in a dict/list cannot be pickled (such as JavaObject), we should still use MapConverter/ListConverter.

This PR should be ported into branch-1.2.

Author: Davies Liu <davies@databricks.com>

Closes #4023 from davies/listconvert and squashes the following commits:

55d4ab2 [Davies Liu] fix MapConverter and ListConverter in MLlib

(cherry picked from commit 8ead999fd627b12837fb2f082a0e76e9d121d269)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
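For context, a minimal sketch of what _py2java in pyspark/mllib/common.py looks like with this change applied. This is a reconstruction for illustration, not the verbatim file; in particular the pickle fallback via sc._jvm.SerDe.loads and the _to_java_object_rdd helper are assumed from the surrounding module (the latter is visible in the diff context below).

    # Sketch (not verbatim) of pyspark/mllib/common.py after this patch.
    from py4j.java_gateway import JavaObject
    from py4j.java_collections import ListConverter

    from pyspark import RDD, SparkContext
    from pyspark.serializers import PickleSerializer


    def _py2java(sc, obj):
        """Convert a Python object into its Java counterpart."""
        if isinstance(obj, RDD):
            obj = _to_java_object_rdd(obj)   # defined elsewhere in common.py
        elif isinstance(obj, SparkContext):
            obj = obj._jsc
        elif isinstance(obj, list) and (not obj or isinstance(obj[0], JavaObject)):
            # JavaObjects cannot be pickled, so lists of them still go through py4j.
            obj = ListConverter().convert(obj, sc._gateway._gateway_client)
        elif isinstance(obj, JavaObject):
            pass                             # already a reference to the Java side
        else:
            # dicts, tuples, Vectors, ...: one pickled payload and one RPC,
            # instead of one py4j conversion per element.
            bytes = bytearray(PickleSerializer().dumps(obj))
            obj = sc._jvm.SerDe.loads(bytes)
        return obj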
Diffstat (limited to 'python')
-rw-r--r--  python/pyspark/mllib/common.py | 6
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/python/pyspark/mllib/common.py b/python/pyspark/mllib/common.py
index 33c49e2399..3c5ee66cd8 100644
--- a/python/pyspark/mllib/common.py
+++ b/python/pyspark/mllib/common.py
@@ -18,7 +18,7 @@
 import py4j.protocol
 from py4j.protocol import Py4JJavaError
 from py4j.java_gateway import JavaObject
-from py4j.java_collections import MapConverter, ListConverter, JavaArray, JavaList
+from py4j.java_collections import ListConverter, JavaArray, JavaList
 
 from pyspark import RDD, SparkContext
 from pyspark.serializers import PickleSerializer, AutoBatchedSerializer
@@ -70,9 +70,7 @@ def _py2java(sc, obj):
         obj = _to_java_object_rdd(obj)
     elif isinstance(obj, SparkContext):
         obj = obj._jsc
-    elif isinstance(obj, dict):
-        obj = MapConverter().convert(obj, sc._gateway._gateway_client)
-    elif isinstance(obj, (list, tuple)):
+    elif isinstance(obj, list) and (not obj or isinstance(obj[0], JavaObject)):
         obj = ListConverter().convert(obj, sc._gateway._gateway_client)
     elif isinstance(obj, JavaObject):
         pass
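To make the new list guard concrete, here is a standalone sketch of its semantics. The JavaObject class below is a hypothetical stand-in for py4j.java_gateway.JavaObject, so no Spark installation is needed to run it.

    # Demonstrates the patched condition in isolation.
    class JavaObject:
        """Hypothetical stand-in for py4j.java_gateway.JavaObject."""


    def use_list_converter(obj):
        # Mirrors the guard added by this patch.
        return isinstance(obj, list) and (not obj or isinstance(obj[0], JavaObject))


    assert use_list_converter([])              # empty list: nothing to inspect, convert via py4j
    assert use_list_converter([JavaObject()])  # JavaObjects cannot be pickled: convert via py4j
    assert not use_list_converter([1.0, 2.0])  # plain values: fall through to pickling
    assert not use_list_converter((1, 2))      # tuples are now pickled too, not converted

Dicts follow the same reasoning: with MapConverter removed, they always take the pickle path, which also reduces the number of py4j round trips for large collections.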