diff options
author | Davies Liu <davies@databricks.com> | 2014-10-30 22:25:18 -0700
---|---|---
committer | Xiangrui Meng <meng@databricks.com> | 2014-10-30 22:25:18 -0700
commit | 872fc669b497fb255db3212568f2a14c2ba0d5db (patch) |
tree | 6dcaa7e0b251fa5f233171e2878a4dc428db2348 /python/pyspark/mllib/util.py |
parent | 0734d09320fe37edd3a02718511cda0bda852478 (diff) |
download | spark-872fc669b497fb255db3212568f2a14c2ba0d5db.tar.gz spark-872fc669b497fb255db3212568f2a14c2ba0d5db.tar.bz2 spark-872fc669b497fb255db3212568f2a14c2ba0d5db.zip |
[SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API
Create several helper functions that call the MLlib Java API, converting the arguments to Java types and the return values back to Python objects automatically. This greatly simplifies serialization in the MLlib Python API.
After this change, the MLlib Python API no longer needs to deal with serialization details, making it easier to add new APIs.
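The pattern the commit introduces can be sketched as a generic dispatcher: convert each Python argument, invoke the named method on the API object, and convert the result back. The real helper is `callMLlibFunc` in `pyspark.mllib.common`, which does this against the JVM gateway with Pickle-based serialization; the names `call_mllib_func`, `_py2java`, `_java2py`, and `FakeAPI` below are illustrative stand-ins, not PySpark code.

```python
def _py2java(obj):
    # Placeholder conversion: the real helper pickles Python objects
    # into their Java counterparts via the Py4J gateway.
    return obj

def _java2py(obj):
    # Placeholder reverse conversion back to Python objects.
    return obj

def call_mllib_func(api, name, *args):
    """Convert args, call api.<name>(*args), and convert the result back."""
    jargs = [_py2java(a) for a in args]
    return _java2py(getattr(api, name)(*jargs))

# Stand-in for the JVM-side PythonMLLibAPI object:
class FakeAPI:
    def loadLabeledPoints(self, sc, path, minPartitions):
        return (sc, path, minPartitions)

# One-line call sites, mirroring
# callMLlibFunc("loadLabeledPoints", sc, path, minPartitions):
result = call_mllib_func(FakeAPI(), "loadLabeledPoints", "sc", "/data", 2)
```

Because every Python-facing method funnels through one helper, adding a new API is a single dispatch call rather than hand-written RDD/serializer plumbing at each call site, which is exactly the simplification visible in the diff below.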
cc mengxr
Author: Davies Liu <davies@databricks.com>
Closes #2995 from davies/cleanup and squashes the following commits:
8fa6ec6 [Davies Liu] address comments
16b85a0 [Davies Liu] Merge branch 'master' of github.com:apache/spark into cleanup
43743e5 [Davies Liu] bugfix
731331f [Davies Liu] simplify serialization in MLlib Python API
Diffstat (limited to 'python/pyspark/mllib/util.py')
-rw-r--r-- | python/pyspark/mllib/util.py | 7 |
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/python/pyspark/mllib/util.py b/python/pyspark/mllib/util.py
index 84b39a4861..96aef8f510 100644
--- a/python/pyspark/mllib/util.py
+++ b/python/pyspark/mllib/util.py
@@ -18,8 +18,7 @@
 import numpy as np
 import warnings

-from pyspark.rdd import RDD
-from pyspark.serializers import AutoBatchedSerializer, PickleSerializer
+from pyspark.mllib.common import callMLlibFunc
 from pyspark.mllib.linalg import Vectors, SparseVector, _convert_to_vector
 from pyspark.mllib.regression import LabeledPoint

@@ -173,9 +172,7 @@ class MLUtils(object):
         (0.0,[1.01,2.02,3.03])
         """
         minPartitions = minPartitions or min(sc.defaultParallelism, 2)
-        jrdd = sc._jvm.PythonMLLibAPI().loadLabeledPoints(sc._jsc, path, minPartitions)
-        jpyrdd = sc._jvm.SerDe.javaToPython(jrdd)
-        return RDD(jpyrdd, sc, AutoBatchedSerializer(PickleSerializer()))
+        return callMLlibFunc("loadLabeledPoints", sc, path, minPartitions)


 def _test():