[SPARK-6263] [MLLIB] Python MLlib API missing items: Utils

Implement missing API in pyspark.

MLUtils
* appendBias
* loadVectors

`kFold` is also missing however I am not sure `ClassTag` can be passed or restored through python.

Author: lewuathe <lewuathe@me.com>

Closes #5707 from Lewuathe/SPARK-6263 and squashes the following commits:

16863ea [lewuathe] Merge master
3fc27e7 [lewuathe] Merge branch 'master' into SPARK-6263
6084e9c [lewuathe] Resolv conflict
d2aa2a0 [lewuathe] Resolv conflict
9c329d8 [lewuathe] Fix efficiency
3a12a2d [lewuathe] Merge branch 'master' into SPARK-6263
1d4714b [lewuathe] Fix style
b29e2bc [lewuathe] Remove scipy dependencies
e32eb40 [lewuathe] Merge branch 'master' into SPARK-6263
25d3c9d [lewuathe] Remove unnecessary imports
7ec04db [lewuathe] Resolv conflict
1502d13 [lewuathe] Resolv conflict
d6bd416 [lewuathe] Check existence of scipy.sparse
5d555b1 [lewuathe] Construct scipy.sparse matrix
c345a44 [lewuathe] Merge branch 'master' into SPARK-6263
b8b5ef7 [lewuathe] Fix unnecessary sort method
d254be7 [lewuathe] Merge branch 'master' into SPARK-6263
62a9c7e [lewuathe] Fix appendBias return type
454c73d [lewuathe] Merge branch 'master' into SPARK-6263
a353354 [lewuathe] Remove unnecessary appendBias implementation
44295c2 [lewuathe] Merge branch 'master' into SPARK-6263
64f72ad [lewuathe] Merge branch 'master' into SPARK-6263
c728046 [lewuathe] Fix style
2980569 [lewuathe] [SPARK-6263] Python MLlib API missing items: Utils
author: lewuathe <lewuathe@me.com> 2015-07-01 11:14:07 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2015-07-01 11:14:07 -0700
commit: 184de91d15a4bfc5c014e8cf86211874bba4593f (patch)
tree: 3d9f484d0a40bc0ea928ffa23383a64b49c65f80 /mllib
parent: 31b4a3d7f2be9053a041e5ae67418562a93d80d8 (diff)
download: spark-184de91d15a4bfc5c014e8cf86211874bba4593f.tar.gz
spark-184de91d15a4bfc5c014e8cf86211874bba4593f.tar.bz2
spark-184de91d15a4bfc5c014e8cf86211874bba4593f.zip
1 files changed, 9 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
index a66a404d5c..458fab48fe 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
@@ -75,6 +75,15 @@ private[python] class PythonMLLibAPI extends Serializable {
       minPartitions: Int): JavaRDD[LabeledPoint] =
     MLUtils.loadLabeledPoints(jsc.sc, path, minPartitions)
 
+  /**
+   * Loads and serializes vectors saved with `RDD#saveAsTextFile`.
+   * @param jsc Java SparkContext
+   * @param path file or directory path in any Hadoop-supported file system URI
+   * @return serialized vectors in a RDD
+   */
+  def loadVectors(jsc: JavaSparkContext, path: String): RDD[Vector] =
+    MLUtils.loadVectors(jsc.sc, path)
+
   private def trainRegressionModel(
       learner: GeneralizedLinearAlgorithm[_ <: GeneralizedLinearModel],
       data: JavaRDD[LabeledPoint],
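
To illustrate what the two utilities named in the commit message do conceptually, here is a minimal, dependency-free Python sketch. It is not the pyspark implementation: `append_bias` and `parse_dense_vector` are hypothetical helper names, and the sketch assumes the dense text form `[1.0,2.0,3.0]` only, ignoring sparse vectors and RDDs entirely.

```python
# Hypothetical helpers for illustration only; not the pyspark.mllib.util code.


def append_bias(values):
    """Return a new list with a constant bias term of 1.0 appended."""
    return [float(v) for v in values] + [1.0]


def parse_dense_vector(line):
    """Parse the text form of a dense vector, e.g. "[1.0,2.0,3.0]".

    Assumption: only the bracketed dense form is handled here.
    """
    text = line.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("not a dense vector: %r" % line)
    body = text[1:-1].strip()
    return [float(x) for x in body.split(",")] if body else []


if __name__ == "__main__":
    # Stand-in for lines read back from a text file of saved vectors.
    lines = ["[1.0,2.0,3.0]", "[4.0,5.0,6.0]"]
    vectors = [parse_dense_vector(line) for line in lines]
    print([append_bias(v) for v in vectors])
```

In the actual change, parsing is delegated to `MLUtils.loadVectors` on the JVM side, as the hunk above shows; the sketch only mirrors the idea of reading vectors back from text and adding a bias term.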