[SPARK-6259] [MLLIB] Python API for LDA

I implemented the Python API for LDA, but I did not implement a method for `LDAModel.describeTopics()`, because it is a little hard to implement right now: the return value of `describeTopics` in Scala consists of Tuple classes. Adding documentation and example code would also fit better in another issue.

TODO: `LDAModel.describeTopics()` must also be implemented in Python, preferably under a separate issue.

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #6791 from yu-iskw/SPARK-6259 and squashes the following commits:

6855f59 [Yu ISHIKAWA] LDA inherits object
28bd165 [Yu ISHIKAWA] Change the place of testing code
d7a332a [Yu ISHIKAWA] Remove the doc comment about the optimizer's default value
083e226 [Yu ISHIKAWA] Add the comment about the supported values and the default value of `optimizer`
9f8bed8 [Yu ISHIKAWA] Simplify casting
faa9764 [Yu ISHIKAWA] Add some comments for the LDA paramters
98f645a [Yu ISHIKAWA] Remove the interface for `describeTopics`. Because it is not implemented.
57ac03d [Yu ISHIKAWA] Remove the unnecessary import in Python unit testing
73412c3 [Yu ISHIKAWA] Fix the typo
2278829 [Yu ISHIKAWA] Fix the indentation
39514ec [Yu ISHIKAWA] Modify how to cast the input data
8117e18 [Yu ISHIKAWA] Fix the validation problems by `lint-scala`
77fd1b7 [Yu ISHIKAWA] Not use LabeledPoint
68f0653 [Yu ISHIKAWA] Support some parameters for `ALS.train()` in Python
25ef2ac [Yu ISHIKAWA] Resolve conflicts with rebasing
author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> 2015-07-14 23:27:42 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2015-07-14 23:27:42 -0700
commit: 4692769655e09d129a62a89a8ffb5d635675aa4d (patch)
tree: b89ab2920c77ba44ad9897cbe6b524195b899820 /mllib
parent: c6b1a9e74e34267dc198e57a184c41498ca9d6a3 (diff)
download: spark-4692769655e09d129a62a89a8ffb5d635675aa4d.tar.gz
spark-4692769655e09d129a62a89a8ffb5d635675aa4d.tar.bz2
spark-4692769655e09d129a62a89a8ffb5d635675aa4d.zip
1 files changed, 33 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
index e628059c4a..c58a64001d 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
@@ -503,6 +503,39 @@ private[python] class PythonMLLibAPI extends Serializable {
   }
 
   /**
+   * Java stub for Python mllib LDA.run()
+   */
+  def trainLDAModel(
+      data: JavaRDD[java.util.List[Any]],
+      k: Int,
+      maxIterations: Int,
+      docConcentration: Double,
+      topicConcentration: Double,
+      seed: java.lang.Long,
+      checkpointInterval: Int,
+      optimizer: String): LDAModel = {
+    val algo = new LDA()
+      .setK(k)
+      .setMaxIterations(maxIterations)
+      .setDocConcentration(docConcentration)
+      .setTopicConcentration(topicConcentration)
+      .setCheckpointInterval(checkpointInterval)
+      .setOptimizer(optimizer)
+
+    if (seed != null) algo.setSeed(seed)
+
+    val documents = data.rdd.map(_.asScala.toArray).map { r =>
+      r(0) match {
+        case i: java.lang.Integer => (i.toLong, r(1).asInstanceOf[Vector])
+        case i: java.lang.Long => (i.toLong, r(1).asInstanceOf[Vector])
+        case _ => throw new IllegalArgumentException("input values contains invalid type value.")
+      }
+    }
+    algo.run(documents)
+  }
+
+
+  /**
    * Java stub for Python mllib FPGrowth.train().  This stub returns a handle
    * to the Java object instead of the content of the Java object.  Extra care
    * needs to be taken in the Python code to ensure it gets freed on exit; see
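
The id handling in `trainLDAModel` above can be tried in isolation. The following is a minimal sketch, not part of this commit: the object `DocIdNormalization`, the `Features` alias (a plain `Seq[Double]` standing in for an MLlib `Vector`), and the `row` helper are all hypothetical names introduced so the example runs without a Spark dependency. It only illustrates the `Integer`/`Long` match that turns an `[id, features]` row into a `(Long, features)` pair.

```scala
object DocIdNormalization {
  // Assumption: a plain Seq[Double] stands in for an MLlib Vector.
  type Features = Seq[Double]

  // Normalize one [id, features] row to (Long, Features), using the same
  // Integer/Long match as trainLDAModel and rejecting any other id type.
  def toDocument(row: java.util.List[Any]): (Long, Features) = {
    require(row.size() == 2, s"expected [id, features], got ${row.size()} elements")
    val features = row.get(1).asInstanceOf[Features]
    row.get(0) match {
      case i: java.lang.Integer => (i.longValue(), features)
      case l: java.lang.Long => (l.longValue(), features)
      case other =>
        throw new IllegalArgumentException(s"unsupported document id: $other")
    }
  }

  // Hypothetical helper: builds a row shaped like the stub's input element.
  def row(id: Any, features: Features): java.util.List[Any] = {
    val r = new java.util.ArrayList[Any]()
    r.add(id)
    r.add(features)
    r
  }
}
```

With this sketch, `DocIdNormalization.toDocument(DocIdNormalization.row(1, Seq(0.5, 0.5)))` yields a pair whose first element is a `Long`, while a row with a `String` id raises `IllegalArgumentException`. Matching on the boxed Java types is needed because the elements are typed as `Any` and the id may arrive as either `java.lang.Integer` or `java.lang.Long`.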