[SPARK-5512][Mllib] Run the PIC algorithm with initial vector suggested by the PIC paper

As suggested by the Power Iteration Clustering paper, it is useful to set the initial vector v0 to the degree vector d. This PR adds an option to run the algorithm with that initialization.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #4301 from viirya/pic_degreevector and squashes the following commits:

7db28fb [Liang-Chi Hsieh] Refactor it to address comments.
19cf94e [Liang-Chi Hsieh] Add an option to select initialization method.
ec88567 [Liang-Chi Hsieh] Run the PIC algorithm with degree vector d as suggested by the PIC paper.
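
To illustrate what a degree-based initial vector can look like, here is a minimal plain-Scala sketch that does not use Spark. It assumes the similarities are symmetric (i, j, s) triples with each pair listed once, and it normalizes the degrees to sum to 1. The object `DegreeInit` and the method `degreeVector` are hypothetical names for this illustration; this is not the code added to MLlib by this commit.

```scala
// Hypothetical helper for illustration only; not the MLlib implementation.
object DegreeInit {
  /**
   * Builds v0 = d / sum(d), where d(i) is the sum of similarities touching vertex i.
   * Assumes each undirected pair (i, j, s) appears once in the input.
   */
  def degreeVector(similarities: Seq[(Long, Long, Double)]): Map[Long, Double] = {
    // Each undirected edge contributes its weight to both endpoints;
    // a self-loop (i == j) is counted once.
    val contributions = similarities.flatMap { case (i, j, s) =>
      if (i == j) Seq(i -> s) else Seq(i -> s, j -> s)
    }
    // Sum the contributions per vertex to get the degree vector d.
    val degrees = contributions
      .groupBy(_._1)
      .map { case (id, xs) => id -> xs.map(_._2).sum }
    val total = degrees.values.sum
    require(total > 0.0, "similarities must have a positive total weight")
    // Normalize so that the entries of v0 sum to 1.
    degrees.map { case (id, d) => id -> d / total }
  }
}
```

For example, `DegreeInit.degreeVector(Seq((0L, 1L, 1.0), (1L, 2L, 3.0)))` gives vertex 1 the largest entry, since it has the highest total similarity to its neighbors.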
author: Liang-Chi Hsieh <viirya@gmail.com> 2015-02-02 19:34:25 -0800
committer: Xiangrui Meng <meng@databricks.com> 2015-02-02 19:34:25 -0800
commit: 1bcd46574e442e20f55709d70573f271ce44e5b9 (patch)
tree: d54d597053d9aab0191dca30ad2edfa34b402f45 /mllib/src/test
parent: 0561c4544967fb853419f32e014fac9b8879b0db (diff)
download: spark-1bcd46574e442e20f55709d70573f271ce44e5b9.tar.gz
spark-1bcd46574e442e20f55709d70573f271ce44e5b9.tar.bz2
spark-1bcd46574e442e20f55709d70573f271ce44e5b9.zip
1 files changed, 10 insertions, 0 deletions
diff --git a/mllib/src/test/scala/org/apache/spark/mllib/clustering/PowerIterationClusteringSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/clustering/PowerIterationClusteringSuite.scala
index 2bae465d39..03ecd9ca73 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/clustering/PowerIterationClusteringSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/clustering/PowerIterationClusteringSuite.scala
@@ -55,6 +55,16 @@ class PowerIterationClusteringSuite extends FunSuite with MLlibTestSparkContext
         predictions(c) += i
     }
     assert(predictions.toSet == Set((0 to 3).toSet, (4 to 15).toSet))
+ 
+    val model2 = new PowerIterationClustering()
+      .setK(2)
+      .setInitializationMode("degree")
+      .run(sc.parallelize(similarities, 2))
+    val predictions2 = Array.fill(2)(mutable.Set.empty[Long])
+    model2.assignments.collect().foreach { case (i, c) =>
+        predictions2(c) += i
+    }
+    assert(predictions2.toSet == Set((0 to 3).toSet, (4 to 15).toSet))
   }
 
   test("normalize and powerIter") {