aboutsummaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/img/PIClusteringFiveCirclesInputsAndOutputs.pngbin0 -> 249245 bytes
-rw-r--r--docs/mllib-clustering.md20
2 files changed, 20 insertions, 0 deletions
diff --git a/docs/img/PIClusteringFiveCirclesInputsAndOutputs.png b/docs/img/PIClusteringFiveCirclesInputsAndOutputs.png
new file mode 100644
index 0000000000..ed9adad11d
--- /dev/null
+++ b/docs/img/PIClusteringFiveCirclesInputsAndOutputs.png
Binary files differ
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index c696ae9c8e..413b824e36 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -34,6 +34,26 @@ a given dataset, the algorithm returns the best clustering result).
* *initializationSteps* determines the number of steps in the k-means\|\| algorithm.
* *epsilon* determines the distance threshold within which we consider k-means to have converged.
+### Power Iteration Clustering
+
+Power iteration clustering is a scalable and efficient algorithm for clustering points given pointwise mutual affinity values. Internally the algorithm:
+
+* accepts a [Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph) that represents a normalized pairwise affinity between all input points.
+* calculates the principal eigenvalue and eigenvector
+* Clusters each of the input points according to their principal eigenvector component value
+
+Details of this algorithm are found within [Power Iteration Clustering, Lin and Cohen]{www.icml2010.org/papers/387.pdf}
+
+Example outputs for a dataset inspired by the paper - but with five clusters instead of three- have he following output from our implementation:
+
+<p style="text-align: center;">
+ <img src="img/PIClusteringFiveCirclesInputsAndOutputs.png"
+ title="The Property Graph"
+ alt="The Property Graph"
+ width="50%" />
+ <!-- Images are downsized intentionally to improve quality on retina displays -->
+</p>
+
### Examples
<div class="codetabs">