author | FlytxtRnD <meethu.mathew@flytxt.com> | 2015-07-14 23:29:02 -0700 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2015-07-14 23:29:02 -0700 |
commit | 3f6296fed4ee10f53e728eb1e02f13338839b94d (patch) | |
tree | b383880442e93bcdf26718b22a6f34b4263af842 /docs/mllib-clustering.md | |
parent | 4692769655e09d129a62a89a8ffb5d635675aa4d (diff) | |
[SPARK-8018] [MLLIB] KMeans should accept initial cluster centers as param
This allows KMeans to be initialized using an existing set of cluster centers provided as a KMeansModel object. This mode of initialization performs a single run.
Author: FlytxtRnD <meethu.mathew@flytxt.com>
Closes #6737 from FlytxtRnD/Kmeans-8018 and squashes the following commits:
94b56df [FlytxtRnD] style correction
ef95ee2 [FlytxtRnD] style correction
c446c58 [FlytxtRnD] documentation and numRuns warning change
06d13ef [FlytxtRnD] numRuns corrected
d12336e [FlytxtRnD] numRuns variable modifications
07f8554 [FlytxtRnD] remove setRuns from setIntialModel
e721dfe [FlytxtRnD] Merge remote-tracking branch 'upstream/master' into Kmeans-8018
242ead1 [FlytxtRnD] corrected == to === in assert
714acb5 [FlytxtRnD] added numRuns
60c8ce2 [FlytxtRnD] ignore runs parameter and initialModel test suite changed
582e6d9 [FlytxtRnD] Merge remote-tracking branch 'upstream/master' into Kmeans-8018
3f5fc8e [FlytxtRnD] test case modified and one runs condition added
cd5dc5c [FlytxtRnD] Merge remote-tracking branch 'upstream/master' into Kmeans-8018
16f1b53 [FlytxtRnD] Merge branch 'Kmeans-8018', remote-tracking branch 'upstream/master' into Kmeans-8018
e9c35d7 [FlytxtRnD] Remove getInitialModel and match cluster count criteria
6959861 [FlytxtRnD] Accept initial cluster centers in KMeans
Diffstat (limited to 'docs/mllib-clustering.md')
-rw-r--r-- | docs/mllib-clustering.md | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index d72dc20a5a..0fc7036bff 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -33,6 +33,7 @@ guaranteed to find a globally optimal solution, and when run multiple times on
 a given dataset, the algorithm returns the best clustering result).
 * *initializationSteps* determines the number of steps in the k-means|| algorithm.
 * *epsilon* determines the distance threshold within which we consider k-means to have converged.
+* *initialModel* is an optional set of cluster centers used for initialization. If this parameter is supplied, only one run is performed.
 
 **Examples**
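
The behaviour described by the new *initialModel* bullet can be illustrated with a small pure-Python sketch. This is not Spark's implementation or API: the function `kmeans`, its parameters, the naive fallback initialization, and the check that the number of supplied centers equals `k` are all assumptions made for illustration. It shows a single run of Lloyd's iterations that starts from caller-supplied centers and stops once no center moves by more than `epsilon`.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, initial_centers=None, max_iterations=20, epsilon=1e-4):
    """Single run of Lloyd's k-means (hypothetical helper, not Spark's API)."""
    if initial_centers is None:
        # Assumption for this sketch: fall back to the first k points.
        centers = [tuple(p) for p in points[:k]]
    else:
        # Assumption: the supplied centers must match the requested k.
        if len(initial_centers) != k:
            raise ValueError("number of initial centers must equal k")
        centers = [tuple(c) for c in initial_centers]

    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)

        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]

        # Converged when no center moved farther than epsilon.
        moved = max(dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if moved <= epsilon:
            break
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (9.0, 9.0), (9.0, 10.0)]
centers = kmeans(points, k=2, initial_centers=[(1.0, 1.0), (8.0, 8.0)])
print(centers)
```

Because the starting centers are fixed by the caller, repeating the run would reproduce the same result, which is consistent with the commit performing only one run when an initial model is supplied.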