path: root/python/pyspark/mllib/clustering.py
author: Yanbo Liang <ybliang8@gmail.com> 2016-09-11 13:47:13 +0100
committer: Sean Owen <sowen@cloudera.com> 2016-09-11 13:47:13 +0100
commit: 883c7631847a95684534222c1b6cfed8e62710c8
tree: b7b007aa8e0891b507a87539b4221e711ce0c9fd /python/pyspark/mllib/clustering.py
parent: c76baff0cc4775c2191d075cc9a8176e4915fec8
[SPARK-17389][FOLLOW-UP][ML] Change KMeans k-means|| default init steps from 5 to 2.
## What changes were proposed in this pull request?

#14956 reduced the default number of k-means|| init steps from 5 to 2, but only for the spark.mllib package. We should make the same change for spark.ml and PySpark.

## How was this patch tested?

Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #15050 from yanboliang/spark-17389.
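
To illustrate what the init-steps setting controls, here is a minimal pure-Python sketch of the oversampling phase of k-means||. It is not Spark's implementation: the function name `kmeans_parallel_candidates`, the helper `squared_distance`, and the oversampling factor of `2 * k` are assumptions made for this example, and the final step that reduces the candidates to exactly `k` centers is omitted. Each round samples every point with probability proportional to its squared distance from the nearest center chosen so far, so `init_steps` bounds how many sampling passes are made over the data.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_parallel_candidates(points, k, init_steps=2, seed=None):
    """Return candidate centers from the k-means|| oversampling phase.

    Hypothetical sketch: the oversampling factor (2 * k) is an assumption,
    and the reduction of candidates down to k centers is left out.
    """
    rng = random.Random(seed)
    oversample = 2 * k  # assumed oversampling factor per round
    centers = [rng.choice(points)]  # start from one uniformly chosen point

    for _ in range(init_steps):
        # Cost of each point: squared distance to its nearest current center.
        costs = [min(squared_distance(p, c) for c in centers) for p in points]
        total = sum(costs)
        if total == 0:
            break  # every point already coincides with a center

        # Sample each point independently, favoring points far from all centers.
        picked = [p for p, cost in zip(points, costs)
                  if rng.random() < oversample * cost / total]
        centers.extend(picked)

    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.1)]
    for steps in (2, 5):
        found = kmeans_parallel_candidates(data, k=3, init_steps=steps, seed=7)
        print(steps, len(found))
```

With a fixed seed, the candidates from a 2-step run are a prefix of those from a 5-step run, because the extra rounds only add further samples. Lowering the default therefore trades a smaller candidate set for fewer passes over the data.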
Diffstat (limited to 'python/pyspark/mllib/clustering.py')
-rw-r--r-- python/pyspark/mllib/clustering.py | 6
1 files changed, 3 insertions, 3 deletions
diff --git a/python/pyspark/mllib/clustering.py b/python/pyspark/mllib/clustering.py
index 29aa615125..2036168e45 100644
--- a/python/pyspark/mllib/clustering.py
+++ b/python/pyspark/mllib/clustering.py
@@ -306,7 +306,7 @@ class KMeans(object):
     @classmethod
     @since('0.9.0')
     def train(cls, rdd, k, maxIterations=100, runs=1, initializationMode="k-means||",
-              seed=None, initializationSteps=5, epsilon=1e-4, initialModel=None):
+              seed=None, initializationSteps=2, epsilon=1e-4, initialModel=None):
         """
         Train a k-means clustering model.
@@ -330,9 +330,9 @@ class KMeans(object):
           (default: None)
         :param initializationSteps:
           Number of steps for the k-means|| initialization mode.
-          This is an advanced setting -- the default of 5 is almost
+          This is an advanced setting -- the default of 2 is almost
           always enough.
-          (default: 5)
+          (default: 2)
         :param epsilon:
           Distance threshold within which a center will be considered to
           have converged. If all centers move less than this Euclidean