author: Joseph K. Bradley <joseph@databricks.com> 2016-01-13 18:01:29 -0800
committer: Joseph K. Bradley <joseph@databricks.com> 2016-01-13 18:01:29 -0800
commit: 20d8ef858af6e13db59df118b562ea33cba5464d
tree: 5b9da631e049374ff670322d4820d561786c8aee /docs/mllib-clustering.md
parent: 021dafc6a05a31dc22c9f9110dedb47a1f913087
[SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example
Fixed WSSSE computeCost in Python mllib KMeans user guide example by using new computeCost method API in Python.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #10707 from jkbradley/kmeans-doc-fix.
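The hand-written helper removed by this change took a square root per point, so it summed Euclidean distances rather than squared errors, which is not what "Within Set Sum of Squared Errors" means. `computeCost` returns the sum of squared distances from each point to its nearest center. The sketch below illustrates the difference in plain Python without Spark; the toy points, the centers, and the function names `sum_of_distances` and `wssse` are hypothetical and appear neither in the user guide nor in the Spark API.

```python
from math import sqrt


def squared_distance(point, center):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(point, center))


def nearest_center(point, centers):
    """Return the center closest to the point (hypothetical stand-in for model.predict)."""
    return min(centers, key=lambda c: squared_distance(point, c))


def sum_of_distances(points, centers):
    """What the removed helper effectively computed: a sum of plain distances."""
    return sum(sqrt(squared_distance(p, nearest_center(p, centers))) for p in points)


def wssse(points, centers):
    """Within Set Sum of Squared Errors: a sum of squared distances, no square root."""
    return sum(squared_distance(p, nearest_center(p, centers)) for p in points)


# Hypothetical toy data: two clusters with two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]

print("Sum of distances =", sum_of_distances(points, centers))
print("WSSSE            =", wssse(points, centers))
```

The two quantities agree only when every point lies at distance 0 or 1 from its center, so the old example generally reported a value that was not the WSSSE.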
Diffstat (limited to 'docs/mllib-clustering.md')
-rw-r--r-- docs/mllib-clustering.md | 6
1 file changed, 1 insertion, 5 deletions
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 93cd0c1c61..d0be032868 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -152,11 +152,7 @@ clusters = KMeans.train(parsedData, 2, maxIterations=10,
runs=10, initializationMode="random")
# Evaluate clustering by computing Within Set Sum of Squared Errors
-def error(point):
- center = clusters.centers[clusters.predict(point)]
- return sqrt(sum([x**2 for x in (point - center)]))
-
-WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
+WSSSE = clusters.computeCost(parsedData)
print("Within Set Sum of Squared Error = " + str(WSSSE))
# Save and load model