author | Joseph K. Bradley <joseph@databricks.com> | 2016-01-13 18:01:29 -0800 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2016-01-13 18:01:29 -0800 |
commit | 20d8ef858af6e13db59df118b562ea33cba5464d (patch) | |
tree | 5b9da631e049374ff670322d4820d561786c8aee /docs/mllib-clustering.md | |
parent | 021dafc6a05a31dc22c9f9110dedb47a1f913087 (diff) | |
[SPARK-12703][MLLIB][DOC][PYTHON] Fixed pyspark.mllib.clustering.KMeans user guide example
Fixed the WSSSE computation in the Python MLlib KMeans user guide example by using the new computeCost method in the Python API.
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #10707 from jkbradley/kmeans-doc-fix.
Diffstat (limited to 'docs/mllib-clustering.md')
-rw-r--r-- | docs/mllib-clustering.md | 6 |
1 files changed, 1 insertions, 5 deletions
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 93cd0c1c61..d0be032868 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -152,11 +152,7 @@ clusters = KMeans.train(parsedData, 2, maxIterations=10,
         runs=10, initializationMode="random")
 
 # Evaluate clustering by computing Within Set Sum of Squared Errors
-def error(point):
-    center = clusters.centers[clusters.predict(point)]
-    return sqrt(sum([x**2 for x in (point - center)]))
-
-WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
+WSSSE = clusters.computeCost(parsedData)
 print("Within Set Sum of Squared Error = " + str(WSSSE))
 
 # Save and load model
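To illustrate why the change matters, here is a minimal pure-Python sketch (no Spark required) that contrasts the two quantities. The removed helper took a square root per point, so it summed plain Euclidean distances rather than squared ones. `computeCost` is documented as returning the sum of squared distances to the nearest center, which is what the label "Within Set Sum of Squared Error" describes. The function names, toy points, and centers below are hypothetical and only stand in for `parsedData` and `clusters.centers`.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Center closest to `point` (stands in for clusters.predict + lookup)."""
    return min(centers, key=lambda c: squared_distance(point, c))


def wssse(points, centers):
    """Sum of squared distances to the nearest center (the WSSSE definition)."""
    return sum(squared_distance(p, nearest_center(p, centers)) for p in points)


def summed_distances(points, centers):
    """What the removed helper effectively computed: sum of unsquared distances."""
    return sum(
        math.sqrt(squared_distance(p, nearest_center(p, centers)))
        for p in points
    )


# Hypothetical toy data, not taken from the Spark example dataset.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]

print(wssse(points, centers))             # 1 + 1 + 4 + 4 = 10.0
print(summed_distances(points, centers))  # 1 + 1 + 2 + 2 = 6.0
```

The two results differ whenever any point lies at a distance other than 0 or 1 from its center, so the old example printed a value that did not match its label.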