author	Xiangrui Meng <meng@databricks.com>	2014-05-18 17:00:57 -0700
committer	Matei Zaharia <matei@databricks.com>	2014-05-18 17:00:57 -0700
commit	df0aa8353ab6d3b19d838c6fa95a93a64948309f (patch)
tree	96f19ed692c7a6578722be24c32bb0685d8d3e6b /docs/mllib-clustering.md
parent	4ce479324bdcf603806fc90b5b0f4968c6de690e (diff)
[WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0
Some improvements to the MLlib guide:

1. [SPARK-1872] Update API links for unidoc.
2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
3. Add more Java/Python examples.

Author: Xiangrui Meng <meng@databricks.com>

Closes #816 from mengxr/mllib-doc and squashes the following commits:

ec2e407 [Xiangrui Meng] format scala example for ALS
cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
7dad18e [Xiangrui Meng] update java api links in NB
3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
e4afaa8 [Xiangrui Meng] explicitly state what might change
Diffstat (limited to 'docs/mllib-clustering.md')
-rw-r--r--	docs/mllib-clustering.md	5
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 276868fa84..429cdf8d40 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -1,6 +1,7 @@
---
layout: global
-title: <a href="mllib-guide.html">MLlib</a> - Clustering
+title: Clustering - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Clustering
---
* Table of contents
@@ -40,7 +41,7 @@ a given dataset, the algorithm returns the best clustering result).
Following code snippets can be executed in `spark-shell`.
In the following example after loading and parsing data, we use the
-[`KMeans`](api/mllib/index.html#org.apache.spark.mllib.clustering.KMeans) object to cluster the data
+[`KMeans`](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) object to cluster the data
into two clusters. The number of desired clusters is passed to the algorithm. We then compute Within
Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In fact the
optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
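
For reference, below is a minimal Scala sketch of the kind of `spark-shell` snippet the patched section describes: load and parse data, cluster it with `KMeans`, and compute WSSSE via `computeCost`. The input path `data/kmeans_data.txt`, the whitespace-separated text format, and the parameter values (`numClusters = 2`, `numIterations = 20`) are illustrative assumptions, not taken from the diff itself.

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data: one whitespace-separated vector per line (assumed format).
val data = sc.textFile("data/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans.
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Evaluate the clustering by computing Within Set Sum of Squared Errors (WSSSE).
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)
```

Re-running this with increasing values of `numClusters` and plotting the resulting WSSSE values is one way to look for the "elbow" mentioned in the guide.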