author | Xiangrui Meng <meng@databricks.com> | 2014-05-18 17:00:57 -0700 |
---|---|---|
committer | Matei Zaharia <matei@databricks.com> | 2014-05-18 17:00:57 -0700 |
commit | df0aa8353ab6d3b19d838c6fa95a93a64948309f (patch) | |
tree | 96f19ed692c7a6578722be24c32bb0685d8d3e6b /docs/mllib-clustering.md | |
parent | 4ce479324bdcf603806fc90b5b0f4968c6de690e (diff) | |
[WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0
Some improvements to MLlib guide:
1. [SPARK-1872] Update API links for unidoc.
2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
3. Add more Java/Python examples.
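Item 2 amounts to a simple fallback rule. The sketch below restates it in Python for illustration only. The actual change lives in the Jekyll layout's Liquid template, and the helper name `heading_for` is hypothetical. The sample front matter values are taken from the `docs/mllib-clustering.md` hunk in this commit.

```python
from typing import Mapping


def heading_for(page: Mapping[str, str]) -> str:
    """Hypothetical helper: return the text shown as the page heading.

    Mirrors the rule in the commit message: use `displayTitle` when the
    page defines it, otherwise fall back to `title`.
    """
    display_title = page.get("displayTitle")
    return display_title if display_title is not None else page["title"]


# Front matter of docs/mllib-clustering.md after this commit.
clustering_page = {
    "title": "Clustering - MLlib",
    "displayTitle": '<a href="mllib-guide.html">MLlib</a> - Clustering',
}
# A page that defines only `title` keeps the previous behaviour.
plain_page = {"title": "Clustering - MLlib"}

print(heading_for(clustering_page))
print(heading_for(plain_page))
```

Presumably the point of keeping `title` free of HTML is that anything rendering the raw title, such as the browser tab, shows plain text, while the on-page heading can still carry the link back to the MLlib guide.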
Author: Xiangrui Meng <meng@databricks.com>
Closes #816 from mengxr/mllib-doc and squashes the following commits:
ec2e407 [Xiangrui Meng] format scala example for ALS
cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
7dad18e [Xiangrui Meng] update java api links in NB
3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
e4afaa8 [Xiangrui Meng] explicitly state what might change
Diffstat (limited to 'docs/mllib-clustering.md')
-rw-r--r-- | docs/mllib-clustering.md | 5 |
1 files changed, 3 insertions, 2 deletions

    diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
    index 276868fa84..429cdf8d40 100644
    --- a/docs/mllib-clustering.md
    +++ b/docs/mllib-clustering.md
    @@ -1,6 +1,7 @@
     ---
     layout: global
    -title: <a href="mllib-guide.html">MLlib</a> - Clustering
    +title: Clustering - MLlib
    +displayTitle: <a href="mllib-guide.html">MLlib</a> - Clustering
     ---
     
     * Table of contents
    @@ -40,7 +41,7 @@ a given dataset, the algorithm returns the best clustering result).
     Following code snippets can be executed in `spark-shell`.
     
     In the following example after loading and parsing data, we use the
    -[`KMeans`](api/mllib/index.html#org.apache.spark.mllib.clustering.KMeans) object to cluster the data
    +[`KMeans`](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) object to cluster the data
     into two clusters. The number of desired clusters is passed to the algorithm. We then compute Within
     Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In fact the
     optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
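The KMeans paragraph in the hunk above describes computing the Within Set Sum of Squared Error (WSSSE) and choosing *k* at the "elbow" of the WSSSE graph. The following is a minimal, Spark-free sketch of that measure in Python, not MLlib's implementation (MLlib's `KMeans` runs distributed over an RDD). The `kmeans` helper, its first-k-points initialization (MLlib defaults to k-means|| initialization instead), and the toy data are illustrative assumptions.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wssse(points: List[Point], centers: List[Point]) -> float:
    """Within Set Sum of Squared Errors: every point contributes the
    squared distance to its closest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd's iterations (hypothetical helper).

    Assumption: the first k points serve as initial centers, which is
    simpler and less robust than a randomized initialization.
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers


# Hypothetical toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (9.0, 9.0), (9.1, 9.1), (9.2, 9.2)]

for k in (1, 2, 3):
    print(k, round(wssse(data, kmeans(data, k)), 4))
```

On this data WSSSE falls from about 243.08 at k = 1 to 0.08 at k = 2, then only to 0.05 at k = 3. Increasing *k* keeps lowering the error, but the sharp bend at k = 2 is the "elbow" that matches the two groups in the data.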