path: root/docs/mllib-clustering.md
author    Xin Ren <iamshrek@126.com>    2015-10-07 15:00:19 +0100
committer Sean Owen <sowen@cloudera.com>    2015-10-07 15:00:19 +0100
commit    27cdde2ff87346fb54318532a476bf85f5837da7 (patch)
tree      a03cd037bae9a3bec8d13bfc43d33a82eeb6454b /docs/mllib-clustering.md
parent    ffe6831e49e28eb855f857fdfa5dd99341e80c9d (diff)
[SPARK-10669] [DOCS] Link to each language's API in codetabs in ML docs: spark.mllib
In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md

This JIRA is just for spark.mllib, not spark.ml. Please let me know if more work is needed, thanks a lot.

Author: Xin Ren <iamshrek@126.com>

Closes #8977 from keypointt/SPARK-10669.
Diffstat (limited to 'docs/mllib-clustering.md')
-rw-r--r--  docs/mllib-clustering.md  30
1 file changed, 28 insertions, 2 deletions
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index c2711cf82d..8fbced6c87 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -4,10 +4,10 @@ title: Clustering - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - Clustering
---
-Clustering is an unsupervised learning problem whereby we aim to group subsets
+[Clustering](https://en.wikipedia.org/wiki/Cluster_analysis) is an unsupervised learning problem whereby we aim to group subsets
of entities with one another based on some notion of similarity. Clustering is
often used for exploratory analysis and/or as a component of a hierarchical
-supervised learning pipeline (in which distinct classifiers or regression
+[supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) pipeline (in which distinct classifiers or regression
models are trained for each cluster).
MLlib supports the following models:
@@ -47,6 +47,8 @@ into two clusters. The number of desired clusters is passed to the algorithm. We
Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In fact the
optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
+Refer to the [`KMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) and [`KMeansModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeansModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors
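
// Illustrative sketch of how the KMeans / KMeansModel APIs linked above are typically used;
// assumes an existing SparkContext `sc`, and the file path, k values and iteration count
// below are example values only.
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster into two groups and evaluate with the Within Set Sum of Squared Errors (WSSSE)
val numIterations = 20
val clusters = KMeans.train(parsedData, 2, numIterations)
println("WSSSE for k=2: " + clusters.computeCost(parsedData))

// Sweep k to look for the "elbow" in the WSSSE curve mentioned above
for (k <- 2 to 10) {
  val cost = KMeans.train(parsedData, k, numIterations).computeCost(parsedData)
  println(s"k=$k WSSSE=$cost")
}

// Models can be saved and reloaded
clusters.save(sc, "target/tmp/kmeansModelSketch")
val sameModel = KMeansModel.load(sc, "target/tmp/kmeansModelSketch")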
@@ -77,6 +79,8 @@ Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a
calling `.rdd()` on your `JavaRDD` object. A self-contained application example
that is equivalent to the provided example in Scala is given below:
+Refer to the [`KMeans` Java docs](api/java/org/apache/spark/mllib/clustering/KMeans.html) and [`KMeansModel` Java docs](api/java/org/apache/spark/mllib/clustering/KMeansModel.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;
@@ -132,6 +136,8 @@ data into two clusters. The number of desired clusters is passed to the algorith
Within Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In
fact the optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
+Refer to the [`KMeans` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.KMeans) and [`KMeansModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.KMeansModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.clustering import KMeans, KMeansModel
from numpy import array
@@ -184,6 +190,8 @@ In the following example after loading and parsing data, we use a
object to cluster the data into two clusters. The number of desired clusters is passed
to the algorithm. We then output the parameters of the mixture model.
+Refer to the [`GaussianMixture` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.GaussianMixture) and [`GaussianMixtureModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.GaussianMixtureModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.clustering.GaussianMixtureModel
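import org.apache.spark.mllib.linalg.Vectors

// Illustrative sketch of the GaussianMixture / GaussianMixtureModel usage referenced above;
// assumes an existing SparkContext `sc`, and the file path and component count below are
// example values only.
val data = sc.textFile("data/mllib/gmm_data.txt")
val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble))).cache()

// Fit a two-component Gaussian mixture via expectation-maximization
val gmm = new GaussianMixture().setK(2).run(parsedData)

// Each component has a mixing weight, a mean vector and a covariance matrix
for (i <- 0 until gmm.k) {
  println(s"weight=${gmm.weights(i)}")
  println(s"mu=${gmm.gaussians(i).mu}\nsigma=\n${gmm.gaussians(i).sigma}")
}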
@@ -216,6 +224,8 @@ Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a
calling `.rdd()` on your `JavaRDD` object. A self-contained application example
that is equivalent to the provided example in Scala is given below:
+Refer to the [`GaussianMixture` Java docs](api/java/org/apache/spark/mllib/clustering/GaussianMixture.html) and [`GaussianMixtureModel` Java docs](api/java/org/apache/spark/mllib/clustering/GaussianMixtureModel.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;
@@ -268,6 +278,8 @@ In the following example after loading and parsing data, we use a
object to cluster the data into two clusters. The number of desired clusters is passed
to the algorithm. We then output the parameters of the mixture model.
+Refer to the [`GaussianMixture` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.GaussianMixture) and [`GaussianMixtureModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.GaussianMixtureModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.clustering import GaussianMixture
from numpy import array
@@ -324,6 +336,8 @@ Calling `PowerIterationClustering.run` returns a
[`PowerIterationClusteringModel`](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel),
which contains the computed clustering assignments.
+Refer to the [`PowerIterationClustering` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClustering) and [`PowerIterationClusteringModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.clustering.{PowerIterationClustering, PowerIterationClusteringModel}
import org.apache.spark.mllib.linalg.Vectors
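
// Illustrative sketch of the PowerIterationClustering usage referenced above; assumes an
// existing SparkContext `sc`, and the (srcId, dstId, similarity) triples below are made-up
// example values forming a tiny affinity graph.
val similarities = sc.parallelize(Seq(
  (0L, 1L, 0.9), (1L, 2L, 0.9), (0L, 2L, 0.8),
  (3L, 4L, 0.9), (4L, 5L, 0.8), (3L, 5L, 0.9)))

// Cluster the graph's vertices into two groups
val model = new PowerIterationClustering()
  .setK(2)
  .setMaxIterations(10)
  .run(similarities)

// Each assignment pairs a vertex id with its cluster label
model.assignments.collect().foreach { a =>
  println(s"vertex ${a.id} -> cluster ${a.cluster}")
}

// Models can be saved and reloaded
model.save(sc, "target/tmp/picModelSketch")
val sameModel = PowerIterationClusteringModel.load(sc, "target/tmp/picModelSketch")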
@@ -365,6 +379,8 @@ Calling `PowerIterationClustering.run` returns a
[`PowerIterationClusteringModel`](api/java/org/apache/spark/mllib/clustering/PowerIterationClusteringModel.html)
which contains the computed clustering assignments.
+Refer to the [`PowerIterationClustering` Java docs](api/java/org/apache/spark/mllib/clustering/PowerIterationClustering.html) and [`PowerIterationClusteringModel` Java docs](api/java/org/apache/spark/mllib/clustering/PowerIterationClusteringModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
import scala.Tuple3;
@@ -411,6 +427,8 @@ Calling `PowerIterationClustering.run` returns a
[`PowerIterationClusteringModel`](api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClusteringModel),
which contains the computed clustering assignments.
+Refer to the [`PowerIterationClustering` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClustering) and [`PowerIterationClusteringModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClusteringModel) for more details on the API.
+
{% highlight python %}
from __future__ import print_function
from pyspark.mllib.clustering import PowerIterationClustering, PowerIterationClusteringModel
@@ -571,6 +589,7 @@ to the algorithm. We then output the topics, represented as probability distribu
<div class="codetabs">
<div data-lang="scala" markdown="1">
+Refer to the [`LDA` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) and [`DistributedLDAModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.DistributedLDAModel) for details on the API.
{% highlight scala %}
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel}
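import org.apache.spark.mllib.linalg.Vectors

// Illustrative sketch of the LDA / DistributedLDAModel usage referenced above; assumes an
// existing SparkContext `sc`, and the file path and topic count below are example values.
// Each input line is one document's term-count vector; documents are keyed by a unique id.
val data = sc.textFile("data/mllib/sample_lda_data.txt")
val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble)))
val corpus = parsedData.zipWithIndex.map(_.swap).cache()

// Infer three topics; the default (EM) optimizer yields a DistributedLDAModel
val ldaModel = new LDA().setK(3).run(corpus)

// topicsMatrix is a vocabSize x k matrix whose columns hold per-topic word weights
println(s"Learned ${ldaModel.k} topics over ${ldaModel.vocabSize} terms")
val topics = ldaModel.topicsMatrix

// Models can be saved and reloaded
ldaModel.save(sc, "target/tmp/ldaModelSketch")
val sameModel = DistributedLDAModel.load(sc, "target/tmp/ldaModelSketch")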
@@ -602,6 +621,8 @@ val sameModel = DistributedLDAModel.load(sc, "myLDAModel")
</div>
<div data-lang="java" markdown="1">
+Refer to the [`LDA` Java docs](api/java/org/apache/spark/mllib/clustering/LDA.html) and [`DistributedLDAModel` Java docs](api/java/org/apache/spark/mllib/clustering/DistributedLDAModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
@@ -666,6 +687,8 @@ public class JavaLDAExample {
</div>
<div data-lang="python" markdown="1">
+Refer to the [`LDA` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.LDA) and [`LDAModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.LDAModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.clustering import LDA, LDAModel
from pyspark.mllib.linalg import Vectors
@@ -730,6 +753,7 @@ This example shows how to estimate clusters on streaming data.
<div class="codetabs">
<div data-lang="scala" markdown="1">
+Refer to the [`StreamingKMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.StreamingKMeans) for details on the API.
First we import the necessary classes.
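
A condensed, end-to-end sketch of the streaming k-means flow described here (the `SparkConf` named `conf`, the input directories, and the k/dimension/decay/batch-interval settings are assumptions for illustration):

{% highlight scala %}
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.streaming.{Seconds, StreamingContext}

// One-second batches; assumes an existing SparkConf named `conf`
val ssc = new StreamingContext(conf, Seconds(1))

// Training vectors arrive as text like "[1.0,2.0,3.0]"; test points as labeled points
val trainingData = ssc.textFileStream("/training/data/dir").map(Vectors.parse)
val testData = ssc.textFileStream("/testing/data/dir").map(LabeledPoint.parse)

// Two clusters in three-dimensional space; a decay factor of 1.0 weights all past batches equally
val model = new StreamingKMeans()
  .setK(2)
  .setDecayFactor(1.0)
  .setRandomCenters(3, 0.0)

// Update the model on each training batch and print predictions for each test batch
model.trainOn(trainingData)
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

ssc.start()
ssc.awaitTermination()
{% endhighlight %}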
@@ -780,6 +804,8 @@ ssc.awaitTermination()
</div>
<div data-lang="python" markdown="1">
+Refer to the [`StreamingKMeans` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.StreamingKMeans) for more details on the API.
+
First we import the necessary classes.
{% highlight python %}