author     Xin Ren <iamshrek@126.com>        2015-10-07 15:00:19 +0100
committer  Sean Owen <sowen@cloudera.com>    2015-10-07 15:00:19 +0100
commit     27cdde2ff87346fb54318532a476bf85f5837da7 (patch)
tree       a03cd037bae9a3bec8d13bfc43d33a82eeb6454b /docs
parent     ffe6831e49e28eb855f857fdfa5dd99341e80c9d (diff)
download   spark-27cdde2ff87346fb54318532a476bf85f5837da7.tar.gz
           spark-27cdde2ff87346fb54318532a476bf85f5837da7.tar.bz2
           spark-27cdde2ff87346fb54318532a476bf85f5837da7.zip
[SPARK-10669] [DOCS] Link to each language's API in codetabs in ML docs: spark.mllib
In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md This JIRA is just for spark.mllib, not spark.ml. Please let me know if more work is needed, thanks a lot. Author: Xin Ren <iamshrek@126.com> Closes #8977 from keypointt/SPARK-10669.
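Every hunk in this patch applies the same pattern: each language codetab gains a one-line pointer to that language's API docs, placed just above the `{% highlight %}` block. A condensed sketch of the resulting Markdown, based on the KMeans hunks below (code body elided):

<div data-lang="scala" markdown="1">
Refer to the [`KMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) and [`KMeansModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeansModel) for details on the API.

{% highlight scala %}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
// ... load data, train the model, and evaluate WSSSE (full example in the diff below) ...
{% endhighlight %}
</div>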
Diffstat (limited to 'docs')
-rw-r--r--  docs/mllib-clustering.md                 30
-rw-r--r--  docs/mllib-collaborative-filtering.md     6
-rw-r--r--  docs/mllib-data-types.md                 47
-rw-r--r--  docs/mllib-decision-tree.md              22
-rw-r--r--  docs/mllib-dimensionality-reduction.md   10
-rw-r--r--  docs/mllib-ensembles.md                  44
-rw-r--r--  docs/mllib-evaluation-metrics.md         15
-rw-r--r--  docs/mllib-feature-extraction.md         47
-rw-r--r--  docs/mllib-frequent-pattern-mining.md    13
-rw-r--r--  docs/mllib-isotonic-regression.md         6
-rw-r--r--  docs/mllib-linear-methods.md             18
-rw-r--r--  docs/mllib-naive-bayes.md                 6
-rw-r--r--  docs/mllib-optimization.md                4
-rw-r--r--  docs/mllib-pmml-model-export.md           2
-rw-r--r--  docs/mllib-statistics.md                 34
15 files changed, 274 insertions, 30 deletions
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index c2711cf82d..8fbced6c87 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -4,10 +4,10 @@ title: Clustering - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - Clustering
---
-Clustering is an unsupervised learning problem whereby we aim to group subsets
+[Clustering](https://en.wikipedia.org/wiki/Cluster_analysis) is an unsupervised learning problem whereby we aim to group subsets
of entities with one another based on some notion of similarity. Clustering is
often used for exploratory analysis and/or as a component of a hierarchical
-supervised learning pipeline (in which distinct classifiers or regression
+[supervised learning](https://en.wikipedia.org/wiki/Supervised_learning) pipeline (in which distinct classifiers or regression
models are trained for each cluster).
MLlib supports the following models:
@@ -47,6 +47,8 @@ into two clusters. The number of desired clusters is passed to the algorithm. We
Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In fact the
optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
+Refer to the [`KMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) and [`KMeansModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeansModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors
@@ -77,6 +79,8 @@ Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a
calling `.rdd()` on your `JavaRDD` object. A self-contained application example
that is equivalent to the provided example in Scala is given below:
+Refer to the [`KMeans` Java docs](api/java/org/apache/spark/mllib/clustering/KMeans.html) and [`KMeansModel` Java docs](api/java/org/apache/spark/mllib/clustering/KMeansModel.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;
@@ -132,6 +136,8 @@ data into two clusters. The number of desired clusters is passed to the algorith
Within Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In
fact the optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
+Refer to the [`KMeans` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.KMeans) and [`KMeansModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.KMeansModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.clustering import KMeans, KMeansModel
from numpy import array
@@ -184,6 +190,8 @@ In the following example after loading and parsing data, we use a
object to cluster the data into two clusters. The number of desired clusters is passed
to the algorithm. We then output the parameters of the mixture model.
+Refer to the [`GaussianMixture` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.GaussianMixture) and [`GaussianMixtureModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.GaussianMixtureModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.clustering.GaussianMixtureModel
@@ -216,6 +224,8 @@ Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a
calling `.rdd()` on your `JavaRDD` object. A self-contained application example
that is equivalent to the provided example in Scala is given below:
+Refer to the [`GaussianMixture` Java docs](api/java/org/apache/spark/mllib/clustering/GaussianMixture.html) and [`GaussianMixtureModel` Java docs](api/java/org/apache/spark/mllib/clustering/GaussianMixtureModel.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;
@@ -268,6 +278,8 @@ In the following example after loading and parsing data, we use a
object to cluster the data into two clusters. The number of desired clusters is passed
to the algorithm. We then output the parameters of the mixture model.
+Refer to the [`GaussianMixture` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.GaussianMixture) and [`GaussianMixtureModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.GaussianMixtureModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.clustering import GaussianMixture
from numpy import array
@@ -324,6 +336,8 @@ Calling `PowerIterationClustering.run` returns a
[`PowerIterationClusteringModel`](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel),
which contains the computed clustering assignments.
+Refer to the [`PowerIterationClustering` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClustering) and [`PowerIterationClusteringModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.clustering.{PowerIterationClustering, PowerIterationClusteringModel}
import org.apache.spark.mllib.linalg.Vectors
@@ -365,6 +379,8 @@ Calling `PowerIterationClustering.run` returns a
[`PowerIterationClusteringModel`](api/java/org/apache/spark/mllib/clustering/PowerIterationClusteringModel.html)
which contains the computed clustering assignments.
+Refer to the [`PowerIterationClustering` Java docs](api/java/org/apache/spark/mllib/clustering/PowerIterationClustering.html) and [`PowerIterationClusteringModel` Java docs](api/java/org/apache/spark/mllib/clustering/PowerIterationClusteringModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
import scala.Tuple3;
@@ -411,6 +427,8 @@ Calling `PowerIterationClustering.run` returns a
[`PowerIterationClusteringModel`](api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClustering),
which contains the computed clustering assignments.
+Refer to the [`PowerIterationClustering` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClustering) and [`PowerIterationClusteringModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClusteringModel) for more details on the API.
+
{% highlight python %}
from __future__ import print_function
from pyspark.mllib.clustering import PowerIterationClustering, PowerIterationClusteringModel
@@ -571,6 +589,7 @@ to the algorithm. We then output the topics, represented as probability distribu
<div class="codetabs">
<div data-lang="scala" markdown="1">
+Refer to the [`LDA` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) and [`DistributedLDAModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.DistributedLDAModel) for details on the API.
{% highlight scala %}
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel}
@@ -602,6 +621,8 @@ val sameModel = DistributedLDAModel.load(sc, "myLDAModel")
</div>
<div data-lang="java" markdown="1">
+Refer to the [`LDA` Java docs](api/java/org/apache/spark/mllib/clustering/LDA.html) and [`DistributedLDAModel` Java docs](api/java/org/apache/spark/mllib/clustering/DistributedLDAModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
@@ -666,6 +687,8 @@ public class JavaLDAExample {
</div>
<div data-lang="python" markdown="1">
+Refer to the [`LDA` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.LDA) and [`LDAModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.LDAModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.clustering import LDA, LDAModel
from pyspark.mllib.linalg import Vectors
@@ -730,6 +753,7 @@ This example shows how to estimate clusters on streaming data.
<div class="codetabs">
<div data-lang="scala" markdown="1">
+Refer to the [`StreamingKMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.StreamingKMeans) for details on the API.
First we import the necessary classes.
@@ -780,6 +804,8 @@ ssc.awaitTermination()
</div>
<div data-lang="python" markdown="1">
+Refer to the [`StreamingKMeans` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.clustering.StreamingKMeans) for more details on the API.
+
First we import the necessary classes.
{% highlight python %}
diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index eedc23424a..b3fd51dca5 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -64,6 +64,8 @@ We use the default [ALS.train()](api/scala/index.html#org.apache.spark.mllib.rec
method which assumes ratings are explicit. We evaluate the
recommendation model by measuring the Mean Squared Error of rating prediction.
+Refer to the [`ALS` Scala docs](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
@@ -119,6 +121,8 @@ Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a
calling `.rdd()` on your `JavaRDD` object. A self-contained application example
that is equivalent to the provided example in Scala is given below:
+Refer to the [`ALS` Java docs](api/java/org/apache/spark/mllib/recommendation/ALS.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
@@ -201,6 +205,8 @@ In the following example we load rating data. Each row consists of a user, a pro
We use the default ALS.train() method which assumes ratings are explicit. We evaluate the
recommendation model by measuring the Mean Squared Error of rating prediction.
+Refer to the [`ALS` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.recommendation.ALS) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
diff --git a/docs/mllib-data-types.md b/docs/mllib-data-types.md
index d8c7bdc63c..3c0c047967 100644
--- a/docs/mllib-data-types.md
+++ b/docs/mllib-data-types.md
@@ -33,6 +33,8 @@ implementations: [`DenseVector`](api/scala/index.html#org.apache.spark.mllib.lin
using the factory methods implemented in
[`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) to create local vectors.
+Refer to the [`Vector` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
@@ -59,6 +61,8 @@ implementations: [`DenseVector`](api/java/org/apache/spark/mllib/linalg/DenseVec
using the factory methods implemented in
[`Vectors`](api/java/org/apache/spark/mllib/linalg/Vectors.html) to create local vectors.
+Refer to the [`Vector` Java docs](api/java/org/apache/spark/mllib/linalg/Vector.html) and [`Vectors` Java docs](api/java/org/apache/spark/mllib/linalg/Vectors.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
@@ -86,6 +90,8 @@ and the following as sparse vectors:
We recommend using NumPy arrays over lists for efficiency, and using the factory methods implemented
in [`Vectors`](api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors) to create sparse vectors.
+Refer to the [`Vectors` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors) for more details on the API.
+
{% highlight python %}
import numpy as np
import scipy.sparse as sps
@@ -119,6 +125,8 @@ For multiclass classification, labels should be class indices starting from zero
A labeled point is represented by the case class
[`LabeledPoint`](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint).
+Refer to the [`LabeledPoint` Scala docs](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
@@ -136,6 +144,8 @@ val neg = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))
A labeled point is represented by
[`LabeledPoint`](api/java/org/apache/spark/mllib/regression/LabeledPoint.html).
+Refer to the [`LabeledPoint` Java docs](api/java/org/apache/spark/mllib/regression/LabeledPoint.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
@@ -153,6 +163,8 @@ LabeledPoint neg = new LabeledPoint(0.0, Vectors.sparse(3, new int[] {0, 2}, new
A labeled point is represented by
[`LabeledPoint`](api/python/pyspark.mllib.html#pyspark.mllib.regression.LabeledPoint).
+Refer to the [`LabeledPoint` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.regression.LabeledPoint) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.linalg import SparseVector
from pyspark.mllib.regression import LabeledPoint
@@ -187,6 +199,8 @@ After loading, the feature indices are converted to zero-based.
[`MLUtils.loadLibSVMFile`](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) reads training
examples stored in LIBSVM format.
+Refer to the [`MLUtils` Scala docs](api/scala/index.html#org.apache.spark.mllib.util.MLUtils) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
@@ -200,6 +214,8 @@ val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_
[`MLUtils.loadLibSVMFile`](api/java/org/apache/spark/mllib/util/MLUtils.html) reads training
examples stored in LIBSVM format.
+Refer to the [`MLUtils` Java docs](api/java/org/apache/spark/mllib/util/MLUtils.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
@@ -214,6 +230,8 @@ JavaRDD<LabeledPoint> examples =
[`MLUtils.loadLibSVMFile`](api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils) reads training
examples stored in LIBSVM format.
+Refer to the [`MLUtils` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.util import MLUtils
@@ -246,6 +264,8 @@ We recommend using the factory methods implemented
in [`Matrices`](api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$) to create local
matrices. Remember, local matrices in MLlib are stored in column-major order.
+Refer to the [`Matrix` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Matrix) and [`Matrices` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Matrices) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.{Matrix, Matrices}
@@ -267,6 +287,8 @@ We recommend using the factory methods implemented
in [`Matrices`](api/java/org/apache/spark/mllib/linalg/Matrices.html) to create local
matrices. Remember, local matrices in MLlib are stored in column-major order.
+Refer to the [`Matrix` Java docs](api/java/org/apache/spark/mllib/linalg/Matrix.html) and [`Matrices` Java docs](api/java/org/apache/spark/mllib/linalg/Matrices.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.Matrices;
@@ -289,6 +311,8 @@ We recommend using the factory methods implemented
in [`Matrices`](api/python/pyspark.mllib.html#pyspark.mllib.linalg.Matrices) to create local
matrices. Remember, local matrices in MLlib are stored in column-major order.
+Refer to the [`Matrix` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.linalg.Matrix) and [`Matrices` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.linalg.Matrices) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.linalg import Matrix, Matrices
@@ -341,6 +365,7 @@ created from an `RDD[Vector]` instance. Then we can compute its column summary
[QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition) is of the form A = QR where Q is an orthogonal matrix and R is an upper triangular matrix.
For [singular value decomposition (SVD)](https://en.wikipedia.org/wiki/Singular_value_decomposition) and [principal component analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis), please refer to [Dimensionality reduction](mllib-dimensionality-reduction.html).
+Refer to the [`RowMatrix` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix) for details on the API.
{% highlight scala %}
import org.apache.spark.mllib.linalg.Vector
@@ -364,6 +389,8 @@ val qrResult = mat.tallSkinnyQR(true)
A [`RowMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/RowMatrix.html) can be
created from a `JavaRDD<Vector>` instance. Then we can compute its column summary statistics.
+Refer to the [`RowMatrix` Java docs](api/java/org/apache/spark/mllib/linalg/distributed/RowMatrix.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;
@@ -387,6 +414,8 @@ QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);
A [`RowMatrix`](api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.RowMatrix) can be
created from an `RDD` of vectors.
+Refer to the [`RowMatrix` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.RowMatrix) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.linalg.distributed import RowMatrix
@@ -423,6 +452,8 @@ can be created from an `RDD[IndexedRow]` instance, where
wrapper over `(Long, Vector)`. An `IndexedRowMatrix` can be converted to a `RowMatrix` by dropping
its row indices.
+Refer to the [`IndexedRowMatrix` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix, RowMatrix}
@@ -448,6 +479,8 @@ can be created from an `JavaRDD<IndexedRow>` instance, where
wrapper over `(long, Vector)`. An `IndexedRowMatrix` can be converted to a `RowMatrix` by dropping
its row indices.
+Refer to the [`IndexedRowMatrix` Java docs](api/java/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.distributed.IndexedRow;
@@ -475,6 +508,8 @@ can be created from an `RDD` of `IndexedRow`s, where
wrapper over `(long, vector)`. An `IndexedRowMatrix` can be converted to a `RowMatrix` by dropping
its row indices.
+Refer to the [`IndexedRowMatrix` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.IndexedRowMatrix) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix
@@ -529,6 +564,8 @@ wrapper over `(Long, Long, Double)`. A `CoordinateMatrix` can be converted to a
with sparse rows by calling `toIndexedRowMatrix`. Other computations for
`CoordinateMatrix` are not currently supported.
+Refer to the [`CoordinateMatrix` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.CoordinateMatrix) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
@@ -555,6 +592,8 @@ wrapper over `(long, long, double)`. A `CoordinateMatrix` can be converted to a
with sparse rows by calling `toIndexedRowMatrix`. Other computations for
`CoordinateMatrix` are not currently supported.
+Refer to the [`CoordinateMatrix` Java docs](api/java/org/apache/spark/mllib/linalg/distributed/CoordinateMatrix.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
@@ -582,6 +621,8 @@ can be created from an `RDD` of `MatrixEntry` entries, where
wrapper over `(long, long, float)`. A `CoordinateMatrix` can be converted to a `RowMatrix` by
calling `toRowMatrix`, or to an `IndexedRowMatrix` with sparse rows by calling `toIndexedRowMatrix`.
+Refer to the [`CoordinateMatrix` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.CoordinateMatrix) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry
@@ -631,6 +672,8 @@ most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling
`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+Refer to the [`BlockMatrix` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
@@ -656,6 +699,8 @@ most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling
`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+Refer to the [`BlockMatrix` Java docs](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
@@ -683,6 +728,8 @@ A [`BlockMatrix`](api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed
can be created from an `RDD` of sub-matrix blocks, where a sub-matrix block is a
`((blockRowIndex, blockColIndex), sub-matrix)` tuple.
+Refer to the [`BlockMatrix` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.BlockMatrix) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.linalg import Matrices
from pyspark.mllib.linalg.distributed import BlockMatrix
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index c1d0f8a6b1..f31c4f8893 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -191,7 +191,9 @@ maximum tree depth of 5. The test error is calculated to measure the algorithm a
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`DecisionTree` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree) and [`DecisionTreeModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.model.DecisionTreeModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel
@@ -229,7 +231,9 @@ val sameModel = DecisionTreeModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
-<div data-lang="java">
+<div data-lang="java" markdown="1">
+Refer to the [`DecisionTree` Java docs](api/java/org/apache/spark/mllib/tree/DecisionTree.html) and [`DecisionTreeModel` Java docs](api/java/org/apache/spark/mllib/tree/model/DecisionTreeModel.html) for details on the API.
+
{% highlight java %}
import java.util.HashMap;
import scala.Tuple2;
@@ -291,7 +295,8 @@ DecisionTreeModel sameModel = DecisionTreeModel.load(sc.sc(), "myModelPath");
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`DecisionTree` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTree) and [`DecisionTreeModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTreeModel) for more details on the API.
{% highlight python %}
from pyspark.mllib.regression import LabeledPoint
@@ -335,7 +340,9 @@ depth of 5. The Mean Squared Error (MSE) is computed at the end to evaluate
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`DecisionTree` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree) and [`DecisionTreeModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.model.DecisionTreeModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel
@@ -372,7 +379,9 @@ val sameModel = DecisionTreeModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
-<div data-lang="java">
+<div data-lang="java" markdown="1">
+Refer to the [`DecisionTree` Java docs](api/java/org/apache/spark/mllib/tree/DecisionTree.html) and [`DecisionTreeModel` Java docs](api/java/org/apache/spark/mllib/tree/model/DecisionTreeModel.html) for details on the API.
+
{% highlight java %}
import java.util.HashMap;
import scala.Tuple2;
@@ -440,7 +449,8 @@ DecisionTreeModel sameModel = DecisionTreeModel.load(sc.sc(), "myModelPath");
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`DecisionTree` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTree) and [`DecisionTreeModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTreeModel) for more details on the API.
{% highlight python %}
from pyspark.mllib.regression import LabeledPoint
diff --git a/docs/mllib-dimensionality-reduction.md b/docs/mllib-dimensionality-reduction.md
index 05f51168d8..ac3526908a 100644
--- a/docs/mllib-dimensionality-reduction.md
+++ b/docs/mllib-dimensionality-reduction.md
@@ -62,6 +62,8 @@ MLlib provides SVD functionality to row-oriented matrices, provided in the
<div class="codetabs">
<div data-lang="scala" markdown="1">
+Refer to the [`SingularValueDecomposition` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.SingularValueDecomposition) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.distributed.RowMatrix
@@ -80,6 +82,8 @@ The same code applies to `IndexedRowMatrix` if `U` is defined as an
`IndexedRowMatrix`.
</div>
<div data-lang="java" markdown="1">
+Refer to the [`SingularValueDecomposition` Java docs](api/java/org/apache/spark/mllib/linalg/SingularValueDecomposition.html) for details on the API.
+
{% highlight java %}
import java.util.LinkedList;
@@ -145,6 +149,8 @@ MLlib supports PCA for tall-and-skinny matrices stored in row-oriented format an
The following code demonstrates how to compute principal components on a `RowMatrix`
and use them to project the vectors into a low-dimensional space.
+Refer to the [`RowMatrix` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.distributed.RowMatrix
@@ -161,6 +167,8 @@ val projected: RowMatrix = mat.multiply(pc)
The following code demonstrates how to compute principal components on source vectors
and use them to project the vectors into a low-dimensional space while keeping associated labels:
+Refer to the [`PCA` Scala docs](api/scala/index.html#org.apache.spark.mllib.feature.PCA) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.feature.PCA
@@ -182,6 +190,8 @@ The following code demonstrates how to compute principal components on a `RowMat
and use them to project the vectors into a low-dimensional space.
The number of columns should be small, e.g., less than 1000.
+Refer to the [`RowMatrix` Java docs](api/java/org/apache/spark/mllib/linalg/distributed/RowMatrix.html) for details on the API.
+
{% highlight java %}
import java.util.LinkedList;
diff --git a/docs/mllib-ensembles.md b/docs/mllib-ensembles.md
index 1e00b2083e..fc587298f7 100644
--- a/docs/mllib-ensembles.md
+++ b/docs/mllib-ensembles.md
@@ -95,7 +95,9 @@ The test error is calculated to measure the algorithm accuracy.
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`RandomForest` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.RandomForest) and [`RandomForestModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.model.RandomForestModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
@@ -135,7 +137,9 @@ val sameModel = RandomForestModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
-<div data-lang="java">
+<div data-lang="java" markdown="1">
+Refer to the [`RandomForest` Java docs](api/java/org/apache/spark/mllib/tree/RandomForest.html) and [`RandomForestModel` Java docs](api/java/org/apache/spark/mllib/tree/model/RandomForestModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
import java.util.HashMap;
@@ -200,7 +204,8 @@ RandomForestModel sameModel = RandomForestModel.load(sc.sc(), "myModelPath");
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`RandomForest` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.RandomForest) and [`RandomForestModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.RandomForestModel) for more details on the API.
{% highlight python %}
from pyspark.mllib.tree import RandomForest, RandomForestModel
@@ -246,7 +251,9 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`RandomForest` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.RandomForest) and [`RandomForestModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.model.RandomForestModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
@@ -286,7 +293,9 @@ val sameModel = RandomForestModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
-<div data-lang="java">
+<div data-lang="java" markdown="1">
+Refer to the [`RandomForest` Java docs](api/java/org/apache/spark/mllib/tree/RandomForest.html) and [`RandomForestModel` Java docs](api/java/org/apache/spark/mllib/tree/model/RandomForestModel.html) for details on the API.
+
{% highlight java %}
import java.util.HashMap;
import scala.Tuple2;
@@ -354,7 +363,8 @@ RandomForestModel sameModel = RandomForestModel.load(sc.sc(), "myModelPath");
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`RandomForest` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.RandomForest) and [`RandomForestModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.RandomForestModel) for more details on the API.
{% highlight python %}
from pyspark.mllib.tree import RandomForest, RandomForestModel
@@ -479,7 +489,9 @@ The test error is calculated to measure the algorithm accuracy.
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`GradientBoostedTrees` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.GradientBoostedTrees) and [`GradientBoostedTreesModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.model.GradientBoostedTreesModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
@@ -518,7 +530,9 @@ val sameModel = GradientBoostedTreesModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
-<div data-lang="java">
+<div data-lang="java" markdown="1">
+Refer to the [`GradientBoostedTrees` Java docs](api/java/org/apache/spark/mllib/tree/GradientBoostedTrees.html) and [`GradientBoostedTreesModel` Java docs](api/java/org/apache/spark/mllib/tree/model/GradientBoostedTreesModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
import java.util.HashMap;
@@ -583,7 +597,8 @@ GradientBoostedTreesModel sameModel = GradientBoostedTreesModel.load(sc.sc(), "m
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`GradientBoostedTrees` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.GradientBoostedTrees) and [`GradientBoostedTreesModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.GradientBoostedTreesModel) for more details on the API.
{% highlight python %}
from pyspark.mllib.tree import GradientBoostedTrees, GradientBoostedTreesModel
@@ -627,7 +642,9 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`GradientBoostedTrees` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.GradientBoostedTrees) and [`GradientBoostedTreesModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.tree.model.GradientBoostedTreesModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
@@ -665,7 +682,9 @@ val sameModel = GradientBoostedTreesModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
-<div data-lang="java">
+<div data-lang="java" markdown="1">
+Refer to the [`GradientBoostedTrees` Java docs](api/java/org/apache/spark/mllib/tree/GradientBoostedTrees.html) and [`GradientBoostedTreesModel` Java docs](api/java/org/apache/spark/mllib/tree/model/GradientBoostedTreesModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
import java.util.HashMap;
@@ -736,7 +755,8 @@ GradientBoostedTreesModel sameModel = GradientBoostedTreesModel.load(sc.sc(), "m
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`GradientBoostedTrees` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.GradientBoostedTrees) and [`GradientBoostedTreesModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.tree.GradientBoostedTreesModel) for more details on the API.
{% highlight python %}
from pyspark.mllib.tree import GradientBoostedTrees, GradientBoostedTreesModel
diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index 7066d5c974..2270f7a34b 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -102,6 +102,7 @@ The following code snippets illustrate how to load a sample dataset, train a bin
data, and evaluate the performance of the algorithm by several binary evaluation metrics.
<div data-lang="scala" markdown="1">
+Refer to the [`LogisticRegressionWithLBFGS` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS) and [`BinaryClassificationMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.BinaryClassificationMetrics) for details on the API.
{% highlight scala %}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
@@ -179,6 +180,7 @@ println("Area under ROC = " + auROC)
</div>
<div data-lang="java" markdown="1">
+Refer to the [`LogisticRegressionModel` Java docs](api/java/org/apache/spark/mllib/classification/LogisticRegressionModel.html) and [`LogisticRegressionWithLBFGS` Java docs](api/java/org/apache/spark/mllib/classification/LogisticRegressionWithLBFGS.html) for details on the API.
{% highlight java %}
import scala.Tuple2;
@@ -276,6 +278,7 @@ public class BinaryClassification {
</div>
<div data-lang="python" markdown="1">
+Refer to the [`BinaryClassificationMetrics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.evaluation.BinaryClassificationMetrics) and [`LogisticRegressionWithLBFGS` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.classification.LogisticRegressionWithLBFGS) for more details on the API.
{% highlight python %}
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
@@ -428,6 +431,7 @@ The following code snippets illustrate how to load a sample dataset, train a mul
the data, and evaluate the performance of the algorithm by several multiclass classification evaluation metrics.
<div data-lang="scala" markdown="1">
+Refer to the [`MulticlassMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.MulticlassMetrics) for details on the API.
{% highlight scala %}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
@@ -501,6 +505,7 @@ println(s"Weighted false positive rate: ${metrics.weightedFalsePositiveRate}")
</div>
<div data-lang="java" markdown="1">
+Refer to the [`MulticlassMetrics` Java docs](api/java/org/apache/spark/mllib/evaluation/MulticlassMetrics.html) for details on the API.
{% highlight java %}
import scala.Tuple2;
@@ -580,6 +585,7 @@ public class MulticlassClassification {
</div>
<div data-lang="python" markdown="1">
+Refer to the [`MulticlassMetrics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.evaluation.MulticlassMetrics) for more details on the API.
{% highlight python %}
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
@@ -758,6 +764,7 @@ True classes:
<div class="codetabs">
<div data-lang="scala" markdown="1">
+Refer to the [`MultilabelMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.MultilabelMetrics) for details on the API.
{% highlight scala %}
import org.apache.spark.mllib.evaluation.MultilabelMetrics
@@ -802,6 +809,7 @@ println(s"Subset accuracy = ${metrics.subsetAccuracy}")
</div>
<div data-lang="java" markdown="1">
+Refer to the [`MultilabelMetrics` Java docs](api/java/org/apache/spark/mllib/evaluation/MultilabelMetrics.html) for details on the API.
{% highlight java %}
import scala.Tuple2;
@@ -864,6 +872,7 @@ public class MultilabelClassification {
</div>
<div data-lang="python" markdown="1">
+Refer to the [`MultilabelMetrics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.evaluation.MultilabelMetrics) for more details on the API.
{% highlight python %}
from pyspark.mllib.evaluation import MultilabelMetrics
@@ -1016,6 +1025,7 @@ expanded world of non-positive weights are "the same as never having interacted
<div class="codetabs">
<div data-lang="scala" markdown="1">
+Refer to the [`RegressionMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.RegressionMetrics) and [`RankingMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.RankingMetrics) for details on the API.
{% highlight scala %}
import org.apache.spark.mllib.evaluation.{RegressionMetrics, RankingMetrics}
@@ -1095,6 +1105,7 @@ println(s"R-squared = ${regressionMetrics.r2}")
</div>
<div data-lang="java" markdown="1">
+Refer to the [`RegressionMetrics` Java docs](api/java/org/apache/spark/mllib/evaluation/RegressionMetrics.html) and [`RankingMetrics` Java docs](api/java/org/apache/spark/mllib/evaluation/RankingMetrics.html) for details on the API.
{% highlight java %}
import scala.Tuple2;
@@ -1256,6 +1267,7 @@ public class Ranking {
</div>
<div data-lang="python" markdown="1">
+Refer to the [`RegressionMetrics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RegressionMetrics) and [`RankingMetrics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RankingMetrics) for more details on the API.
{% highlight python %}
from pyspark.mllib.recommendation import ALS, Rating
@@ -1336,6 +1348,7 @@ The following code snippets illustrate how to load a sample dataset, train a lin
and evaluate the performance of the algorithm by several regression metrics.
<div data-lang="scala" markdown="1">
+Refer to the [`RegressionMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.RegressionMetrics) for details on the API.
{% highlight scala %}
import org.apache.spark.mllib.regression.LabeledPoint
@@ -1379,6 +1392,7 @@ println(s"Explained variance = ${metrics.explainedVariance}")
</div>
<div data-lang="java" markdown="1">
+Refer to the [`RegressionMetrics` Java docs](api/java/org/apache/spark/mllib/evaluation/RegressionMetrics.html) for details on the API.
{% highlight java %}
import scala.Tuple2;
@@ -1455,6 +1469,7 @@ public class LinearRegression {
</div>
<div data-lang="python" markdown="1">
+Refer to the [`RegressionMetrics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RegressionMetrics) for more details on the API.
{% highlight python %}
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
diff --git a/docs/mllib-feature-extraction.md b/docs/mllib-feature-extraction.md
index 7e417ed5f3..5bee170c61 100644
--- a/docs/mllib-feature-extraction.md
+++ b/docs/mllib-feature-extraction.md
@@ -56,6 +56,9 @@ and [IDF](api/scala/index.html#org.apache.spark.mllib.feature.IDF).
`HashingTF` takes an `RDD[Iterable[_]]` as the input.
Each record could be an iterable of strings or other types.
+Refer to the [`HashingTF` Scala docs](api/scala/index.html#org.apache.spark.mllib.feature.HashingTF) for details on the API.
+
+
{% highlight scala %}
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
@@ -103,6 +106,9 @@ and [IDF](api/python/pyspark.mllib.html#pyspark.mllib.feature.IDF).
`HashingTF` takes an RDD of lists as the input.
Each record could be an iterable of strings or other types.
+
+Refer to the [`HashingTF` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.feature.HashingTF) for details on the API.
+
{% highlight python %}
from pyspark import SparkContext
from pyspark.mllib.feature import HashingTF
@@ -183,7 +189,9 @@ the [text8](http://mattmahoney.net/dc/text8.zip) data and extract it to your pre
Here we assume the extracted file is `text8` and is in the same directory as you run the spark shell.
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`Word2Vec` Scala docs](api/scala/index.html#org.apache.spark.mllib.feature.Word2Vec) for details on the API.
+
{% highlight scala %}
import org.apache.spark._
import org.apache.spark.rdd._
@@ -207,7 +215,9 @@ model.save(sc, "myModelPath")
val sameModel = Word2VecModel.load(sc, "myModelPath")
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`Word2Vec` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.feature.Word2Vec) for more details on the API.
+
{% highlight python %}
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec
@@ -264,7 +274,9 @@ The example below demonstrates how to load a dataset in libsvm format, and stand
so that the new features have unit standard deviation and/or zero mean.
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`StandardScaler` Scala docs](api/scala/index.html#org.apache.spark.mllib.feature.StandardScaler) for details on the API.
+
{% highlight scala %}
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.feature.StandardScaler
@@ -288,7 +300,9 @@ val data2 = data.map(x => (x.label, scaler2.transform(Vectors.dense(x.features.t
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`StandardScaler` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.feature.StandardScaler) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.util import MLUtils
from pyspark.mllib.linalg import Vectors
@@ -338,7 +352,9 @@ The example below demonstrates how to load a dataset in libsvm format, and norma
with $L^2$ norm, and $L^\infty$ norm.
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`Normalizer` Scala docs](api/scala/index.html#org.apache.spark.mllib.feature.Normalizer) for details on the API.
+
{% highlight scala %}
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.feature.Normalizer
@@ -358,7 +374,9 @@ val data2 = data.map(x => (x.label, normalizer2.transform(x.features)))
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`Normalizer` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.feature.Normalizer) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.util import MLUtils
from pyspark.mllib.linalg import Vectors
@@ -532,7 +550,10 @@ v_N
The example below demonstrates how to transform vectors using a transforming vector value.
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+
+Refer to the [`ElementwiseProduct` Scala docs](api/scala/index.html#org.apache.spark.mllib.feature.ElementwiseProduct) for details on the API.
+
{% highlight scala %}
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.feature.ElementwiseProduct
@@ -551,7 +572,9 @@ val transformedData2 = data.map(x => transformer.transform(x))
{% endhighlight %}
</div>
-<div data-lang="java">
+<div data-lang="java" markdown="1">
+Refer to the [`ElementwiseProduct` Java docs](api/java/org/apache/spark/mllib/feature/ElementwiseProduct.html) for details on the API.
+
{% highlight java %}
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
@@ -580,7 +603,9 @@ JavaRDD<Vector> transformedData2 = data.map(
{% endhighlight %}
</div>
-<div data-lang="python">
+<div data-lang="python" markdown="1">
+Refer to the [`ElementwiseProduct` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.feature.ElementwiseProduct) for more details on the API.
+
{% highlight python %}
from pyspark import SparkContext
from pyspark.mllib.linalg import Vectors
@@ -617,7 +642,9 @@ and use them to project the vectors into a low-dimensional space while keeping a
for calculating a [Linear Regression](mllib-linear-methods.html)
<div class="codetabs">
-<div data-lang="scala">
+<div data-lang="scala" markdown="1">
+Refer to the [`PCA` Scala docs](api/scala/index.html#org.apache.spark.mllib.feature.PCA) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
diff --git a/docs/mllib-frequent-pattern-mining.md b/docs/mllib-frequent-pattern-mining.md
index 4d4f5cfdc5..f749eb4f2f 100644
--- a/docs/mllib-frequent-pattern-mining.md
+++ b/docs/mllib-frequent-pattern-mining.md
@@ -50,6 +50,7 @@ example illustrates how to mine frequent itemsets and association rules
Rules](mllib-frequent-pattern-mining.html#association-rules) for
details) from `transactions`.
+Refer to the [`FPGrowth` Scala docs](api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowth) for details on the API.
{% highlight scala %}
import org.apache.spark.rdd.RDD
@@ -92,6 +93,8 @@ example illustrates how to mine frequent itemsets and association rules
Rules](mllib-frequent-pattern-mining.html#association-rules) for
details) from `transactions`.
+Refer to the [`FPGrowth` Java docs](api/java/org/apache/spark/mllib/fpm/FPGrowth.html) for details on the API.
+
{% highlight java %}
import java.util.Arrays;
import java.util.List;
@@ -144,6 +147,8 @@ Calling `FPGrowth.train` with transactions returns an
[`FPGrowthModel`](api/python/pyspark.mllib.html#pyspark.mllib.fpm.FPGrowthModel)
that stores the frequent itemsets with their frequencies.
+Refer to the [`FPGrowth` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.fpm.FPGrowth) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.fpm import FPGrowth
@@ -170,6 +175,8 @@ for fi in result:
implements a parallel rule generation algorithm for constructing rules
that have a single item as the consequent.
+Refer to the [`AssociationRules` Scala docs](api/scala/index.html#org.apache.spark.mllib.fpm.AssociationRules) for details on the API.
+
{% highlight scala %}
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.AssociationRules
@@ -199,6 +206,8 @@ results.collect().foreach { rule =>
implements a parallel rule generation algorithm for constructing rules
that have a single item as the consequent.
+Refer to the [`AssociationRules` Java docs](api/java/org/apache/spark/mllib/fpm/AssociationRules.html) for details on the API.
+
{% highlight java %}
import java.util.Arrays;
@@ -267,6 +276,8 @@ Calling `PrefixSpan.run` returns a
[`PrefixSpanModel`](api/scala/index.html#org.apache.spark.mllib.fpm.PrefixSpanModel)
that stores the frequent sequences with their frequencies.
+Refer to the [`PrefixSpan` Scala docs](api/scala/index.html#org.apache.spark.mllib.fpm.PrefixSpan) and [`PrefixSpanModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.fpm.PrefixSpanModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.fpm.PrefixSpan
@@ -296,6 +307,8 @@ Calling `PrefixSpan.run` returns a
[`PrefixSpanModel`](api/java/org/apache/spark/mllib/fpm/PrefixSpanModel.html)
that stores the frequent sequences with their frequencies.
+Refer to the [`PrefixSpan` Java docs](api/java/org/apache/spark/mllib/fpm/PrefixSpan.html) and [`PrefixSpanModel` Java docs](api/java/org/apache/spark/mllib/fpm/PrefixSpanModel.html) for details on the API.
+
{% highlight java %}
import java.util.Arrays;
import java.util.List;
diff --git a/docs/mllib-isotonic-regression.md b/docs/mllib-isotonic-regression.md
index 6aa881f749..f91a697b31 100644
--- a/docs/mllib-isotonic-regression.md
+++ b/docs/mllib-isotonic-regression.md
@@ -59,6 +59,8 @@ i.e. 4710.28,500.00. The data are split to training and testing set.
A model is created using the training set, and the mean squared error is calculated from the predicted
labels and the real labels in the test set.
+Refer to the [`IsotonicRegression` Scala docs](api/scala/index.html#org.apache.spark.mllib.regression.IsotonicRegression) and [`IsotonicRegressionModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.regression.IsotonicRegressionModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.regression.{IsotonicRegression, IsotonicRegressionModel}
@@ -101,6 +103,8 @@ i.e. 4710.28,500.00. The data are split to training and testing set.
A model is created using the training set, and the mean squared error is calculated from the predicted
labels and the real labels in the test set.
+Refer to the [`IsotonicRegression` Java docs](api/java/org/apache/spark/mllib/regression/IsotonicRegression.html) and [`IsotonicRegressionModel` Java docs](api/java/org/apache/spark/mllib/regression/IsotonicRegressionModel.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaDoubleRDD;
@@ -167,6 +171,8 @@ i.e. 4710.28,500.00. The data are split to training and testing set.
A model is created using the training set, and the mean squared error is calculated from the predicted
labels and the real labels in the test set.
+Refer to the [`IsotonicRegression` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.regression.IsotonicRegression) and [`IsotonicRegressionModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.regression.IsotonicRegressionModel) for more details on the API.
+
{% highlight python %}
import math
from pyspark.mllib.regression import IsotonicRegression, IsotonicRegressionModel
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index e9b2d276cd..a3e1620c77 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -165,6 +165,8 @@ training algorithm on this training data using a static method in the algorithm
object, and make predictions with the resulting model to compute the training
error.
+Refer to the [`SVMWithSGD` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.SVMWithSGD) and [`SVMModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.SVMModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
@@ -230,6 +232,8 @@ Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a
calling `.rdd()` on your `JavaRDD` object. A self-contained application example
that is equivalent to the provided example in Scala is given below:
+Refer to the [`SVMWithSGD` Java docs](api/java/org/apache/spark/mllib/classification/SVMWithSGD.html) and [`SVMModel` Java docs](api/java/org/apache/spark/mllib/classification/SVMModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
@@ -316,6 +320,8 @@ a dependency.
The following example shows how to load a sample dataset, build an SVM model,
and make predictions with the resulting model to compute the training error.
+Refer to the [`SVMWithSGD` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.classification.SVMWithSGD) and [`SVMModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.classification.SVMModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.classification import SVMWithSGD, SVMModel
from pyspark.mllib.regression import LabeledPoint
@@ -395,6 +401,8 @@ test, and use
to fit a logistic regression model.
Then the model is evaluated against the test dataset and saved to disk.
+Refer to the [`LogisticRegressionWithLBFGS` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS) and [`LogisticRegressionModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionModel}
@@ -441,6 +449,8 @@ test, and use
to fit a logistic regression model.
Then the model is evaluated against the test dataset and saved to disk.
+Refer to the [`LogisticRegressionWithLBFGS` Java docs](api/java/org/apache/spark/mllib/classification/LogisticRegressionWithLBFGS.html) and [`LogisticRegressionModel` Java docs](api/java/org/apache/spark/mllib/classification/LogisticRegressionModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
@@ -501,6 +511,8 @@ and make predictions with the resulting model to compute the training error.
Note that the Python API does not yet support multiclass classification and model save/load but
will in the future.
+Refer to the [`LogisticRegressionWithLBFGS` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.classification.LogisticRegressionWithLBFGS) and [`LogisticRegressionModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.classification.LogisticRegressionModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
from pyspark.mllib.regression import LabeledPoint
@@ -558,6 +570,8 @@ The example then uses LinearRegressionWithSGD to build a simple linear model to
values. We compute the mean squared error at the end to evaluate
[goodness of fit](http://en.wikipedia.org/wiki/Goodness_of_fit).
+Refer to the [`LinearRegressionWithSGD` Scala docs](api/scala/index.html#org.apache.spark.mllib.regression.LinearRegressionWithSGD) and [`LinearRegressionModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.regression.LinearRegressionModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.LinearRegressionModel
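A short sketch of this workflow appears below. It assumes an existing `SparkContext` named `sc` and a local text file with lines of the form `label,feature1 feature2 ...` (the path shown is illustrative).

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// Load and parse the data.
val data = sc.textFile("data/mllib/ridge-data/lpsa.data")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}.cache()

// Build the model.
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate the model on the training examples and compute the mean squared error.
val valuesAndPreds = parsedData.map { point =>
  (point.label, model.predict(point.features))
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println("training Mean Squared Error = " + MSE)
{% endhighlight %}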
@@ -600,6 +614,8 @@ Spark Java API uses a separate `JavaRDD` class. You can convert a Java RDD to a
calling `.rdd()` on your `JavaRDD` object. The Java example corresponding to
the Scala snippet above is presented below:
+Refer to the [`LinearRegressionWithSGD` Java docs](api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html) and [`LinearRegressionModel` Java docs](api/java/org/apache/spark/mllib/regression/LinearRegressionModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
@@ -673,6 +689,8 @@ values. We compute the mean squared error at the end to evaluate
Note that the Python API does not yet support model save/load but will in the future.
+Refer to the [`LinearRegressionWithSGD` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.regression.LinearRegressionWithSGD) and [`LinearRegressionModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.regression.LinearRegressionModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel
diff --git a/docs/mllib-naive-bayes.md b/docs/mllib-naive-bayes.md
index e73bd30f3a..f4f6a10c82 100644
--- a/docs/mllib-naive-bayes.md
+++ b/docs/mllib-naive-bayes.md
@@ -38,6 +38,8 @@ smoothing parameter `lambda` as input, an optional model type parameter (default
[NaiveBayesModel](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayesModel), which
can be used for evaluation and prediction.
+Refer to the [`NaiveBayes` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayes) and [`NaiveBayesModel` Scala docs](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayesModel) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.mllib.linalg.Vectors
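A compact sketch of training and evaluating a naive Bayes model is given below. It assumes an existing `SparkContext` named `sc` and a file with lines of the form `label,feature1 feature2 ...` (the path and format are illustrative).

{% highlight scala %}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Load and parse the data.
val data = sc.textFile("data/mllib/sample_naive_bayes_data.txt")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}

// Split data into training (60%) and test (40%).
val Array(training, test) = parsedData.randomSplit(Array(0.6, 0.4), seed = 12345L)

// Train a multinomial naive Bayes model with additive smoothing.
val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")

// Compute accuracy on the test set.
val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
val accuracy = predictionAndLabel.filter(x => x._1 == x._2).count() / test.count().toDouble
println("Test accuracy = " + accuracy)
{% endhighlight %}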
@@ -73,6 +75,8 @@ optionally smoothing parameter `lambda` as input, and output a
[NaiveBayesModel](api/java/org/apache/spark/mllib/classification/NaiveBayesModel.html), which
can be used for evaluation and prediction.
+Refer to the [`NaiveBayes` Java docs](api/java/org/apache/spark/mllib/classification/NaiveBayes.html) and [`NaiveBayesModel` Java docs](api/java/org/apache/spark/mllib/classification/NaiveBayesModel.html) for details on the API.
+
{% highlight java %}
import scala.Tuple2;
@@ -118,6 +122,8 @@ used for evaluation and prediction.
Note that the Python API does not yet support model save/load but will in the future.
+Refer to the [`NaiveBayes` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.classification.NaiveBayes) and [`NaiveBayesModel` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.classification.NaiveBayesModel) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel
from pyspark.mllib.linalg import Vectors
diff --git a/docs/mllib-optimization.md b/docs/mllib-optimization.md
index 6cabc1610a..a3bd130ba0 100644
--- a/docs/mllib-optimization.md
+++ b/docs/mllib-optimization.md
@@ -218,6 +218,8 @@ L-BFGS optimizer.
<div class="codetabs">
<div data-lang="scala" markdown="1">
+Refer to the [`LBFGS` Scala docs](api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS) and [`SquaredL2Updater` Scala docs](api/scala/index.html#org.apache.spark.mllib.optimization.SquaredL2Updater) for details on the API.
+
{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
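A sketch of driving the low-level L-BFGS optimizer directly is shown below. It assumes an existing `SparkContext` named `sc`, appends a bias term to each feature vector, and minimizes the logistic loss with L2 regularization; the data path and parameter values are illustrative.

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

// Load training data in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
val numFeatures = data.take(1)(0).features.size

// Prepend a bias term (intercept) to each feature vector.
val training = data.map(x => (x.label, MLUtils.appendBias(x.features))).cache()

// Run L-BFGS with a logistic gradient and an L2 regularization updater.
val numCorrections = 10
val convergenceTol = 1e-4
val maxNumIterations = 20
val regParam = 0.1
val initialWeightsWithIntercept = Vectors.dense(new Array[Double](numFeatures + 1))

val (weightsWithIntercept, loss) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  numCorrections,
  convergenceTol,
  maxNumIterations,
  regParam,
  initialWeightsWithIntercept)

println("Loss at each step of training: " + loss.mkString(", "))
{% endhighlight %}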
@@ -278,6 +280,8 @@ println("Area under ROC = " + auROC)
</div>
<div data-lang="java" markdown="1">
+Refer to the [`LBFGS` Java docs](api/java/org/apache/spark/mllib/optimization/LBFGS.html) and [`SquaredL2Updater` Java docs](api/java/org/apache/spark/mllib/optimization/SquaredL2Updater.html) for details on the API.
+
{% highlight java %}
import java.util.Arrays;
import java.util.Random;
diff --git a/docs/mllib-pmml-model-export.md b/docs/mllib-pmml-model-export.md
index 42ea2ca81f..615287125c 100644
--- a/docs/mllib-pmml-model-export.md
+++ b/docs/mllib-pmml-model-export.md
@@ -45,6 +45,8 @@ The table below outlines the MLlib models that can be exported to PMML and their
<div data-lang="scala" markdown="1">
To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.
+Refer to the [`KMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for details on the API.
+
Here is a complete example of building a KMeansModel and printing it out in PMML format:
{% highlight scala %}
import org.apache.spark.mllib.clustering.KMeans
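As a sketch of the export API, the snippet below builds a k-means model and exports it via `toPMML`. It assumes an existing `SparkContext` named `sc` and a whitespace-delimited text file of numeric features; the input and output paths are illustrative.

{% highlight scala %}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data.
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans.
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Export the model to PMML as a string...
println("PMML Model:\n" + clusters.toPMML)

// ...or save it to a local file or to a distributed file system.
clusters.toPMML("/tmp/kmeans.xml")
clusters.toPMML(sc, "/tmp/kmeans")
{% endhighlight %}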
diff --git a/docs/mllib-statistics.md b/docs/mllib-statistics.md
index 6acfc71d7b..2c7c9ed693 100644
--- a/docs/mllib-statistics.md
+++ b/docs/mllib-statistics.md
@@ -38,6 +38,8 @@ available in `Statistics`.
which contains the column-wise max, min, mean, variance, and number of nonzeros, as well as the
total count.
+Refer to the [`MultivariateStatisticalSummary` Scala docs](api/scala/index.html#org.apache.spark.mllib.stat.MultivariateStatisticalSummary) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
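A small sketch of computing column summary statistics with `Statistics.colStats` follows; it assumes an existing `SparkContext` named `sc` and uses a tiny in-memory dataset for illustration.

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

// An RDD of Vectors; each Vector is one observation (row).
val observations = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0, 100.0),
  Vectors.dense(2.0, 20.0, 200.0),
  Vectors.dense(3.0, 30.0, 300.0)))

// Compute column summary statistics.
val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
println(summary.mean)        // a dense vector containing the mean value for each column
println(summary.variance)    // column-wise variance
println(summary.numNonzeros) // number of nonzeros in each column
{% endhighlight %}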
@@ -60,6 +62,8 @@ println(summary.numNonzeros) // number of nonzeros in each column
which contains the column-wise max, min, mean, variance, and number of nonzeros, as well as the
total count.
+Refer to the [`MultivariateStatisticalSummary` Java docs](api/java/org/apache/spark/mllib/stat/MultivariateStatisticalSummary.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
@@ -86,6 +90,8 @@ System.out.println(summary.numNonzeros()); // number of nonzeros in each column
which contains the column-wise max, min, mean, variance, and number of nonzeros, as well as the
total count.
+Refer to the [`MultivariateStatisticalSummary` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.stat.MultivariateStatisticalSummary) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.stat import Statistics
@@ -116,6 +122,8 @@ correlation methods are currently Pearson's and Spearman's correlation.
calculate correlations between series. Depending on the type of input, two `RDD[Double]`s or
an `RDD[Vector]`, the output will be a `Double` or the correlation `Matrix` respectively.
+Refer to the [`Statistics` Scala docs](api/scala/index.html#org.apache.spark.mllib.stat.Statistics) for details on the API.
+
{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg._
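A brief sketch of both correlation call patterns is shown below; it assumes an existing `SparkContext` named `sc`, and the series and vectors are illustrative.

{% highlight scala %}
import org.apache.spark.mllib.linalg.{Matrix, Vectors}
import org.apache.spark.mllib.stat.Statistics

// Correlation between two series; the method can be "pearson" (default) or "spearman".
val seriesX = sc.parallelize(Array(1.0, 2.0, 3.0, 3.0, 5.0))
val seriesY = sc.parallelize(Array(11.0, 22.0, 33.0, 33.0, 555.0))
val correlation: Double = Statistics.corr(seriesX, seriesY, "pearson")
println("Correlation is: " + correlation)

// Correlation matrix of an RDD[Vector], computed column-wise.
val data = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0, 100.0),
  Vectors.dense(2.0, 20.0, 200.0),
  Vectors.dense(5.0, 33.0, 366.0)))
val correlMatrix: Matrix = Statistics.corr(data, "pearson")
println(correlMatrix.toString)
{% endhighlight %}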
@@ -144,6 +152,8 @@ val correlMatrix: Matrix = Statistics.corr(data, "pearson")
calculate correlations between series. Depending on the type of input, two `JavaDoubleRDD`s or
a `JavaRDD<Vector>`, the output will be a `Double` or the correlation `Matrix` respectively.
+Refer to the [`Statistics` Java docs](api/java/org/apache/spark/mllib/stat/Statistics.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaSparkContext;
@@ -173,6 +183,8 @@ Matrix correlMatrix = Statistics.corr(data.rdd(), "pearson");
calculate correlations between series. Depending on the type of input, two `RDD[Double]`s or
an `RDD[Vector]`, the output will be a `Double` or the correlation `Matrix` respectively.
+Refer to the [`Statistics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.stat import Statistics
@@ -338,6 +350,8 @@ featureTestResults.foreach { result =>
run Pearson's chi-squared tests. The following example demonstrates how to run and interpret
hypothesis tests.
+Refer to the [`ChiSqTestResult` Java docs](api/java/org/apache/spark/mllib/stat/test/ChiSqTestResult.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
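For reference, a compact Scala sketch of the same chi-squared API is shown below; the observed frequencies and the contingency matrix are illustrative, and no `SparkContext` is needed for these local inputs.

{% highlight scala %}
import org.apache.spark.mllib.linalg.{Matrices, Vectors}
import org.apache.spark.mllib.stat.Statistics

// Pearson's goodness-of-fit test on a vector of observed frequencies
// against a uniform expected distribution.
val observed = Vectors.dense(0.1, 0.15, 0.2, 0.3, 0.25)
val goodnessOfFitResult = Statistics.chiSqTest(observed)
println(goodnessOfFitResult) // test statistic, degrees of freedom, p-value, null hypothesis

// Pearson's independence test on a contingency matrix.
val mat = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
val independenceResult = Statistics.chiSqTest(mat)
println(independenceResult)
{% endhighlight %}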
@@ -385,6 +399,8 @@ for (ChiSqTestResult result : featureTestResults) {
run Pearson's chi-squared tests. The following example demonstrates how to run and interpret
hypothesis tests.
+Refer to the [`Statistics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics) for more details on the API.
+
{% highlight python %}
from pyspark import SparkContext
from pyspark.mllib.linalg import Vectors, Matrices
@@ -437,6 +453,8 @@ message.
run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
and interpret the hypothesis tests.
+Refer to the [`Statistics` Scala docs](api/scala/index.html#org.apache.spark.mllib.stat.Statistics) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.stat.Statistics
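A short sketch of both Kolmogorov-Smirnov call patterns follows; it assumes an existing `SparkContext` named `sc`, and the sample values and the user-supplied CDF are purely illustrative.

{% highlight scala %}
import org.apache.spark.mllib.stat.Statistics

val data = sc.parallelize(Seq(0.1, 0.15, 0.2, 0.3, 0.25))

// Test against a theoretical distribution given by name and parameters
// (here a normal distribution with mean 0 and standard deviation 1).
val testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0.0, 1.0)
println(testResult) // test statistic, p-value, and null hypothesis

// Alternatively, test against a user-supplied CDF.
val myCDF = (x: Double) => x * x // an arbitrary CDF on [0, 1], for illustration only
val testResult2 = Statistics.kolmogorovSmirnovTest(data, myCDF)
println(testResult2)
{% endhighlight %}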
@@ -459,6 +477,8 @@ val testResult2 = Statistics.kolmogorovSmirnovTest(data, myCDF)
run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
and interpret the hypothesis tests.
+Refer to the [`Statistics` Java docs](api/java/org/apache/spark/mllib/stat/Statistics.html) for details on the API.
+
{% highlight java %}
import java.util.Arrays;
@@ -483,6 +503,8 @@ System.out.println(testResult);
run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
and interpret the hypothesis tests.
+Refer to the [`Statistics` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.stat import Statistics
@@ -513,6 +535,8 @@ methods to generate random double RDDs or vector RDDs.
The following example generates a random double RDD whose values follow the standard normal
distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+Refer to the [`RandomRDDs` Scala docs](api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs) for details on the API.
+
{% highlight scala %}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.random.RandomRDDs._
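A two-line sketch of this pattern is shown below; it assumes an existing `SparkContext` named `sc`.

{% highlight scala %}
import org.apache.spark.mllib.random.RandomRDDs._

// Generate a random double RDD with one million i.i.d. values drawn from the
// standard normal distribution N(0, 1), evenly distributed over 10 partitions.
val u = normalRDD(sc, 1000000L, 10)

// Apply a transform to obtain a random double RDD following N(1, 4).
val v = u.map(x => 1.0 + 2.0 * x)
{% endhighlight %}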
@@ -533,6 +557,8 @@ methods to generate random double RDDs or vector RDDs.
The following example generates a random double RDD whose values follow the standard normal
distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+Refer to the [`RandomRDDs` Java docs](api/java/org/apache/spark/mllib/random/RandomRDDs.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaDoubleRDD;
@@ -559,6 +585,8 @@ methods to generate random double RDDs or vector RDDs.
The following example generates a random double RDD whose values follow the standard normal
distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
+Refer to the [`RandomRDDs` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.random.RandomRDDs) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.random import RandomRDDs
@@ -589,6 +617,8 @@ mean of PDFs of normal distributions centered around each of the samples.
to compute kernel density estimates from an RDD of samples. The following example demonstrates how
to do so.
+Refer to the [`KernelDensity` Scala docs](api/scala/index.html#org.apache.spark.mllib.stat.KernelDensity) for details on the API.
+
{% highlight scala %}
import org.apache.spark.mllib.stat.KernelDensity
import org.apache.spark.rdd.RDD
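A minimal sketch of the kernel density API follows; it assumes an existing `SparkContext` named `sc`, and the sample values, bandwidth, and evaluation points are illustrative.

{% highlight scala %}
import org.apache.spark.mllib.stat.KernelDensity
import org.apache.spark.rdd.RDD

// An RDD of sample data.
val data: RDD[Double] = sc.parallelize(Seq(1.0, 1.5, 2.0, 2.5, 8.0, 9.0))

// Construct a density estimator with the sample data and a Gaussian kernel
// with a standard deviation (bandwidth) of 3.0.
val kd = new KernelDensity()
  .setSample(data)
  .setBandwidth(3.0)

// Estimate the density at the given values.
val densities = kd.estimate(Array(-1.0, 2.0, 5.0))
{% endhighlight %}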
@@ -611,6 +641,8 @@ val densities = kd.estimate(Array(-1.0, 2.0, 5.0))
to compute kernel density estimates from an RDD of samples. The following example demonstrates how
to do so.
+Refer to the [`KernelDensity` Java docs](api/java/org/apache/spark/mllib/stat/KernelDensity.html) for details on the API.
+
{% highlight java %}
import org.apache.spark.mllib.stat.KernelDensity;
import org.apache.spark.rdd.RDD;
@@ -633,6 +665,8 @@ double[] densities = kd.estimate(new double[] {-1.0, 2.0, 5.0});
to compute kernel density estimates from an RDD of samples. The following example demonstrates how
to do so.
+Refer to the [`KernelDensity` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.stat.KernelDensity) for more details on the API.
+
{% highlight python %}
from pyspark.mllib.stat import KernelDensity