[SPARK-20011][ML][DOCS] Clarify documentation for ALS 'rank' parameter

## What changes were proposed in this pull request? API documentation and collaborative filtering documentation page changes to clarify inconsistent description of ALS rank parameter. - [DOCS] was previously: "rank is the number of latent factors in the model." - [API] was previously: "rank - number of features to use" This change describes rank in both places consistently as: - "Number of features to use (also referred to as the number of latent factors)" Author: Chris Snow <chris.snowuk.ibm.com> Author: christopher snow <chsnow123@gmail.com> Closes #17345 from snowch/SPARK-20011.
author: christopher snow <chsnow123@gmail.com> 2017-03-21 13:23:59 +0000
committer: Sean Owen <sowen@cloudera.com> 2017-03-21 13:23:59 +0000
commit: 7620aed828d8baefc425b54684a83c81f1507b02 (patch)
tree: 728c706263ff571e3691c8a3569841e35f495bde
parent: d2dcd6792f4cea39e12945ad8c4cda5d8d034de4 (diff)
download: spark-7620aed828d8baefc425b54684a83c81f1507b02.tar.gz
spark-7620aed828d8baefc425b54684a83c81f1507b02.tar.bz2
spark-7620aed828d8baefc425b54684a83c81f1507b02.zip
3 files changed, 11 insertions, 11 deletions
diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index 0f891a09a6..d1bb6d69f1 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -20,7 +20,7 @@ algorithm to learn these latent factors. The implementation in `spark.mllib` has
 following parameters:
 
 * *numBlocks* is the number of blocks used to parallelize computation (set to -1 to auto-configure).
-* *rank* is the number of latent factors in the model.
+* *rank* is the number of features to use (also referred to as the number of latent factors).
 * *iterations* is the number of iterations of ALS to run. ALS typically converges to a reasonable
   solution in 20 iterations or less.
 * *lambda* specifies the regularization parameter in ALS.
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index 76b1bc13b4..14288221b6 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -301,7 +301,7 @@ object ALS {
    * level of parallelism.
    *
    * @param ratings    RDD of [[Rating]] objects with userID, productID, and rating
-   * @param rank       number of features to use
+   * @param rank       number of features to use (also referred to as the number of latent factors)
    * @param iterations number of iterations of ALS
    * @param lambda     regularization parameter
    * @param blocks     level of parallelism to split computation into
@@ -326,7 +326,7 @@ object ALS {
    * level of parallelism.
    *
    * @param ratings    RDD of [[Rating]] objects with userID, productID, and rating
-   * @param rank       number of features to use
+   * @param rank       number of features to use (also referred to as the number of latent factors)
    * @param iterations number of iterations of ALS
    * @param lambda     regularization parameter
    * @param blocks     level of parallelism to split computation into
@@ -349,7 +349,7 @@ object ALS {
    * parallelism automatically based on the number of partitions in `ratings`.
    *
    * @param ratings    RDD of [[Rating]] objects with userID, productID, and rating
-   * @param rank       number of features to use
+   * @param rank       number of features to use (also referred to as the number of latent factors)
    * @param iterations number of iterations of ALS
    * @param lambda     regularization parameter
    */
@@ -366,7 +366,7 @@ object ALS {
    * parallelism automatically based on the number of partitions in `ratings`.
    *
    * @param ratings    RDD of [[Rating]] objects with userID, productID, and rating
-   * @param rank       number of features to use
+   * @param rank       number of features to use (also referred to as the number of latent factors)
    * @param iterations number of iterations of ALS
    */
   @Since("0.8.0")
@@ -383,7 +383,7 @@ object ALS {
    * a level of parallelism given by `blocks`.
    *
    * @param ratings    RDD of (userID, productID, rating) pairs
-   * @param rank       number of features to use
+   * @param rank       number of features to use (also referred to as the number of latent factors)
    * @param iterations number of iterations of ALS
    * @param lambda     regularization parameter
    * @param blocks     level of parallelism to split computation into
@@ -410,7 +410,7 @@ object ALS {
    * iteratively with a configurable level of parallelism.
    *
    * @param ratings    RDD of [[Rating]] objects with userID, productID, and rating
-   * @param rank       number of features to use
+   * @param rank       number of features to use (also referred to as the number of latent factors)
    * @param iterations number of iterations of ALS
    * @param lambda     regularization parameter
    * @param blocks     level of parallelism to split computation into
@@ -436,7 +436,7 @@ object ALS {
    * partitions in `ratings`.
    *
    * @param ratings    RDD of [[Rating]] objects with userID, productID, and rating
-   * @param rank       number of features to use
+   * @param rank       number of features to use (also referred to as the number of latent factors)
    * @param iterations number of iterations of ALS
    * @param lambda     regularization parameter
    * @param alpha      confidence parameter
@@ -455,7 +455,7 @@ object ALS {
    * partitions in `ratings`.
    *
    * @param ratings    RDD of [[Rating]] objects with userID, productID, and rating
-   * @param rank       number of features to use
+   * @param rank       number of features to use (also referred to as the number of latent factors)
    * @param iterations number of iterations of ALS
    */
   @Since("0.8.1")
diff --git a/python/pyspark/mllib/recommendation.py b/python/pyspark/mllib/recommendation.py
index 732300ee9c..8118288135 100644
--- a/python/pyspark/mllib/recommendation.py
+++ b/python/pyspark/mllib/recommendation.py
@@ -249,7 +249,7 @@ class ALS(object):
         :param ratings:
           RDD of `Rating` or (userID, productID, rating) tuple.
         :param rank:
-          Rank of the feature matrices computed (number of features).
+          Number of features to use (also referred to as the number of latent factors).
         :param iterations:
           Number of iterations of ALS.
           (default: 5)
@@ -287,7 +287,7 @@ class ALS(object):
         :param ratings:
           RDD of `Rating` or (userID, productID, rating) tuple.
         :param rank:
-          Rank of the feature matrices computed (number of features).
+          Number of features to use (also referred to as the number of latent factors).
         :param iterations:
           Number of iterations of ALS.
           (default: 5)
author	christopher snow <chsnow123@gmail.com>	2017-03-21 13:23:59 +0000
committer	Sean Owen <sowen@cloudera.com>	2017-03-21 13:23:59 +0000
commit	7620aed828d8baefc425b54684a83c81f1507b02 (patch)
tree	728c706263ff571e3691c8a3569841e35f495bde
parent	d2dcd6792f4cea39e12945ad8c4cda5d8d034de4 (diff)
download	spark-7620aed828d8baefc425b54684a83c81f1507b02.tar.gz spark-7620aed828d8baefc425b54684a83c81f1507b02.tar.bz2 spark-7620aed828d8baefc425b54684a83c81f1507b02.zip