author: christopher snow <chsnow123@gmail.com> 2017-03-21 13:23:59 +0000
committer: Sean Owen <sowen@cloudera.com> 2017-03-21 13:23:59 +0000
commit: 7620aed828d8baefc425b54684a83c81f1507b02
tree: 728c706263ff571e3691c8a3569841e35f495bde
parent: d2dcd6792f4cea39e12945ad8c4cda5d8d034de4
[SPARK-20011][ML][DOCS] Clarify documentation for ALS 'rank' parameter
## What changes were proposed in this pull request?

API documentation and collaborative filtering documentation page changes to clarify inconsistent description of ALS rank parameter.

- [DOCS] was previously: "rank is the number of latent factors in the model."
- [API] was previously: "rank - number of features to use"

This change describes rank in both places consistently as:

- "Number of features to use (also referred to as the number of latent factors)"

Author: Chris Snow <chris.snowuk.ibm.com>
Author: christopher snow <chsnow123@gmail.com>

Closes #17345 from snowch/SPARK-20011.
-rw-r--r-- docs/mllib-collaborative-filtering.md | 2
-rw-r--r-- mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala | 16
-rw-r--r-- python/pyspark/mllib/recommendation.py | 4
3 files changed, 11 insertions, 11 deletions
diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index 0f891a09a6..d1bb6d69f1 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -20,7 +20,7 @@ algorithm to learn these latent factors. The implementation in `spark.mllib` has
following parameters:
* *numBlocks* is the number of blocks used to parallelize computation (set to -1 to auto-configure).
-* *rank* is the number of latent factors in the model.
+* *rank* is the number of features to use (also referred to as the number of latent factors).
* *iterations* is the number of iterations of ALS to run. ALS typically converges to a reasonable
solution in 20 iterations or less.
* *lambda* specifies the regularization parameter in ALS.
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index 76b1bc13b4..14288221b6 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -301,7 +301,7 @@ object ALS {
* level of parallelism.
*
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
- * @param rank number of features to use
+ * @param rank number of features to use (also referred to as the number of latent factors)
* @param iterations number of iterations of ALS
* @param lambda regularization parameter
* @param blocks level of parallelism to split computation into
@@ -326,7 +326,7 @@ object ALS {
* level of parallelism.
*
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
- * @param rank number of features to use
+ * @param rank number of features to use (also referred to as the number of latent factors)
* @param iterations number of iterations of ALS
* @param lambda regularization parameter
* @param blocks level of parallelism to split computation into
@@ -349,7 +349,7 @@ object ALS {
* parallelism automatically based on the number of partitions in `ratings`.
*
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
- * @param rank number of features to use
+ * @param rank number of features to use (also referred to as the number of latent factors)
* @param iterations number of iterations of ALS
* @param lambda regularization parameter
*/
@@ -366,7 +366,7 @@ object ALS {
* parallelism automatically based on the number of partitions in `ratings`.
*
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
- * @param rank number of features to use
+ * @param rank number of features to use (also referred to as the number of latent factors)
* @param iterations number of iterations of ALS
*/
@Since("0.8.0")
@@ -383,7 +383,7 @@ object ALS {
* a level of parallelism given by `blocks`.
*
* @param ratings RDD of (userID, productID, rating) pairs
- * @param rank number of features to use
+ * @param rank number of features to use (also referred to as the number of latent factors)
* @param iterations number of iterations of ALS
* @param lambda regularization parameter
* @param blocks level of parallelism to split computation into
@@ -410,7 +410,7 @@ object ALS {
* iteratively with a configurable level of parallelism.
*
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
- * @param rank number of features to use
+ * @param rank number of features to use (also referred to as the number of latent factors)
* @param iterations number of iterations of ALS
* @param lambda regularization parameter
* @param blocks level of parallelism to split computation into
@@ -436,7 +436,7 @@ object ALS {
* partitions in `ratings`.
*
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
- * @param rank number of features to use
+ * @param rank number of features to use (also referred to as the number of latent factors)
* @param iterations number of iterations of ALS
* @param lambda regularization parameter
* @param alpha confidence parameter
@@ -455,7 +455,7 @@ object ALS {
* partitions in `ratings`.
*
* @param ratings RDD of [[Rating]] objects with userID, productID, and rating
- * @param rank number of features to use
+ * @param rank number of features to use (also referred to as the number of latent factors)
* @param iterations number of iterations of ALS
*/
@Since("0.8.1")
diff --git a/python/pyspark/mllib/recommendation.py b/python/pyspark/mllib/recommendation.py
index 732300ee9c..8118288135 100644
--- a/python/pyspark/mllib/recommendation.py
+++ b/python/pyspark/mllib/recommendation.py
@@ -249,7 +249,7 @@ class ALS(object):
:param ratings:
RDD of `Rating` or (userID, productID, rating) tuple.
:param rank:
- Rank of the feature matrices computed (number of features).
+ Number of features to use (also referred to as the number of latent factors).
:param iterations:
Number of iterations of ALS.
(default: 5)
@@ -287,7 +287,7 @@ class ALS(object):
:param ratings:
RDD of `Rating` or (userID, productID, rating) tuple.
:param rank:
- Rank of the feature matrices computed (number of features).
+ Number of features to use (also referred to as the number of latent factors).
:param iterations:
Number of iterations of ALS.
(default: 5)
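
The wording adopted in this change ("number of features to use, also referred to as the number of latent factors") can be illustrated with a small sketch. This is not Spark code and does not run ALS; it only assumes the usual matrix-factorization picture, in which every user and every product is represented by a vector of length `rank` and a predicted rating is the dot product of two such vectors. The helper names `make_factors` and `predict` are hypothetical and exist only for this illustration.

```python
# Hypothetical illustration, not part of Spark: `rank` is the length of every
# user and product factor vector in a matrix-factorization model.

def make_factors(num_rows, rank, fill):
    """Return a num_rows x rank matrix (list of lists) filled with `fill`."""
    return [[fill] * rank for _ in range(num_rows)]


def predict(user_factors, product_factors, user, product):
    """Predicted rating: dot product of two vectors, each of length `rank`."""
    return sum(
        u * p for u, p in zip(user_factors[user], product_factors[product])
    )


rank = 3  # number of features, i.e. number of latent factors

# Constant placeholder values; a real model would learn these with ALS.
user_factors = make_factors(num_rows=4, rank=rank, fill=0.5)
product_factors = make_factors(num_rows=2, rank=rank, fill=2.0)

rating = predict(user_factors, product_factors, user=1, product=0)
```

Under these assumptions, raising `rank` widens both factor matrices by the same number of columns, which is why "number of features" and "number of latent factors" describe the same quantity.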