author Nick Pentreath <nick.pentreath@gmail.com> 2013-10-04 14:39:44 +0200
committer Nick Pentreath <nick.pentreath@gmail.com> 2013-10-04 14:39:44 +0200
commit 93b96b44d778716a4e76bdcf68d6a07694a06460
tree 7b3820ba1c2507d46a1c94658cef9f94179c630f /docs/mllib-guide.md
parent c6ceaeae50c7b5d69bb9897af32f537bfbde152d
Adding implicit feedback ALS to MLlib user guide
Diffstat (limited to 'docs/mllib-guide.md')
-rw-r--r-- docs/mllib-guide.md | 24
1 files changed, 20 insertions, 4 deletions
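The "Explicit vs Implicit Feedback" section added in the diff below describes treating implicit data as binary preferences plus confidence values. A minimal Python sketch of that split follows, using the formulation from the cited paper (preference 1 when an interaction was observed, confidence `1 + alpha * r`). The function names and sample values are hypothetical, and it is an assumption that MLlib's implementation uses exactly this form.

```python
def implicit_targets(r, alpha):
    """Map a raw implicit-feedback value r (e.g. a click count) to a
    (preference, confidence) pair. Assumed form, taken from the cited
    paper: p = 1 if r > 0 else 0, c = 1 + alpha * r."""
    preference = 1.0 if r > 0 else 0.0
    confidence = 1.0 + alpha * r
    return preference, confidence


def weighted_squared_error(r, prediction, alpha):
    """Confidence-weighted squared error for one user-item pair, where
    `prediction` stands in for the dot product of the latent factors."""
    preference, confidence = implicit_targets(r, alpha)
    return confidence * (preference - prediction) ** 2


# Hypothetical observations: (user, item) -> number of interactions.
observations = {("u1", "i1"): 3.0, ("u1", "i2"): 0.0, ("u2", "i1"): 1.0}
alpha = 40.0  # illustrative value only

targets = {pair: implicit_targets(r, alpha) for pair, r in observations.items()}
```

With `alpha = 40.0`, a pair with three interactions gets confidence 121.0, while an unobserved pair keeps the baseline confidence of 1.0, so errors on observed pairs dominate the objective.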
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index f991d86c8d..c1ff9c417c 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -144,10 +144,9 @@ Available algorithms for clustering:
# Collaborative Filtering
-[Collaborative
-filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
+[Collaborative filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
is commonly used for recommender systems. These techniques aim to fill in the
-missing entries of a user-product association matrix. MLlib currently supports
+missing entries of a user-item association matrix. MLlib currently supports
model-based collaborative filtering, in which users and products are described
by a small set of latent factors that can be used to predict missing entries.
In particular, we implement the [alternating least squares
@@ -158,7 +157,24 @@ following parameters:
* *numBlocks* is the number of blocks used to parallelize computation (set to -1 to auto-configure).
* *rank* is the number of latent factors in our model.
* *iterations* is the number of iterations to run.
-* *lambda* specifies the regularization parameter in ALS.
+* *lambda* specifies the regularization parameter in ALS.
+* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for *implicit feedback* data.
+* *alpha* is a parameter applicable to the implicit feedback variant of ALS that governs the *baseline* confidence in preference observations.
+
+## Explicit vs Implicit Feedback
+
+The standard approach to matrix factorization-based collaborative filtering treats
+the entries in the user-item matrix as *explicit* preferences given by the user to the item.
+
+It is common in many real-world use cases to only have access to *implicit feedback*
+(e.g. views, clicks, purchases, likes, shares, etc.). The approach used in MLlib to deal with
+such data is taken from
+[Collaborative Filtering for Implicit Feedback Datasets](http://research.yahoo.com/pub/2433).
+Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data as
+a combination of binary preferences and *confidence values*. The ratings are then related
+to the level of confidence in observed user preferences, rather than explicit ratings given to items.
+The model then tries to find latent factors that can be used to predict the expected preference of a user
+for an item.
Available algorithms for collaborative filtering: