From 93b96b44d778716a4e76bdcf68d6a07694a06460 Mon Sep 17 00:00:00 2001
From: Nick Pentreath
Date: Fri, 4 Oct 2013 14:39:44 +0200
Subject: Adding implicit feedback ALS to MLlib user guide

---
 docs/mllib-guide.md | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index f991d86c8d..c1ff9c417c 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -144,10 +144,9 @@ Available algorithms for clustering:
 
 # Collaborative Filtering
 
-[Collaborative
-filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
+[Collaborative filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
 is commonly used for recommender systems. These techniques aim to fill in the
-missing entries of a user-product association matrix. MLlib currently supports
+missing entries of a user-item association matrix. MLlib currently supports
 model-based collaborative filtering, in which users and products are described
 by a small set of latent factors that can be used to predict missing entries.
 In particular, we implement the [alternating least squares
@@ -158,7 +157,24 @@ following parameters:
 * *numBlocks* is the number of blacks used to parallelize computation (set to -1 to auto-configure).
 * *rank* is the number of latent factors in our model.
 * *iterations* is the number of iterations to run.
-* *lambda* specifies the regularization parameter in ALS.
+* *lambda* specifies the regularization parameter in ALS.
+* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for *implicit feedback* data
+* *alpha* is a parameter applicable to the implicit feedback variant of ALS that governs the *baseline* confidence in preference observations
+
+## Explicit vs Implicit Feedback
+
+The standard approach to matrix factorization based collaborative filtering treats
+the entries in the user-item matrix as *explicit* preferences given by the user to the item.
+
+It is common in many real-world use cases to only have access to *implicit feedback*
+(e.g. views, clicks, purchases, likes, shares etc.). The approach used in MLlib to deal with
+such data is taken from
+[Collaborative Filtering for Implicit Feedback Datasets](http://research.yahoo.com/pub/2433).
+Essentially instead of trying to model the matrix of ratings directly, this approach treats the data as
+a combination of binary preferences and *confidence values*. The ratings are then related
+to the level of confidence in observed user preferences, rather than explicit ratings given to items.
+The model then tries to find latent factors that can be used to predict the expected preference of a user
+for an item.
 
 Available algorithms for collaborative filtering:
 
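
The implicit-feedback description added by this patch (binary preferences combined with confidence values, with *alpha* governing the baseline confidence) can be illustrated with a small sketch. This is not MLlib's implementation: it assumes the linear confidence form `c = 1 + alpha * r` from the cited paper, and the function name `to_preference_and_confidence` and the sample observations are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical helper (not part of MLlib): maps raw implicit-feedback counts
# to the (binary preference, confidence) pairs described in the guide text.
def to_preference_and_confidence(
    observations: Dict[Tuple[int, int], float], alpha: float
) -> Dict[Tuple[int, int], Tuple[int, float]]:
    result = {}
    for (user, item), r in observations.items():
        # Binary preference: 1 if any interaction was observed, else 0.
        preference = 1 if r > 0 else 0
        # Assumed linear confidence from the cited paper: grows with the
        # strength of the observation; alpha scales that growth.
        confidence = 1.0 + alpha * r
        result[(user, item)] = (preference, confidence)
    return result

# Made-up (user, item) -> interaction count data, e.g. number of views.
observations = {(0, 10): 3.0, (0, 11): 0.0, (1, 10): 1.0}
pc = to_preference_and_confidence(observations, alpha=40.0)

for key in sorted(pc):
    print(key, pc[key])
```

Under this assumption, an unobserved pair keeps a preference of 0 with the minimum confidence of 1, while repeated interactions raise the confidence in a preference of 1 rather than being read as a higher rating.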