From a6e0afdcf0174425e8a6ff20b2bc2e3a7a374f19 Mon Sep 17 00:00:00 2001 From: Shuo Xiang Date: Thu, 12 Jun 2014 17:37:06 -0700 Subject: SPARK-2085: [MLlib] Apply user-specific regularization instead of uniform regularization in ALS The current implementation of ALS takes a single regularization parameter and apply it on both of the user factors and the product factors. This kind of regularization can be less effective while user number is significantly larger than the number of products (and vice versa). For example, if we have 10M users and 1K product, regularization on user factors will dominate. Following the discussion in [this thread](http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tt2567.html#a2704), the implementation in this PR will regularize each factor vector by #ratings * lambda. Author: Shuo Xiang Closes #1026 from coderxiang/als-reg and squashes the following commits: 93dfdb4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into als-reg b98f19c [Shuo Xiang] merge latest master 52c7b58 [Shuo Xiang] Apply user-specific regularization instead of uniform regularization in Alternating Least Squares (ALS) --- .../main/scala/org/apache/spark/mllib/recommendation/ALS.scala | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'mllib/src') diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala index f1ae7b85b4..cc56fd6ef2 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala @@ -507,6 +507,9 @@ class ALS private ( val tempXtX = DoubleMatrix.zeros(triangleSize) val fullXtX = DoubleMatrix.zeros(rank, rank) + // Count the number of ratings each user gives to provide user-specific regularization + val numRatings = Array.fill(numUsers)(0) + // Compute the XtX and Xy values for each user by adding products it rated in each product // block for (productBlock <- 0 until numProductBlocks) { @@ -519,6 +522,7 @@ class ALS private ( if (implicitPrefs) { var i = 0 while (i < us.length) { + numRatings(us(i)) += 1 // Extension to the original paper to handle rs(i) < 0. confidence is a function // of |rs(i)| instead so that it is never negative: val confidence = 1 + alpha * abs(rs(i)) @@ -534,6 +538,7 @@ class ALS private ( } else { var i = 0 while (i < us.length) { + numRatings(us(i)) += 1 userXtX(us(i)).addi(tempXtX) SimpleBlas.axpy(rs(i), x, userXy(us(i))) i += 1 @@ -550,9 +555,10 @@ class ALS private ( // Compute the full XtX matrix from the lower-triangular part we got above fillFullMatrix(userXtX(index), fullXtX) // Add regularization + val regParam = numRatings(index) * lambda var i = 0 while (i < rank) { - fullXtX.data(i * rank + i) += lambda + fullXtX.data(i * rank + i) += regParam i += 1 } // Solve the resulting matrix, which is symmetric and positive-definite -- cgit v1.2.3