Diffstat (limited to 'docs/ml-collaborative-filtering.md')
-rw-r--r-- docs/ml-collaborative-filtering.md | 6
1 files changed, 5 insertions, 1 deletions
diff --git a/docs/ml-collaborative-filtering.md b/docs/ml-collaborative-filtering.md
index bd3d527d9a..8bd75f3bcf 100644
--- a/docs/ml-collaborative-filtering.md
+++ b/docs/ml-collaborative-filtering.md
@@ -29,6 +29,10 @@ following parameters:
*baseline* confidence in preference observations (defaults to 1.0).
* *nonnegative* specifies whether or not to use nonnegative constraints for least squares (defaults to `false`).
+**Note:** The DataFrame-based API for ALS currently only supports integers for user and item ids.
+Other numeric types are supported for the user and item id columns,
+but the ids must be within the integer value range.
+
### Explicit vs. implicit feedback
The standard approach to matrix factorization based collaborative filtering treats
@@ -36,7 +40,7 @@ the entries in the user-item matrix as *explicit* preferences given by the user
for example, users giving ratings to movies.
It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
-clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
+clicks, purchases, likes, shares etc.). The approach used in `spark.ml` to deal with such data is taken
from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
as numbers representing the *strength* in observations of user actions (such as the number of clicks,