[SPARK-15502][DOC][ML][PYSPARK] add guide note that ALS only supports integer ids

This PR adds a note to clarify that the ML API for ALS only supports integers for user/item ids, and that other numeric types can be used for these columns but the ids must fall within integer range (see [SPARK-14891](https://issues.apache.org/jira/browse/SPARK-14891)). It also cleans up a reference to `mllib` in the ML doc.

## How was this patch tested?

Built and viewed User Guide doc locally.

Author: Nick Pentreath <nickp@za.ibm.com>

Closes #13278 from MLnick/SPARK-15502-als-int-id-doc-note.
author: Nick Pentreath <nickp@za.ibm.com> 2016-05-24 11:34:06 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2016-05-24 11:34:06 -0700
commit: 20900e5feced76e87f0a12823d0e3f07e082105f (patch)
tree: b1f571705667628a1788b1d87a829d2b6dc870a5
parent: be99a99fe7976419d727c0cc92e872aa4af58bf1 (diff)
download: spark-20900e5feced76e87f0a12823d0e3f07e082105f.tar.gz
spark-20900e5feced76e87f0a12823d0e3f07e082105f.tar.bz2
spark-20900e5feced76e87f0a12823d0e3f07e082105f.zip
1 files changed, 5 insertions, 1 deletions
diff --git a/docs/ml-collaborative-filtering.md b/docs/ml-collaborative-filtering.md
index bd3d527d9a..8bd75f3bcf 100644
--- a/docs/ml-collaborative-filtering.md
+++ b/docs/ml-collaborative-filtering.md
@@ -29,6 +29,10 @@ following parameters:
   *baseline* confidence in preference observations (defaults to 1.0).
 * *nonnegative* specifies whether or not to use nonnegative constraints for least squares (defaults to `false`).
 
+**Note:** The DataFrame-based API for ALS currently only supports integers for user and item ids.
+Other numeric types are supported for the user and item id columns, 
+but the ids must be within the integer value range. 
+
 ### Explicit vs. implicit feedback
 
 The standard approach to matrix factorization based collaborative filtering treats 
@@ -36,7 +40,7 @@ the entries in the user-item matrix as *explicit* preferences given by the user
 for example, users giving ratings to movies.
 
 It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
-clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
+clicks, purchases, likes, shares etc.). The approach used in `spark.ml` to deal with such data is taken
 from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
 Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
 as numbers representing the *strength* in observations of user actions (such as the number of clicks,
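
The id constraint described in the added note can be illustrated with a small pre-check applied to ids before they are handed to ALS. This is a minimal sketch in plain Python, not Spark's own validation code: the helper name `to_als_id` is hypothetical, and the sketch assumes that "integer value range" means the 32-bit signed range and that a non-whole value such as `1.5` should be rejected.

```python
# Hypothetical helper for illustration only; not part of Spark.
INT_MIN = -2**31       # assumed lower bound: smallest 32-bit signed integer
INT_MAX = 2**31 - 1    # assumed upper bound: largest 32-bit signed integer


def to_als_id(value):
    """Return `value` as an int if it is a whole number within the assumed range.

    Raises ValueError otherwise.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"id must be numeric, got {value!r}")
    # Floats are accepted only when they hold a whole number
    # (this also rejects NaN and infinity).
    if isinstance(value, float) and not value.is_integer():
        raise ValueError(f"id must be a whole number, got {value!r}")
    as_int = int(value)
    if not INT_MIN <= as_int <= INT_MAX:
        raise ValueError(f"id {value!r} is outside the integer value range")
    return as_int
```

Under these assumptions, `to_als_id(7.0)` returns `7`, while `to_als_id(2**31)` raises a `ValueError` because the value exceeds the assumed upper bound.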