author | Timothy Hunter <timhunter@databricks.com> | 2015-12-10 12:50:46 -0800 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2015-12-10 12:50:46 -0800 |
commit | 2ecbe02d5b28ee562d10c1735244b90a08532c9e (patch) | |
tree | c589a01a2900513aa1b277303ed7cdffc1961ba4 /docs/mllib-collaborative-filtering.md | |
parent | ec5f9ed5de2218938dba52152475daafd4dc4786 (diff) | |
[SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation.
Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. This should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark).
It also removes some files that I forgot to delete in #10207.
Author: Timothy Hunter <timhunter@databricks.com>
Closes #10234 from thunterdb/12212.
Diffstat (limited to 'docs/mllib-collaborative-filtering.md')
-rw-r--r-- | docs/mllib-collaborative-filtering.md | 14 |
1 files changed, 7 insertions, 7 deletions
diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index 7cd1b894e7..1ebb4654ae 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Collaborative Filtering - MLlib
-displayTitle: <a href="mllib-guide.html">MLlib</a> - Collaborative Filtering
+title: Collaborative Filtering - spark.mllib
+displayTitle: Collaborative Filtering - spark.mllib
 ---

 * Table of contents
@@ -11,12 +11,12 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Collaborative Filtering

 [Collaborative filtering](http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering)
 is commonly used for recommender systems. These techniques aim to fill in the
-missing entries of a user-item association matrix. MLlib currently supports
+missing entries of a user-item association matrix. `spark.mllib` currently supports
 model-based collaborative filtering, in which users and products are described
 by a small set of latent factors that can be used to predict missing entries.
-MLlib uses the [alternating least squares
+`spark.mllib` uses the [alternating least squares
 (ALS)](http://dl.acm.org/citation.cfm?id=1608614)
-algorithm to learn these latent factors. The implementation in MLlib has the
+algorithm to learn these latent factors. The implementation in `spark.mllib` has the
 following parameters:

 * *numBlocks* is the number of blocks used to parallelize computation (set to -1 to auto-configure).
@@ -34,7 +34,7 @@ The standard approach to matrix factorization based collaborative filtering trea
 the entries in the user-item matrix as *explicit* preferences given by the user to the item.

 It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
-clicks, purchases, likes, shares etc.). The approach used in MLlib to deal with such data is taken
+clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
 from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
 Essentially instead of trying to model the matrix of ratings directly, this approach treats the data
@@ -119,4 +119,4 @@ a dependency.
 ## Tutorial

 The [training exercises](https://databricks-training.s3.amazonaws.com/index.html) from the Spark Summit 2014 include a hands-on tutorial for
-[personalized movie recommendation with MLlib](https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html).
+[personalized movie recommendation with `spark.mllib`](https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html).