author     Xiangrui Meng <meng@databricks.com>    2014-05-18 17:00:57 -0700
committer  Matei Zaharia <matei@databricks.com>   2014-05-18 17:00:57 -0700
commit     df0aa8353ab6d3b19d838c6fa95a93a64948309f (patch)
tree       96f19ed692c7a6578722be24c32bb0685d8d3e6b /docs/mllib-collaborative-filtering.md
parent     4ce479324bdcf603806fc90b5b0f4968c6de690e (diff)
[WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0
Some improvements to the MLlib guide:

1. [SPARK-1872] Update API links for unidoc.
2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, it is used instead of `page.title` for title display.
3. Add more Java/Python examples.

Author: Xiangrui Meng <meng@databricks.com>

Closes #816 from mengxr/mllib-doc and squashes the following commits:

ec2e407 [Xiangrui Meng] format scala example for ALS
cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
7dad18e [Xiangrui Meng] update java api links in NB
3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
e4afaa8 [Xiangrui Meng] explicitly state what might change
Diffstat (limited to 'docs/mllib-collaborative-filtering.md')
-rw-r--r--  docs/mllib-collaborative-filtering.md  |  29
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index f486c56e55..d51002f015 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -1,6 +1,7 @@
---
layout: global
-title: <a href="mllib-guide.html">MLlib</a> - Collaborative Filtering
+title: Collaborative Filtering - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Collaborative Filtering
---
* Table of contents
@@ -48,7 +49,7 @@ user for an item.
<div data-lang="scala" markdown="1">
In the following example we load rating data. Each row consists of a user, a product and a rating.
-We use the default [ALS.train()](api/mllib/index.html#org.apache.spark.mllib.recommendation.ALS$)
+We use the default [ALS.train()](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS$)
method which assumes ratings are explicit. We evaluate the
recommendation model by measuring the Mean Squared Error of rating prediction.
@@ -58,9 +59,9 @@ import org.apache.spark.mllib.recommendation.Rating
// Load and parse the data
val data = sc.textFile("mllib/data/als/test.data")
-val ratings = data.map(_.split(',') match {
- case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
-})
+val ratings = data.map(_.split(',') match { case Array(user, item, rate) =>
+ Rating(user.toInt, item.toInt, rate.toDouble)
+ })
// Build the recommendation model using ALS
val rank = 10
@@ -68,15 +69,19 @@ val numIterations = 20
val model = ALS.train(ratings, rank, numIterations, 0.01)
// Evaluate the model on rating data
-val usersProducts = ratings.map{ case Rating(user, product, rate) => (user, product)}
-val predictions = model.predict(usersProducts).map{
- case Rating(user, product, rate) => ((user, product), rate)
+val usersProducts = ratings.map { case Rating(user, product, rate) =>
+ (user, product)
}
-val ratesAndPreds = ratings.map{
- case Rating(user, product, rate) => ((user, product), rate)
+val predictions =
+ model.predict(usersProducts).map { case Rating(user, product, rate) =>
+ ((user, product), rate)
+ }
+val ratesAndPreds = ratings.map { case Rating(user, product, rate) =>
+ ((user, product), rate)
}.join(predictions)
-val MSE = ratesAndPreds.map{
- case ((user, product), (r1, r2)) => math.pow((r1- r2), 2)
+val MSE = ratesAndPreds.map { case ((user, product), (r1, r2)) =>
+ val err = (r1 - r2)
+ err * err
}.mean()
println("Mean Squared Error = " + MSE)
{% endhighlight %}
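
The `MSE` printed at the end of this hunk is the mean of the squared differences between the observed and predicted ratings. The guide also notes that `ALS.train` assumes the ratings are explicit; for data where preferences are only implicit (for example view or purchase counts), MLlib provides `ALS.trainImplicit`. The snippet below is a minimal sketch of that variant, assuming the same `test.data` input and the Spark shell's `sc`; the `lambda` and `alpha` values are placeholders chosen only for illustration.

{% highlight scala %}
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.Rating

// Load and parse the data exactly as in the explicit-rating example.
val data = sc.textFile("mllib/data/als/test.data")
val ratings = data.map(_.split(',') match { case Array(user, item, rate) =>
  Rating(user.toInt, item.toInt, rate.toDouble)
})

// Build a model that treats the rating values as implicit feedback
// (confidence-weighted observations) rather than explicit scores.
val rank = 10
val numIterations = 20
val lambda = 0.01 // regularization parameter (placeholder value)
val alpha = 0.01  // confidence parameter for implicit feedback (placeholder value)
val model = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)

// Predictions are obtained the same way as for the explicit model.
val usersProducts = ratings.map { case Rating(user, product, rate) =>
  (user, product)
}
val predictions = model.predict(usersProducts)
{% endhighlight %}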