From df0aa8353ab6d3b19d838c6fa95a93a64948309f Mon Sep 17 00:00:00 2001
From: Xiangrui Meng
Date: Sun, 18 May 2014 17:00:57 -0700
Subject: [WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0

Some improvements to MLlib guide:

1. [SPARK-1872] Update API links for unidoc.
2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
3. Add more Java/Python examples.

Author: Xiangrui Meng

Closes #816 from mengxr/mllib-doc and squashes the following commits:

ec2e407 [Xiangrui Meng] format scala example for ALS
cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
7dad18e [Xiangrui Meng] update java api links in NB
3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
e4afaa8 [Xiangrui Meng] explicity state what might change
---
 docs/mllib-collaborative-filtering.md | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

(limited to 'docs/mllib-collaborative-filtering.md')

diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index f486c56e55..d51002f015 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -1,6 +1,7 @@
 ---
 layout: global
-title: MLlib - Collaborative Filtering
+title: Collaborative Filtering - MLlib
+displayTitle: MLlib - Collaborative Filtering
 ---
 
 * Table of contents
@@ -48,7 +49,7 @@ user for an item.
 
 <div data-lang="scala" markdown="1">
 In the following example we load rating data. Each row consists of a user, a product and a rating.
-We use the default [ALS.train()](api/mllib/index.html#org.apache.spark.mllib.recommendation.ALS$)
+We use the default [ALS.train()](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS$)
 method which assumes ratings are explicit. We evaluate the recommendation model by measuring the
 Mean Squared Error of rating prediction.
 
@@ -58,9 +59,9 @@ import org.apache.spark.mllib.recommendation.Rating
 
 // Load and parse the data
 val data = sc.textFile("mllib/data/als/test.data")
-val ratings = data.map(_.split(',') match {
-  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
-})
+val ratings = data.map(_.split(',') match { case Array(user, item, rate) =>
+    Rating(user.toInt, item.toInt, rate.toDouble)
+  })
 
 // Build the recommendation model using ALS
 val rank = 10
@@ -68,15 +69,19 @@ val numIterations = 20
 val model = ALS.train(ratings, rank, numIterations, 0.01)
 
 // Evaluate the model on rating data
-val usersProducts = ratings.map{ case Rating(user, product, rate) => (user, product)}
-val predictions = model.predict(usersProducts).map{
-  case Rating(user, product, rate) => ((user, product), rate)
+val usersProducts = ratings.map { case Rating(user, product, rate) =>
+  (user, product)
 }
-val ratesAndPreds = ratings.map{
-  case Rating(user, product, rate) => ((user, product), rate)
+val predictions =
+  model.predict(usersProducts).map { case Rating(user, product, rate) =>
+    ((user, product), rate)
+  }
+val ratesAndPreds = ratings.map { case Rating(user, product, rate) =>
+  ((user, product), rate)
 }.join(predictions)
-val MSE = ratesAndPreds.map{
-  case ((user, product), (r1, r2)) => math.pow((r1- r2), 2)
+val MSE = ratesAndPreds.map { case ((user, product), (r1, r2)) =>
+  val err = (r1 - r2)
+  err * err
 }.mean()
 println("Mean Squared Error = " + MSE)
 {% endhighlight %}
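
The reformatted example above calls `ALS.train()`, which assumes explicit ratings. For implicit-feedback data (views, clicks, and so on) MLlib provides `ALS.trainImplicit`; a minimal sketch of that variant, reusing `ratings`, `rank`, and `numIterations` from the example and assuming illustrative values for the regularization parameter `lambda` and the confidence parameter `alpha`, might look like this:

{% highlight scala %}
// Sketch only: implicit-feedback counterpart of the example above.
// `ratings`, `rank`, and `numIterations` are the values defined in the example;
// `lambda` and `alpha` below are illustrative, not recommended settings.
val lambda = 0.01 // regularization parameter
val alpha = 0.01  // baseline confidence in implicit observations
val implicitModel = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)
{% endhighlight %}

Because the implicit-feedback model predicts confidence-weighted preferences rather than ratings, the Mean Squared Error computed in the example is not directly meaningful in that setting.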