From df0aa8353ab6d3b19d838c6fa95a93a64948309f Mon Sep 17 00:00:00 2001
From: Xiangrui Meng
Date: Sun, 18 May 2014 17:00:57 -0700
Subject: [WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0

Some improvements to MLlib guide:

1. [SPARK-1872] Update API links for unidoc.
2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
3. Add more Java/Python examples.

Author: Xiangrui Meng

Closes #816 from mengxr/mllib-doc and squashes the following commits:

ec2e407 [Xiangrui Meng] format scala example for ALS
cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
7dad18e [Xiangrui Meng] update java api links in NB
3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
e4afaa8 [Xiangrui Meng] explicity state what might change
---
 docs/mllib-collaborative-filtering.md | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

(limited to 'docs/mllib-collaborative-filtering.md')

diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md
index f486c56e55..d51002f015 100644
--- a/docs/mllib-collaborative-filtering.md
+++ b/docs/mllib-collaborative-filtering.md
@@ -1,6 +1,7 @@
 ---
 layout: global
-title: MLlib - Collaborative Filtering
+title: Collaborative Filtering - MLlib
+displayTitle: MLlib - Collaborative Filtering
 ---
 
 * Table of contents
@@ -48,7 +49,7 @@ user for an item.
 
 <div data-lang="scala" markdown="1">
 In the following example we load rating data. Each row consists of a user, a product and a rating.
-We use the default [ALS.train()](api/mllib/index.html#org.apache.spark.mllib.recommendation.ALS$)
+We use the default [ALS.train()](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS$)
 method which assumes ratings are explicit. We evaluate the recommendation model by measuring the
 Mean Squared Error of rating prediction.
 
@@ -58,9 +59,9 @@ import org.apache.spark.mllib.recommendation.Rating
 
 // Load and parse the data
 val data = sc.textFile("mllib/data/als/test.data")
-val ratings = data.map(_.split(',') match {
-  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
-})
+val ratings = data.map(_.split(',') match { case Array(user, item, rate) =>
+    Rating(user.toInt, item.toInt, rate.toDouble)
+  })
 
 // Build the recommendation model using ALS
 val rank = 10
@@ -68,15 +69,19 @@ val numIterations = 20
 val model = ALS.train(ratings, rank, numIterations, 0.01)
 
 // Evaluate the model on rating data
-val usersProducts = ratings.map{ case Rating(user, product, rate) => (user, product)}
-val predictions = model.predict(usersProducts).map{
-  case Rating(user, product, rate) => ((user, product), rate)
+val usersProducts = ratings.map { case Rating(user, product, rate) =>
+  (user, product)
 }
-val ratesAndPreds = ratings.map{
-  case Rating(user, product, rate) => ((user, product), rate)
+val predictions =
+  model.predict(usersProducts).map { case Rating(user, product, rate) =>
+    ((user, product), rate)
+  }
+val ratesAndPreds = ratings.map { case Rating(user, product, rate) =>
+  ((user, product), rate)
 }.join(predictions)
-val MSE = ratesAndPreds.map{
-  case ((user, product), (r1, r2)) => math.pow((r1- r2), 2)
+val MSE = ratesAndPreds.map { case ((user, product), (r1, r2)) =>
+  val err = (r1 - r2)
+  err * err
 }.mean()
 println("Mean Squared Error = " + MSE)
 {% endhighlight %}
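
The reformatted example above calls `ALS.train()`, which assumes explicit ratings. For implicit-feedback data (views, clicks, and so on) MLlib provides `ALS.trainImplicit`; a minimal sketch of that variant, reusing `ratings`, `rank`, and `numIterations` from the example and assuming illustrative values for the regularization parameter `lambda` and the confidence parameter `alpha`, might look like this:

{% highlight scala %}
// Sketch only: implicit-feedback counterpart of the example above.
// `ratings`, `rank`, and `numIterations` are the values defined in the example;
// `lambda` and `alpha` below are illustrative, not recommended settings.
val lambda = 0.01 // regularization parameter
val alpha = 0.01  // baseline confidence in implicit observations
val implicitModel = ALS.trainImplicit(ratings, rank, numIterations, lambda, alpha)
{% endhighlight %}

Because the implicit-feedback model predicts confidence-weighted preferences rather than ratings, the Mean Squared Error computed in the example is not directly meaningful in that setting.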