[SPARK-15449][MLLIB][EXAMPLE] Wrong Data Format - Documentation Issue

## What changes were proposed in this pull request?

In the MLlib Naive Bayes example, the Scala and Python examples don't use libsvm data, but the Java example does. This change updates the Scala and Python examples to load the same libsvm data as the Java example.

## How was this patch tested?

Manual tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #13301 from wangmiao1981/example.
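To make the data-format difference concrete, here is a minimal, dependency-free Scala sketch (not Spark code). It contrasts two formats. The first is the comma-separated dense format that the old Scala example parsed by hand (`label,f1 f2 f3`). The second is the libsvm format read by `MLUtils.loadLibSVMFile` (`label index:value ...`, with one-based indices and omitted entries equal to zero). The object `InputFormats`, its methods, the explicit `numFeatures` parameter and the sample lines are hypothetical. The real loader builds sparse vectors and can infer the feature count.

```scala
// Hypothetical illustration only: simplified parsers, not MLUtils' implementation.
object InputFormats {

  /** Old example format: "label,f1 f2 f3", every feature value listed (dense). */
  def parseCommaDense(line: String): (Double, Array[Double]) = {
    val parts = line.split(',')
    (parts(0).toDouble, parts(1).trim.split(' ').map(_.toDouble))
  }

  /**
   * libsvm format: "label i1:v1 i2:v2 ..." with one-based indices.
   * Omitted indices are zero; indices are shifted to zero-based positions.
   * numFeatures is an assumption of this sketch (the real loader can infer it).
   */
  def parseLibSvm(line: String, numFeatures: Int): (Double, Array[Double]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val values = Array.fill(numFeatures)(0.0)
    tokens.tail.foreach { token =>
      val pair = token.split(':')
      values(pair(0).toInt - 1) = pair(1).toDouble // one-based -> zero-based
    }
    (label, values)
  }
}
```

For example, `parseCommaDense("0,1 0 0")` and `parseLibSvm("0 1:1.0", 3)` describe the same point. The libsvm line simply omits the zero entries, which is why it suits sparse data such as `sample_libsvm_data.txt`.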
author: wm624@hotmail.com <wm624@hotmail.com> 2016-05-27 20:59:24 -0500
committer: Sean Owen <sowen@cloudera.com> 2016-05-27 20:59:24 -0500
commit: 5d4dafe8fdea49dcbd6b0e4c23e3791fa30c8911 (patch)
tree: 57f130594c229600e6f392c8f1b76012a5bd5ddd /examples/src/main/scala
parent: 4a2fb8b87ca4517e0f4a1d7a1a1b3c08c1c1294d (diff)
download: spark-5d4dafe8fdea49dcbd6b0e4c23e3791fa30c8911.tar.gz
spark-5d4dafe8fdea49dcbd6b0e4c23e3791fa30c8911.tar.bz2
spark-5d4dafe8fdea49dcbd6b0e4c23e3791fa30c8911.zip
1 files changed, 4 insertions, 10 deletions
diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/NaiveBayesExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/NaiveBayesExample.scala
index 0187ad603a..b321d8e127 100644
--- a/examples/src/main/scala/org/apache/spark/examples/mllib/NaiveBayesExample.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/NaiveBayesExample.scala
@@ -21,8 +21,7 @@ package org.apache.spark.examples.mllib
 import org.apache.spark.{SparkConf, SparkContext}
 // $example on$
 import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
-import org.apache.spark.mllib.linalg.Vectors
-import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLUtils
 // $example off$
 
 object NaiveBayesExample {
@@ -31,16 +30,11 @@ object NaiveBayesExample {
     val conf = new SparkConf().setAppName("NaiveBayesExample")
     val sc = new SparkContext(conf)
     // $example on$
-    val data = sc.textFile("data/mllib/sample_naive_bayes_data.txt")
-    val parsedData = data.map { line =>
-      val parts = line.split(',')
-      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
-    }
+    // Load and parse the data file.
+    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
 
     // Split data into training (60%) and test (40%).
-    val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
-    val training = splits(0)
-    val test = splits(1)
+    val Array(training, test) = data.randomSplit(Array(0.6, 0.4))
 
     val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")
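The training call at the end of the diff, `NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")`, fits a multinomial model with additive smoothing. As a rough, in-memory illustration of what that involves, here is a small Scala sketch of textbook multinomial Naive Bayes. `TinyMultinomialNB`, `Model`, `train` and `predict` are hypothetical names. The smoothed log-prior and log-likelihood formulas are the standard ones, assumed here rather than copied from Spark's source.

```scala
// Hypothetical, dependency-free sketch of multinomial Naive Bayes with
// additive smoothing `lambda`; not Spark's implementation.
object TinyMultinomialNB {

  final case class Model(
      labels: Array[Double],
      logPrior: Array[Double],          // log P(label)
      logTheta: Array[Array[Double]]) { // log P(feature j | label)

    /** Returns the label maximising log P(label) + sum_j x_j * log P(feature j | label). */
    def predict(x: Array[Double]): Double = {
      val scores = logPrior.indices.map { c =>
        logPrior(c) + x.indices.map(j => x(j) * logTheta(c)(j)).sum
      }
      labels(scores.indexOf(scores.max))
    }
  }

  /** Features must be non-negative (e.g. counts), as multinomial NB assumes. */
  def train(data: Seq[(Double, Array[Double])], lambda: Double): Model = {
    require(data.nonEmpty, "need at least one example")
    val numFeatures = data.head._2.length
    // Group examples by label, in ascending label order.
    val byLabel = data.groupBy(_._1).toArray.sortBy(_._1)
    val labels = byLabel.map(_._1)
    val numLabels = labels.length
    // Smoothed class priors: (n_c + lambda) / (n + numLabels * lambda).
    val logPrior = byLabel.map { case (_, rows) =>
      math.log(rows.size + lambda) - math.log(data.size + numLabels * lambda)
    }
    // Smoothed per-feature probabilities: (sum_cj + lambda) / (sum_c + numFeatures * lambda).
    val logTheta = byLabel.map { case (_, rows) =>
      val sums = Array.tabulate(numFeatures)(j => rows.map(_._2(j)).sum)
      val total = sums.sum
      sums.map(s => math.log(s + lambda) - math.log(total + numFeatures * lambda))
    }
    Model(labels, logPrior, logTheta)
  }
}
```

With `lambda = 1.0` this is Laplace smoothing. Every feature gets one pseudo-count per label, so a feature never observed with a label still has non-zero probability under that label.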