[SPARK-7752] [MLLIB] Use lowercase letters for NaiveBayes.modelType

to be consistent with other string names in MLlib. This PR also updates the implementation to use vals instead of hardcoded strings. jkbradley leahmcguire

Author: Xiangrui Meng <meng@databricks.com>

Closes #6277 from mengxr/SPARK-7752 and squashes the following commits:

f38b662 [Xiangrui Meng] add another case _ back in test
ae5c66a [Xiangrui Meng] model type -> modelType
711d1c6 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7752
40ae53e [Xiangrui Meng] fix Java test suite
264a814 [Xiangrui Meng] add case _ back
3c456a8 [Xiangrui Meng] update NB user guide
17bba53 [Xiangrui Meng] update naive Bayes to use lowercase model type strings

(cherry picked from commit 13348e21b6b1c0df42c18b82b86c613291228863)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
author: Xiangrui Meng <meng@databricks.com> 2015-05-21 10:30:08 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2015-05-21 10:30:27 -0700
commit: b97a8053a02636b8f62a900d974cffa0e057441c (patch)
tree: 26fdf40a90c4a026718fdb15a665cc88b10f717d /docs
parent: 3aa618510167ef72b4107d964a490be9d90da70d (diff)
download: spark-b97a8053a02636b8f62a900d974cffa0e057441c.tar.gz
spark-b97a8053a02636b8f62a900d974cffa0e057441c.tar.bz2
spark-b97a8053a02636b8f62a900d974cffa0e057441c.zip
1 files changed, 5 insertions, 4 deletions
diff --git a/docs/mllib-naive-bayes.md b/docs/mllib-naive-bayes.md
index 9780ea52c4..56a2e9ca86 100644
--- a/docs/mllib-naive-bayes.md
+++ b/docs/mllib-naive-bayes.md
@@ -21,7 +21,7 @@ Within that context, each observation is a document and each
 feature represents a term whose value is the frequency of the term (in multinomial naive Bayes) or
 a zero or one indicating whether the term was found in the document (in Bernoulli naive Bayes).
 Feature values must be nonnegative. The model type is selected with an optional parameter
-"Multinomial" or "Bernoulli" with "Multinomial" as the default.
+"multinomial" or "bernoulli" with "multinomial" as the default.
 [Additive smoothing](http://en.wikipedia.org/wiki/Lidstone_smoothing) can be used by
 setting the parameter $\lambda$ (default to $1.0$). For document classification, the input feature
 vectors are usually sparse, and sparse vectors should be supplied as input to take advantage of
@@ -35,7 +35,7 @@ sparsity. Since the training data is only used once, it is not necessary to cach
 [NaiveBayes](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayes$) implements
 multinomial naive Bayes. It takes an RDD of
 [LabeledPoint](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint) and an optional
-smoothing parameter `lambda` as input, an optional model type parameter (default is Multinomial), and outputs a
+smoothing parameter `lambda` as input, an optional model type parameter (default is "multinomial"), and outputs a
 [NaiveBayesModel](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayesModel), which
 can be used for evaluation and prediction.
 
@@ -54,7 +54,7 @@ val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
 val training = splits(0)
 val test = splits(1)
 
-val model = NaiveBayes.train(training, lambda = 1.0, model = "Multinomial")
+val model = NaiveBayes.train(training, lambda = 1.0, model = "multinomial")
 
 val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
 val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
@@ -75,6 +75,8 @@ optionally smoothing parameter `lambda` as input, and output a
 can be used for evaluation and prediction.
 
 {% highlight java %}
+import scala.Tuple2;
+
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.function.Function;
@@ -82,7 +84,6 @@ import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.mllib.classification.NaiveBayes;
 import org.apache.spark.mllib.classification.NaiveBayesModel;
 import org.apache.spark.mllib.regression.LabeledPoint;
-import scala.Tuple2;
 
 JavaRDD<LabeledPoint> training = ... // training set
 JavaRDD<LabeledPoint> test = ... // test set
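
The commit message mentions replacing hardcoded strings with named values for the model type. The following is a minimal, self-contained Java sketch of that general pattern, not the code from this patch: the class `ModelTypeCheck`, its constants, and the `requireSupported` helper are hypothetical names introduced for illustration, and the case-sensitive check is an assumption about how a lowercase-only convention might be enforced.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Spark: keeps the model type names in one
// place so callers and validation code do not repeat string literals.
final class ModelTypeCheck {
    static final String MULTINOMIAL = "multinomial";
    static final String BERNOULLI = "bernoulli";

    // The set of accepted names is built from the constants above.
    private static final Set<String> SUPPORTED =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList(MULTINOMIAL, BERNOULLI)));

    private ModelTypeCheck() {}

    // Returns modelType unchanged if it is one of the supported names.
    // Assumption: the comparison is case-sensitive, so "Multinomial" is rejected.
    static String requireSupported(String modelType) {
        if (!SUPPORTED.contains(modelType)) {
            throw new IllegalArgumentException(
                "Unknown modelType: " + modelType + ". Supported: " + SUPPORTED);
        }
        return modelType;
    }
}
```

Under these assumptions, `ModelTypeCheck.requireSupported("multinomial")` returns its argument, while the old capitalized spelling `"Multinomial"` throws an `IllegalArgumentException`. Callers that passed the capitalized names before this change would need to switch to the lowercase ones.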