author: Xiangrui Meng <meng@databricks.com> 2014-08-19 21:01:23 -0700
committer: Xiangrui Meng <meng@databricks.com> 2014-08-19 21:01:23 -0700
commit: 068b6fe6a10eb1c6b2102d88832203267f030e85
tree: eb12c866970102b636d0edb80351bee0b6cb7b28 /docs/mllib-naive-bayes.md
parent: 0e3ab94d413fd70fff748fded42ab5e2ebd66fcc
[SPARK-3130][MLLIB] detect negative values in naive Bayes
because NB treats feature values as term frequencies. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #2038 from mengxr/nb-neg and squashes the following commits:

52c37c3 [Xiangrui Meng] address comments
65f892d [Xiangrui Meng] detect negative values in nb
Diffstat (limited to 'docs/mllib-naive-bayes.md')
-rw-r--r-- docs/mllib-naive-bayes.md | 3
1 files changed, 2 insertions, 1 deletions
diff --git a/docs/mllib-naive-bayes.md b/docs/mllib-naive-bayes.md
index 86d94aebd9..7f9d4c6563 100644
--- a/docs/mllib-naive-bayes.md
+++ b/docs/mllib-naive-bayes.md
@@ -17,7 +17,8 @@ Bayes](http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Multinomial_naive_Bay
which is typically used for [document
classification](http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html).
Within that context, each observation is a document and each
-feature represents a term whose value is the frequency of the term.
+feature represents a term whose value is the frequency of the term.
+Feature values must be nonnegative to represent term frequencies.
[Additive smoothing](http://en.wikipedia.org/wiki/Lidstone_smoothing) can be used by
setting the parameter $\lambda$ (default to $1.0$). For document classification, the input feature
vectors are usually sparse, and sparse vectors should be supplied as input to take advantage of
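
The nonnegativity requirement and the additive smoothing described in the hunk above can be illustrated with a small standalone sketch. This is plain Python, not Spark's implementation: the function names `require_nonnegative` and `smoothed_log_likelihoods` are hypothetical, the dense list-of-lists input is an assumption made for brevity, and the estimate used is the standard multinomial form $(N_{cj} + \lambda) / (N_c + \lambda n)$ for term $j$ in class $c$ with $n$ features.

```python
import math
from typing import Dict, List, Sequence


def require_nonnegative(features: Sequence[float]) -> None:
    """Raise ValueError if any feature value is negative.

    Hypothetical helper: term frequencies cannot be negative, so a
    negative entry means the input is not valid for multinomial NB.
    """
    for index, value in enumerate(features):
        if value < 0.0:
            raise ValueError(
                f"Feature {index} has negative value {value}; "
                "multinomial naive Bayes expects term frequencies."
            )


def smoothed_log_likelihoods(
    rows: List[Sequence[float]], labels: List[int], lam: float = 1.0
) -> Dict[int, List[float]]:
    """Return log P(term | class) per class, using additive smoothing."""
    num_features = len(rows[0])
    totals: Dict[int, List[float]] = {}
    for row, label in zip(rows, labels):
        require_nonnegative(row)  # reject invalid input before aggregating
        sums = totals.setdefault(label, [0.0] * num_features)
        for j, value in enumerate(row):
            sums[j] += value  # accumulate term counts per class
    result: Dict[int, List[float]] = {}
    for label, sums in totals.items():
        # Smoothed denominator: total term count plus lambda per feature.
        denom = sum(sums) + lam * num_features
        result[label] = [math.log((s + lam) / denom) for s in sums]
    return result
```

With $\lambda = 1.0$, a term that never occurs in a class still receives a small nonzero probability, which is why the counts must be nonnegative for the estimate to remain a valid probability.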