author | Xiangrui Meng <meng@databricks.com> | 2014-08-19 21:01:23 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-08-19 21:01:23 -0700 |
commit | 068b6fe6a10eb1c6b2102d88832203267f030e85 (patch) | |
tree | eb12c866970102b636d0edb80351bee0b6cb7b28 /docs/mllib-naive-bayes.md | |
parent | 0e3ab94d413fd70fff748fded42ab5e2ebd66fcc (diff) | |
[SPARK-3130][MLLIB] detect negative values in naive Bayes
because NB treats feature values as term frequencies. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #2038 from mengxr/nb-neg and squashes the following commits:
52c37c3 [Xiangrui Meng] address comments
65f892d [Xiangrui Meng] detect negative values in nb
Diffstat (limited to 'docs/mllib-naive-bayes.md')
-rw-r--r-- | docs/mllib-naive-bayes.md | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/docs/mllib-naive-bayes.md b/docs/mllib-naive-bayes.md
index 86d94aebd9..7f9d4c6563 100644
--- a/docs/mllib-naive-bayes.md
+++ b/docs/mllib-naive-bayes.md
@@ -17,7 +17,8 @@ Bayes](http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Multinomial_naive_Bay
 which is typically used for [document
 classification](http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html).
 Within that context, each observation is a document and each
-feature represents a term whose value is the frequency of the term.
+feature represents a term whose value is the frequency of the term.
+Feature values must be nonnegative to represent term frequencies.
 [Additive smoothing](http://en.wikipedia.org/wiki/Lidstone_smoothing) can be used by
 setting the parameter $\lambda$ (default to $1.0$). For document classification, the input feature
 vectors are usually sparse, and sparse vectors should be supplied as input to take advantage of
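
The doc change above states that feature values must be nonnegative because naive Bayes treats them as term frequencies. As a rough illustration of what such a check could look like on the caller's side, here is a minimal Python sketch. It is not the code from this commit and does not use Spark; the function name `require_nonnegative`, the dict-based sparse representation, and the error message wording are all assumptions made for the example.

```python
from typing import Dict, Sequence, Union

# Hypothetical feature representation for this sketch:
# a dense sequence of values, or a sparse dict mapping index -> value.
Features = Union[Sequence[float], Dict[int, float]]


def require_nonnegative(features: Features) -> None:
    """Raise ValueError if any stored feature value is negative.

    For the sparse form only the stored entries are inspected, since
    entries that are not stored are implicitly zero.
    """
    if isinstance(features, dict):
        items = features.items()
    else:
        items = enumerate(features)
    for index, value in items:
        if value < 0.0:
            raise ValueError(
                f"naive Bayes expects nonnegative feature values, "
                f"found {value} at index {index}"
            )


if __name__ == "__main__":
    require_nonnegative([0.0, 2.0, 1.0])      # dense term counts: accepted
    require_nonnegative({3: 1.0, 7: 4.0})     # sparse term counts: accepted
    try:
        require_nonnegative([1.0, -0.5])      # negative value: rejected
    except ValueError as err:
        print(err)
```

Running a check like this before training surfaces bad input early, rather than letting a negative "frequency" silently distort the per-class term totals that the multinomial model is built from.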