[SPARK-9028] [ML] Add CountVectorizer as an estimator to generate CountVectorizerModel - spark


author	Yuhao Yang <hhbyyh@gmail.com>	2015-08-18 11:00:09 -0700
committer	Joseph K. Bradley <joseph@databricks.com>	2015-08-18 11:00:09 -0700
commit	354f4582b637fa25d3892ec2b12869db50ed83c9 (patch)
tree	a0e4202868d5b34b59a5789cd60d0d0ccbaa74bf /docs/mllib-classification-regression.md
parent	1968276af0f681fe51328b7dd795bd21724a5441 (diff)

[SPARK-9028] [ML] Add CountVectorizer as an estimator to generate CountVectorizerModel

jira: https://issues.apache.org/jira/browse/SPARK-9028

Add an estimator for CountVectorizerModel. The estimator extracts a vocabulary from a document collection according to term frequency.

I changed the meaning of minCount so that it now acts as a filter across the whole corpus. This aligns with Word2Vec and with the similar parameter in scikit-learn.

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #7388 from hhbyyh/cvEstimator.
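As a rough illustration of the idea described in the commit message (fit a vocabulary from corpus-wide term counts, then turn each document into a count vector), here is a minimal plain-Python sketch. It is not the Spark implementation and does not use the Spark API: the function names `fit_vocabulary` and `transform`, the `vocab_size` argument, and the tie-breaking order are all hypothetical. It also assumes that `min_count` is compared against the total number of occurrences of a term across the corpus, which is one possible reading of "a filter across the corpus".

```python
from collections import Counter


def fit_vocabulary(documents, min_count=1, vocab_size=None):
    """Build a vocabulary from tokenized documents (illustrative only).

    Assumption: min_count is applied to the total occurrences of a term
    across the whole corpus, not to per-document counts.
    """
    totals = Counter()
    for tokens in documents:
        totals.update(tokens)

    # Keep only terms that occur often enough across the corpus.
    kept = [(term, count) for term, count in totals.items() if count >= min_count]

    # Most frequent terms first; ties broken alphabetically (hypothetical choice).
    kept.sort(key=lambda pair: (-pair[1], pair[0]))

    if vocab_size is not None:
        kept = kept[:vocab_size]
    return [term for term, _ in kept]


def transform(tokens, vocabulary):
    """Map one tokenized document to a dense vector of term counts."""
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    for token in tokens:
        position = index.get(token)
        if position is not None:  # terms outside the vocabulary are ignored
            counts[position] += 1
    return counts


docs = [["a", "b", "c"], ["a", "b", "b", "c", "a"], ["a", "d"]]
vocab = fit_vocabulary(docs, min_count=2)
vector = transform(["a", "d", "a", "c"], vocab)
```

In this toy corpus the term `d` occurs only once in total, so with `min_count=2` it is left out of the vocabulary and contributes nothing to the count vector.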

Diffstat (limited to 'docs/mllib-classification-regression.md')

0 files changed, 0 insertions, 0 deletions

