path: root/docs/mllib-classification-regression.md
author: Yuhao Yang <hhbyyh@gmail.com> 2015-08-18 11:00:09 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2015-08-18 11:00:09 -0700
commit: 354f4582b637fa25d3892ec2b12869db50ed83c9
tree: a0e4202868d5b34b59a5789cd60d0d0ccbaa74bf /docs/mllib-classification-regression.md
parent: 1968276af0f681fe51328b7dd795bd21724a5441
[SPARK-9028] [ML] Add CountVectorizer as an estimator to generate CountVectorizerModel
jira: https://issues.apache.org/jira/browse/SPARK-9028

Add an estimator for CountVectorizerModel. The estimator will extract a vocabulary from document collections according to the term frequency.

I changed the meaning of minCount as a filter across the corpus. This aligns with Word2Vec and the similar parameter in SKlearn.

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #7388 from hhbyyh/cvEstimator.
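
For readers unfamiliar with the idea, the vocabulary extraction described in the message can be sketched in plain Python. This is an illustration only, not the Spark implementation: the function names `extract_vocabulary` and `count_vector`, the `vocab_size` cap, and the tie-breaking order are all hypothetical, and the sketch assumes that the corpus-wide `min_count` filter applies to a term's total number of occurrences, which the message does not spell out.

```python
from collections import Counter


def extract_vocabulary(docs, min_count=1, vocab_size=None):
    """Build a vocabulary from tokenized documents (hypothetical helper).

    Assumption: min_count is compared against the total number of
    occurrences of a term across the whole corpus.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)

    # Keep only terms that pass the corpus-wide filter.
    kept = [(term, n) for term, n in counts.items() if n >= min_count]

    # Most frequent terms first; ties broken alphabetically for determinism.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))

    if vocab_size is not None:
        kept = kept[:vocab_size]
    return [term for term, _ in kept]


def count_vector(tokens, vocabulary):
    """Map one tokenized document to a dense vector of term counts."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for token in tokens:
        position = index.get(token)
        if position is not None:  # out-of-vocabulary tokens are ignored
            vector[position] += 1
    return vector


docs = [["a", "b", "c"], ["a", "b", "b", "c", "a"], ["a", "d"]]
vocabulary = extract_vocabulary(docs, min_count=2)
vector = count_vector(["a", "b", "b", "d"], vocabulary)
```

The two-step split mirrors the estimator/model distinction in the commit title: fitting produces the vocabulary, and the resulting model turns documents into count vectors against it.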
Diffstat (limited to 'docs/mllib-classification-regression.md')
0 files changed, 0 insertions, 0 deletions