author     Yuhao Yang <hhbyyh@gmail.com>  2015-07-09 10:26:38 -0700
committer  Joseph K. Bradley <joseph@databricks.com>  2015-07-09 10:26:38 -0700
commit     0cd84c86cac68600a74d84e50ad40c0c8b84822a
tree       5c74ebeb5fa6999d14a51ac51a60783f6fb25fca /streaming
parent     c59e268d17cf10e46dbdbe760e2a7580a6364692
[SPARK-8703] [ML] Add CountVectorizer as a ml transformer to convert document to words count vector
jira: https://issues.apache.org/jira/browse/SPARK-8703

Converts a text document to a sparse vector of token counts.

I can further add an estimator to extract vocabulary from corpus if that's appropriate.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #7084 from hhbyyh/countVectorization and squashes the following commits:

5f3f655 [Yuhao Yang] text change
24728e4 [Yuhao Yang] style improvement
576728a [Yuhao Yang] rename to model and some fix
1deca28 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into countVectorization
99b0c14 [Yuhao Yang] undo extension from HashingTF
12c2dc8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into countVectorization
7ee1c31 [Yuhao Yang] extends HashingTF
809fb59 [Yuhao Yang] minor fix for ut
7c61fb3 [Yuhao Yang] add countVectorizer
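The transformation this commit describes, turning a tokenized document into a sparse vector of token counts over a fixed vocabulary, can be sketched in plain Python. This is not Spark's implementation. The function names `build_vocabulary` and `count_vectorize`, the frequency-ranked vocabulary with alphabetical tie-breaking, and the `(size, indices, values)` tuple (modeled on the size/indices/values layout of a sparse vector) are all assumptions for illustration. `build_vocabulary` is a hypothetical stand-in for the vocabulary-extracting estimator the author mentions as possible follow-up work.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_vocabulary(corpus: Sequence[Sequence[str]], max_size: int) -> List[str]:
    """Hypothetical vocabulary estimator: keep the max_size most frequent terms.

    Terms are ranked by total count across the corpus; ties are broken
    alphabetically so the result is deterministic.
    """
    totals = Counter(token for doc in corpus for token in doc)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_size]]


def count_vectorize(
    doc: Sequence[str], vocabulary: Sequence[str]
) -> Tuple[int, List[int], List[float]]:
    """Map a tokenized document to a sparse count vector (size, indices, values).

    Each vocabulary term owns one vector index; tokens that are not in the
    vocabulary are dropped. Indices come back sorted, and counts are floats
    to mirror a vector of doubles.
    """
    index: Dict[str, int] = {term: i for i, term in enumerate(vocabulary)}
    counts = Counter(index[token] for token in doc if token in index)
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return len(vocabulary), indices, values
```

For example, with the vocabulary `["a", "b", "c"]`, the document `["a", "b", "b", "c", "a", "d"]` maps to `(3, [0, 1, 2], [2.0, 2.0, 1.0])`; the out-of-vocabulary token `"d"` is ignored. Storing only the non-zero entries keeps the vector small when the vocabulary is large and each document uses few of its terms.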
Diffstat (limited to 'streaming')
0 files changed, 0 insertions, 0 deletions