author | Xiangrui Meng <meng@databricks.com> | 2014-07-31 12:55:00 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-07-31 12:55:00 -0700 |
commit | dc0865bc7e119fe507061c27069c17523b87dfea (patch) | |
tree | 481dfc65f65273dda1fbfae7e22c780aee7f7168 /data | |
parent | e5749a1342327263dc6b94ba470e392fbea703fa (diff) | |
[SPARK-2511][MLLIB] add HashingTF and IDF
This is roughly the TF-IDF implementation used in the Databricks Cloud Demo (http://databricks.com/cloud/).
Both `HashingTF` and `IDF` are implemented as transformers, similar to the transformers in scikit-learn.
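
For readers unfamiliar with the transformer pattern, here is a minimal, self-contained Python sketch of the idea. It is not the code added by this commit. The class and method names mirror `HashingTF` and `IDF` for illustration only. The `crc32` hash, the dict-based sparse vectors, and the smoothed formula `log((m + 1) / (df + 1))` are assumptions of this sketch.

```python
import math
import zlib
from collections import Counter


class HashingTF:
    """Illustrative hashing-trick term-frequency transformer (not Spark's code)."""

    def __init__(self, num_features=1 << 20):
        # Size of the hashed feature space; the default here is an assumption.
        self.num_features = num_features

    def index_of(self, term):
        # crc32 is a stand-in chosen because it is deterministic across runs.
        return zlib.crc32(term.encode("utf-8")) % self.num_features

    def transform(self, document):
        # Sparse vector as {index: count}; terms that collide share a bucket.
        return dict(Counter(self.index_of(term) for term in document))


class IDF:
    """Illustrative inverse-document-frequency estimator (not Spark's code)."""

    def fit(self, tf_vectors):
        tf_vectors = list(tf_vectors)
        self.num_docs = len(tf_vectors)
        doc_freq = Counter()
        for vec in tf_vectors:
            # Each index counts at most once per document.
            doc_freq.update(vec.keys())
        # Smoothed IDF (assumed formula): log((m + 1) / (df + 1)).
        self.idf = {
            i: math.log((self.num_docs + 1) / (df + 1))
            for i, df in doc_freq.items()
        }
        return self

    def transform(self, tf_vector):
        # Indices never seen during fit are treated as df = 0.
        unseen = math.log(self.num_docs + 1)
        return {i: tf * self.idf.get(i, unseen) for i, tf in tf_vector.items()}


# Hypothetical toy corpus: each document is a list of terms.
docs = [["spark", "mllib", "spark"], ["spark", "idf"]]
tf = HashingTF()
tf_vectors = [tf.transform(doc) for doc in docs]
model = IDF().fit(tf_vectors)
tfidf = [model.transform(vec) for vec in tf_vectors]
```

Under this smoothed formula, a term that occurs in every document gets a weight of zero, so only terms that distinguish documents contribute to the result.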
Author: Xiangrui Meng <meng@databricks.com>
Closes #1671 from mengxr/tfidf and squashes the following commits:
7d65888 [Xiangrui Meng] use JavaConverters._
5fe9ec4 [Xiangrui Meng] fix unit test
6e214ec [Xiangrui Meng] add apache header
cfd9aed [Xiangrui Meng] add Java-friendly methods move classes to mllib.feature
3814440 [Xiangrui Meng] add HashingTF and IDF