[SPARK-2511][MLLIB] add HashingTF and IDF - spark


author	Xiangrui Meng <meng@databricks.com>	2014-07-31 12:55:00 -0700
committer	Xiangrui Meng <meng@databricks.com>	2014-07-31 12:55:00 -0700
commit	dc0865bc7e119fe507061c27069c17523b87dfea (patch)
tree	481dfc65f65273dda1fbfae7e22c780aee7f7168 /sql
parent	e5749a1342327263dc6b94ba470e392fbea703fa (diff)
download	spark-dc0865bc7e119fe507061c27069c17523b87dfea.tar.gz spark-dc0865bc7e119fe507061c27069c17523b87dfea.tar.bz2 spark-dc0865bc7e119fe507061c27069c17523b87dfea.zip

[SPARK-2511][MLLIB] add HashingTF and IDF

This is roughly the TF-IDF implementation used in the Databricks Cloud Demo: http://databricks.com/cloud/ .

Both `HashingTF` and `IDF` are implemented as transformers, similar to scikit-learn.

Author: Xiangrui Meng <meng@databricks.com>

Closes #1671 from mengxr/tfidf and squashes the following commits:

7d65888 [Xiangrui Meng] use JavaConverters._
5fe9ec4 [Xiangrui Meng] fix unit test
6e214ec [Xiangrui Meng] add apache header
cfd9aed [Xiangrui Meng] add Java-friendly methods move classes to mllib.feature
3814440 [Xiangrui Meng] add HashingTF and IDF
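
For readers unfamiliar with the transformer pattern mentioned in the commit message, here is a minimal sketch of hashed term frequencies followed by IDF weighting, written in plain Python with no Spark dependency. It is not the code from this commit. The class shapes (`HashingTF`, `IDF`, `IDFModel`), the method names, the CRC32 hash, the default `num_features`, and the smoothed formula `log((m + 1) / (df + 1))` are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


class HashingTF:
    """Hypothetical sketch: map terms to sparse term-frequency vectors by hashing."""

    def __init__(self, num_features=1 << 20):
        self.num_features = num_features

    def index_of(self, term):
        # CRC32 is an assumption chosen for determinism across runs;
        # it is unsigned, so the modulus is always a valid index.
        return zlib.crc32(term.encode("utf-8")) % self.num_features

    def transform(self, document):
        # Sparse vector as {index: count}; colliding terms share a bucket.
        return dict(Counter(self.index_of(term) for term in document))


class IDFModel:
    """Hypothetical sketch: holds document frequencies and rescales TF vectors."""

    def __init__(self, num_docs, doc_freq):
        self.num_docs = num_docs
        self.doc_freq = doc_freq

    def idf(self, index):
        # Assumed smoothed formula: log((m + 1) / (df + 1)).
        df = self.doc_freq.get(index, 0)
        return math.log((self.num_docs + 1.0) / (df + 1.0))

    def transform(self, tf_vector):
        return {i: tf * self.idf(i) for i, tf in tf_vector.items()}


class IDF:
    """Hypothetical sketch: fit document frequencies from a corpus of TF vectors."""

    def fit(self, tf_vectors):
        tf_vectors = list(tf_vectors)
        doc_freq = Counter()
        for vector in tf_vectors:
            # Count each index once per document in which it occurs.
            doc_freq.update(i for i, count in vector.items() if count > 0)
        return IDFModel(len(tf_vectors), doc_freq)


if __name__ == "__main__":
    docs = [["spark", "mllib", "spark"], ["spark", "sql"]]
    hashing_tf = HashingTF(num_features=1000)
    tf = [hashing_tf.transform(doc) for doc in docs]
    model = IDF().fit(tf)
    for vector in tf:
        print(model.transform(vector))
```

Hashing avoids building a term-to-index dictionary over the corpus, at the cost of occasional collisions between unrelated terms. A term that occurs in every document gets weight 0 under the assumed formula.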

Diffstat (limited to 'sql')

0 files changed, 0 insertions, 0 deletions

