author | Xiangrui Meng <meng@databricks.com> | 2015-05-21 18:04:45 -0700 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2015-05-21 18:04:45 -0700 |
commit | 85b96372cf0fd055f89fc639f45c1f2cb02a378f | |
tree | efdc362523217e9c8e3da9e4c2ba1743ad44d094 /yarn | |
parent | f5db4b416c922db7a8f1b0c098b4f08647106231 | |
[SPARK-7219] [MLLIB] Output feature attributes in HashingTF
This PR updates `HashingTF` to output ML attributes that record the number of features in the output column. Supporting this in general requires extending `UnaryTransformer` to produce output metadata. A `def outputMetadata: Metadata` is not sufficient, because the metadata may also depend on the input data. Although that is not the case for `HashingTF`, I think it is reasonable to update `UnaryTransformer` in a separate PR. `checkParams` is added to verify common requirements for params; I will send a separate PR to use it in other test suites. jkbradley
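The design point above (output metadata that may depend on the input, not just on the transformer's own params) can be sketched in plain Python. This is a hypothetical model, not Spark's API: `UnaryTransformerSketch`, `HashingTFSketch`, `IndexEncoderSketch`, the dict-based schema, and the `ml_attr`/`num_attrs` keys are illustrative assumptions, and `zlib.crc32` is a stand-in hash rather than the one Spark uses. Passing the input schema to `output_metadata` lets a transformer whose output width comes from its input (the hypothetical encoder) share the same hook as a hashing transformer, whose width is fixed by `num_features`.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical schema model: column name -> column metadata dict.
Schema = Dict[str, dict]


@dataclass
class UnaryTransformerSketch:
    """Hypothetical one-in/one-out transformer; not Spark's UnaryTransformer."""
    input_col: str
    output_col: str

    def output_metadata(self, input_schema: Schema) -> dict:
        # Receives the input schema, so subclasses may derive metadata from it.
        return {}

    def transform_schema(self, schema: Schema) -> Schema:
        if self.input_col not in schema:
            raise KeyError(f"input column {self.input_col!r} not found")
        if self.output_col in schema:
            raise ValueError(f"output column {self.output_col!r} already exists")
        out = dict(schema)  # copy so the caller's schema is left untouched
        out[self.output_col] = self.output_metadata(schema)
        return out


@dataclass
class HashingTFSketch(UnaryTransformerSketch):
    num_features: int = 1 << 18

    def output_metadata(self, input_schema: Schema) -> dict:
        # The output size is fixed by num_features, independent of the input.
        return {"ml_attr": {"num_attrs": self.num_features}}

    def term_frequencies(self, terms: Iterable[str]) -> Dict[int, float]:
        # Hashing trick: map each term to a bucket in [0, num_features)
        # and count occurrences; colliding terms share a bucket.
        counts: Dict[int, float] = {}
        for term in terms:
            idx = zlib.crc32(term.encode("utf-8")) % self.num_features
            counts[idx] = counts.get(idx, 0.0) + 1.0
        return counts


@dataclass
class IndexEncoderSketch(UnaryTransformerSketch):
    def output_metadata(self, input_schema: Schema) -> dict:
        # Output width is read from the input column's metadata, which a
        # constant, input-independent metadata value could not express.
        n = input_schema[self.input_col]["num_values"]
        return {"ml_attr": {"num_attrs": n}}


tf = HashingTFSketch("words", "features", num_features=16)
print(tf.transform_schema({"words": {}})["features"])  # {'ml_attr': {'num_attrs': 16}}
```

Taking the input schema as an argument costs nothing for the hashing case (it is simply ignored) while keeping one hook for both kinds of transformer.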
Author: Xiangrui Meng <meng@databricks.com>
Closes #6308 from mengxr/SPARK-7219 and squashes the following commits:
9bd2922 [Xiangrui Meng] address comments
e82a68a [Xiangrui Meng] remove sqlContext from test suite
995535b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7219
2194703 [Xiangrui Meng] add test for attributes
178ae23 [Xiangrui Meng] update HashingTF with tests
91a6106 [Xiangrui Meng] WIP