[SPARK-18408][ML] API Improvements for LSH - spark


author	Yun Ni <yunn@uber.com>	2016-11-28 15:14:46 -0800
committer	Joseph K. Bradley <joseph@databricks.com>	2016-11-28 15:14:46 -0800
commit	05f7c6ffab2a6be548375cd624dc27092677232f (patch)
tree	27a954222f507a44273df13222d0946a7b485eed /dev/run-pip-tests
parent	8b1609bebe489b2ef78db4be6e9836687089fe3d (diff)
download	spark-05f7c6ffab2a6be548375cd624dc27092677232f.tar.gz spark-05f7c6ffab2a6be548375cd624dc27092677232f.tar.bz2 spark-05f7c6ffab2a6be548375cd624dc27092677232f.zip


## What changes were proposed in this pull request?

(1) Change output schema to `Array of Vector` instead of `Vectors`
(2) Use `numHashTables` as the dimension of Array
(3) Rename `RandomProjection` to `BucketedRandomProjectionLSH`, `MinHash` to `MinHashLSH`
(4) Make `randUnitVectors/randCoefficients` private
(5) Make Multi-Probe NN Search and `hashDistance` private for future discussion

Saved for future PRs:
(1) AND-amplification, with `numHashFunctions` as the dimension of Vector, is saved for a future PR.
(2) `hashDistance` and Multi-Probe NN Search need more discussion. The current implementation is just a backward-compatible one.

## How was this patch tested?

Related unit tests were modified to ensure that LSH performance is preserved and that the outputs of the APIs meet expectations.

Author: Yun Ni <yunn@uber.com>
Author: Yunni <Euler57721@gmail.com>

Closes #15874 from Yunni/SPARK-18408-yunn-api-improvements.
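
To illustrate changes (1) and (2), here is a minimal plain-Scala sketch of the output shape described above: one single-element vector per hash table, so the array length equals `numHashTables`. It is not the Spark implementation. The `Vec` case class, the `LshSchemaSketch` object, and the `seed` and `bucketLength` parameters are assumptions made for illustration, and the bucketed random projection formula (floor of the dot product divided by the bucket length) is the usual textbook form rather than something confirmed by this commit message.

```scala
import scala.util.Random

// Hypothetical stand-in for an ML vector; Spark's own Vector type is not used here.
final case class Vec(values: Array[Double])

object LshSchemaSketch {
  // Draw `numHashTables` random unit vectors of dimension `dim`, one per hash table.
  def randUnitVectors(numHashTables: Int, dim: Int, seed: Long): Array[Array[Double]] = {
    val rng = new Random(seed)
    Array.fill(numHashTables) {
      val raw = Array.fill(dim)(rng.nextGaussian())
      val norm = math.sqrt(raw.map(x => x * x).sum)
      raw.map(_ / norm)
    }
  }

  // Hash one point. Each table contributes a one-element vector holding
  // floor(dot(x, v) / bucketLength), so the result has length numHashTables.
  def hash(x: Array[Double],
           unitVectors: Array[Array[Double]],
           bucketLength: Double): Array[Vec] =
    unitVectors.map { v =>
      val dot = x.zip(v).map { case (a, b) => a * b }.sum
      Vec(Array(math.floor(dot / bucketLength)))
    }
}
```

Calling `LshSchemaSketch.hash(point, LshSchemaSketch.randUnitVectors(3, 2, 42L), 2.0)` on a two-dimensional point returns an array of three one-element vectors. Keeping each table's hash in its own vector leaves room for the inner dimension to grow later, which is how the message describes the deferred AND-amplification work.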

Diffstat (limited to 'dev/run-pip-tests')

0 files changed, 0 insertions, 0 deletions

