author Yun Ni <yunn@uber.com> 2016-11-28 15:14:46 -0800
committer Joseph K. Bradley <joseph@databricks.com> 2016-11-28 15:14:46 -0800
commit 05f7c6ffab2a6be548375cd624dc27092677232f
tree 27a954222f507a44273df13222d0946a7b485eed
parent 8b1609bebe489b2ef78db4be6e9836687089fe3d
[SPARK-18408][ML] API Improvements for LSH
## What changes were proposed in this pull request?

(1) Change the output schema to `Array of Vector` instead of `Vectors`
(2) Use `numHashTables` as the dimension of the Array
(3) Rename `RandomProjection` to `BucketedRandomProjectionLSH`, and `MinHash` to `MinHashLSH`
(4) Make `randUnitVectors/randCoefficients` private
(5) Make Multi-Probe NN Search and `hashDistance` private for future discussion

Saved for future PRs:
(1) AND-amplification and `numHashFunctions` as the dimension of the Vector.
(2) `hashDistance` and Multi-Probe NN Search need more discussion. The current implementation is just a backward-compatible one.

## How was this patch tested?

Related unit tests were modified to ensure that LSH performance is preserved and that the API outputs meet expectations.

Author: Yun Ni <yunn@uber.com>
Author: Yunni <Euler57721@gmail.com>

Closes #15874 from Yunni/SPARK-18408-yunn-api-improvements.
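
To illustrate the output shape described in items (1) and (2), here is a minimal pure-Python sketch, not Spark's implementation. It assumes a bucketed random projection hash of the form floor(x · v / bucketLength) with one random unit vector per hash table. The names `rand_unit_vectors`, `hash_function`, and `bucket_length` are hypothetical and chosen only for this example. Each input maps to an array of `num_hash_tables` vectors, and each vector has length 1 because AND-amplification is left to a future PR.

```python
import math
import random


def rand_unit_vectors(num_hash_tables, dim, seed):
    """Draw one random unit vector per hash table (hypothetical helper)."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(num_hash_tables):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        vectors.append([c / norm for c in v])
    return vectors


def hash_function(x, unit_vectors, bucket_length):
    """Return an array of len(unit_vectors) vectors, each of length 1.

    Assumed hash: floor(dot(x, v) / bucket_length), one per hash table.
    """
    return [
        [float(math.floor(sum(a * b for a, b in zip(x, v)) / bucket_length))]
        for v in unit_vectors
    ]


# Three hash tables over 2-dimensional inputs.
unit_vectors = rand_unit_vectors(num_hash_tables=3, dim=2, seed=42)
hashes = hash_function([1.0, -0.5], unit_vectors, bucket_length=2.0)
print(hashes)  # a list of 3 single-element vectors
```

In this sketch the outer list plays the role of the Array indexed by hash table, so two points would be candidate neighbors if they share a value in any one of the positions (OR-amplification).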
Diffstat (limited to 'LICENSE')
0 files changed, 0 insertions, 0 deletions