author | William Benton <willb@redhat.com> | 2016-09-17 12:49:58 +0100 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-09-17 12:49:58 +0100 |
commit | 25cbbe6ca334140204e7035ab8b9d304da9b8a8a (patch) | |
tree | 7e0ec70179b52f4b39336c2fbb841a8584e83a48 /.gitattributes | |
parent | f15d41be3ce7569736ccbf2ffe1bec265865f55d (diff) | |
[SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector
## What changes were proposed in this pull request?
This pull request changes the behavior of `Word2VecModel.findSynonyms` so that it no longer spuriously rejects the best match when invoked with a vector that does not correspond to a word in the model's vocabulary. Instead of unconditionally discarding the best match, the new implementation discards only a match that corresponds to the query word (when `findSynonyms` is invoked with a word) or that has an identical angle to the query vector.
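The filtering rule described above can be sketched outside Spark. The snippet below is a minimal, standalone Python illustration and not the MLlib code: the `cosine` and `find_synonyms` helpers, the `tol` tolerance, and the toy two-dimensional vocabulary are all assumptions made for the example, and treating "identical angle" as a cosine similarity of 1 (within `tol`) is one reading of the description.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_synonyms(
    model: Dict[str, List[float]],       # hypothetical stand-in for the vocabulary
    query: Sequence[float],
    num: int,
    query_word: Optional[str] = None,    # set when the caller queried by word
    tol: float = 1e-9,                   # assumed tolerance for "identical angle"
) -> List[Tuple[str, float]]:
    # Score every vocabulary word, best match first.
    scored = sorted(
        ((word, cosine(vec, query)) for word, vec in model.items()),
        key=lambda pair: -pair[1],
    )
    if query_word is not None:
        # Queried by word: drop only the query word itself.
        kept = [(w, s) for w, s in scored if w != query_word]
    else:
        # Queried by vector: drop only matches at the same angle as the query.
        kept = [(w, s) for w, s in scored if abs(s - 1.0) > tol]
    return kept[:num]


# Toy vocabulary (made-up vectors, for illustration only).
model = {
    "china": [1.0, 0.0],
    "japan": [0.9, 0.1],
    "taiwan": [0.6, 0.4],
    "korea": [0.0, 1.0],
}

# A vector that is not in the vocabulary keeps its best match.
by_vector = [w for w, _ in find_synonyms(model, [0.98, 0.02], 2)]

# A query by word drops the query word and nothing else.
by_word = [w for w, _ in find_synonyms(model, model["china"], 2, query_word="china")]
```

With this rule, the out-of-vocabulary query keeps "china" as its top result, whereas an implementation that always dropped the first match would have started the list at "japan".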
## How was this patch tested?
I added a test to `Word2VecSuite` to ensure that the word whose vector is most similar to a supplied vector is not spuriously rejected.
Author: William Benton <willb@redhat.com>
Closes #15105 from willb/fix/findSynonyms.