author | Yuhao Yang <hhbyyh@gmail.com> | 2015-12-05 15:27:31 +0000 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2015-12-05 15:27:31 +0000 |
commit | ee94b70ce56661ea26c5aad17778ade32f3f1d3d (patch) | |
tree | 95f1d75df182253e4e418a8e598d1ff277b0fc59 /mllib/pom.xml | |
parent | 3af53e61fd604fe8000e1fdf656d60b79c842d1c (diff) | |
[SPARK-12096][MLLIB] remove the old constraint in word2vec
jira: https://issues.apache.org/jira/browse/SPARK-12096
word2vec can now handle a much bigger vocabulary.
The old constraint vocabSize.toLong * vectorSize < Int.MaxValue / 8 should be removed.
The new constraint is vocabSize.toLong * vectorSize < max array length (usually a little less than Int.MaxValue).
I tested with vocabSize over 18M and vectorSize = 100.
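For illustration only, here is a minimal sketch of the two bounds. It assumes the old limit was Int.MaxValue / 8 and uses Int.MaxValue as a stand-in for the max array length, which on a real JVM is usually slightly smaller. The object and method names are hypothetical and do not come from the Spark source.

```scala
object W2vCapacitySketch {
  // Old bound from the commit message: vocabSize * vectorSize must stay below Int.MaxValue / 8.
  // Hypothetical helper, not part of Spark.
  def fitsOldConstraint(vocabSize: Int, vectorSize: Int): Boolean =
    vocabSize.toLong * vectorSize < Int.MaxValue / 8

  // New bound: the product only has to fit in a single array.
  // maxArrayLength defaults to Int.MaxValue as an assumption; the real limit is JVM-dependent.
  def fitsNewConstraint(vocabSize: Int, vectorSize: Int,
                        maxArrayLength: Int = Int.MaxValue): Boolean =
    vocabSize.toLong * vectorSize < maxArrayLength

  def main(args: Array[String]): Unit = {
    val vocabSize = 18000000 // roughly the tested vocabulary size
    val vectorSize = 100
    println(s"old constraint satisfied: ${fitsOldConstraint(vocabSize, vectorSize)}")
    println(s"new constraint satisfied: ${fitsNewConstraint(vocabSize, vectorSize)}")
  }
}
```

The multiplication is done on a Long (via toLong) so that the product itself cannot overflow before it is compared with the limit.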
srowen jkbradley Sorry for missing this in the last PR; I was reminded of it today.
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #10103 from hhbyyh/w2vCapacity.
Diffstat (limited to 'mllib/pom.xml')
0 files changed, 0 insertions, 0 deletions