author | Xiangrui Meng <meng@databricks.com> | 2015-04-28 09:59:36 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-04-28 09:59:36 -0700 |
commit | b14cd2364932e504695bcc49486ffb4518fdf33d (patch) | |
tree | b2ddae86f122b2feba34f46f41bddc7e8cbc66d0 /conf/spark-env.sh.template | |
parent | 6a827d5d1ec520f129e42c3818fe7d0d870dcbef (diff) | |
[SPARK-7140] [MLLIB] only scan the first 16 entries in Vector.hashCode
The Python SerDe calls `Object.hashCode`, which is very expensive for Vectors because it scans the whole vector. A hash code does not need to cover every entry, especially for large vectors. In this PR, we scan only the first 16 nonzeros. srowen
Author: Xiangrui Meng <meng@databricks.com>
Closes #5697 from mengxr/SPARK-7140 and squashes the following commits:
2abc86d [Xiangrui Meng] typo
8fb7d74 [Xiangrui Meng] update impl
1ebad60 [Xiangrui Meng] only scan the first 16 nonzeros in Vector.hashCode
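The idea described in the commit message can be illustrated with a minimal Python sketch. This is not the code from the patch: the names `MAX_HASH_NNZ`, `_to_int32`, and `vector_hash` are hypothetical, and the mixing scheme (a Java-style `31 * result + x` fold over the vector size and each index/value pair) is an assumption chosen for illustration. It shows only the technique named in the title: stop hashing after the first 16 nonzero entries, so the cost stays bounded regardless of the number of nonzeros.

```python
import struct

# Assumed constant name; the commit message only states the limit of 16.
MAX_HASH_NNZ = 16


def _to_int32(x):
    """Wrap an arbitrary Python int to a signed 32-bit value (JVM Int semantics)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def vector_hash(size, entries):
    """Hash a vector from its size and its first MAX_HASH_NNZ nonzero entries.

    `entries` is an iterable of (index, value) pairs in ascending index order.
    Explicit zeros are skipped, so a dense and a sparse representation of the
    same vector produce the same hash.
    """
    result = 31 + size
    nnz = 0
    for index, value in entries:
        if nnz >= MAX_HASH_NNZ:
            break  # bounded work: ignore everything past the 16th nonzero
        if value != 0.0:
            result = _to_int32(31 * result + index)
            # Fold the 64-bit IEEE 754 pattern of the value into 32 bits.
            bits = struct.unpack(">Q", struct.pack(">d", value))[0]
            folded = (bits ^ (bits >> 32)) & 0xFFFFFFFF
            result = _to_int32(31 * result + folded)
            nnz += 1
    return result


# Dense and sparse forms of the same vector hash identically.
dense_hash = vector_hash(3, enumerate([1.0, 0.0, 2.0]))
sparse_hash = vector_hash(3, [(0, 1.0), (2, 2.0)])
```

The trade-off is the usual one for truncated hashes: vectors that agree on their first 16 nonzeros collide, which is acceptable because equality checks still compare all entries, while hashing a vector with many nonzeros stops after 16 of them.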
Diffstat (limited to 'conf/spark-env.sh.template')
0 files changed, 0 insertions, 0 deletions