author Sean Owen <sowen@cloudera.com> 2016-12-30 10:40:17 +0000
committer Sean Owen <sowen@cloudera.com> 2016-12-30 10:40:17 +0000
commit 56d3a7eb83f9c91d06dab2c91e10569723eeb105
tree 6f6ff01f72b0feb70c654fb52289ddfe06effc03 /sql/core
parent 63036aee2271cdbb7032b51b2ac67edbcb82389e
[SPARK-18808][ML][MLLIB] ml.KMeansModel.transform is very inefficient
## What changes were proposed in this pull request?

`mllib.KMeansModel.clusterCentersWithNorm` is a method that ends up being called every time `predict` is called on a single vector, which is bad news for how the `ml.KMeansModel` Transformer works, since it necessarily transforms one vector at a time.

This change makes the model store the vectors with norms upfront. The extra norm should be small compared to the vectors. This avoids this form of overhead on this and other code paths.

## How was this patch tested?

Existing tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #16328 from srowen/SPARK-18808.
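
The idea described above, computing each center's norm once when the model is built instead of on every single-vector `predict` call, can be illustrated with a minimal sketch. This is plain Python, not the Spark code: the class name `CentersWithCachedNorms`, the helper `l2_norm`, and the norm-based pruning step are assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


class CentersWithCachedNorms:
    """Hypothetical stand-in for a k-means model (not the Spark class).

    Each center's norm is computed once in the constructor and stored
    next to the center, so predict() never recomputes it.
    """

    def __init__(self, centers: Sequence[Sequence[float]]) -> None:
        self.centers_with_norm: List[Tuple[Tuple[float, ...], float]] = [
            (tuple(c), l2_norm(c)) for c in centers
        ]

    def predict(self, point: Sequence[float]) -> int:
        """Return the index of the closest center for a single vector."""
        p_norm = l2_norm(point)
        best_index, best_dist = -1, math.inf
        for i, (center, c_norm) in enumerate(self.centers_with_norm):
            # (||p|| - ||c||)^2 is a lower bound on ||p - c||^2, so the
            # cached norm lets us skip centers that cannot be closer.
            lower_bound = (p_norm - c_norm) ** 2
            if lower_bound < best_dist:
                dist = sum((a - b) ** 2 for a, b in zip(point, center))
                if dist < best_dist:
                    best_index, best_dist = i, dist
        return best_index
```

The trade-off matches the one stated in the description: one extra float is stored per center, which is small next to the center vector itself, and repeated single-vector predictions no longer pay for recomputing the norms.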
Diffstat (limited to 'sql/core')
0 files changed, 0 insertions, 0 deletions