[SPARK-17033][ML][MLLIB] GaussianMixture should use treeAggregate to improve performance - spark

author	Yanbo Liang <ybliang8@gmail.com>	2016-08-12 10:06:17 -0700
committer	Yanbo Liang <ybliang8@gmail.com>	2016-08-12 10:06:17 -0700
commit	bbae20ade14e50541e4403ca7b45bf6c11695d15 (patch)
tree	41d0da76679d36b07252e040078be071a41aea23 /sql/core/src/test
parent	79e2caa1328843457841d71642b60be919ebb1e0 (diff)
download	spark-bbae20ade14e50541e4403ca7b45bf6c11695d15.tar.gz spark-bbae20ade14e50541e4403ca7b45bf6c11695d15.tar.bz2 spark-bbae20ade14e50541e4403ca7b45bf6c11695d15.zip

[SPARK-17033][ML][MLLIB] GaussianMixture should use treeAggregate to improve performance

## What changes were proposed in this pull request?

`GaussianMixture` should use `treeAggregate` rather than `aggregate` to improve performance and scalability. In my test on a dataset with 200 features and 1M instances, I measured a 20% performance improvement. In addition, the broadcast variable `compute` should be destroyed at the end of each iteration.

## How was this patch tested?

Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #14621 from yanboliang/spark-17033.
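To illustrate why a tree-shaped aggregation can help, here is a minimal pure-Python sketch that simulates the two strategies over in-memory "partitions". It is not Spark code and not the patch itself: the function names, the `fan_in` parameter, and the merge counter are hypothetical and exist only for illustration (Spark's own `treeAggregate` takes a `depth` argument instead). The sketch assumes `zero` is an identity element for `comb_op`.

```python
from functools import reduce


def flat_aggregate(partitions, zero, seq_op, comb_op):
    """Fold each partition, then merge every partial result in one place."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    merges_at_driver = len(partials)  # every partial lands on the driver
    return reduce(comb_op, partials, zero), merges_at_driver


def tree_aggregate(partitions, zero, seq_op, comb_op, fan_in=2):
    """Fold each partition, then pre-merge partials in groups of `fan_in`
    (hypothetical parameter) until at most `fan_in` remain for the driver."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    while len(partials) > fan_in:
        partials = [
            reduce(comb_op, partials[i:i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]
    merges_at_driver = len(partials)  # only the last level reaches the driver
    return reduce(comb_op, partials, zero), merges_at_driver


# Running (sum, count) statistic over 16 values split into 8 partitions.
data = list(range(16))
partitions = [data[i:i + 2] for i in range(0, len(data), 2)]
zero = (0, 0)
seq_op = lambda acc, x: (acc[0] + x, acc[1] + 1)
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])

flat_result, flat_merges = flat_aggregate(partitions, zero, seq_op, comb_op)
tree_result, tree_merges = tree_aggregate(partitions, zero, seq_op, comb_op)
```

Both strategies produce the same result, but in the tree variant the final step combines far fewer partial results. In a distributed setting, where each partial result may be large (for example, per-cluster sufficient statistics over 200 features), this reduces the work and data volume concentrated on a single node.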

Diffstat (limited to 'sql/core/src/test')

0 files changed, 0 insertions, 0 deletions
