author | Yanbo Liang <ybliang8@gmail.com> | 2016-08-12 10:06:17 -0700 |
---|---|---|
committer | Yanbo Liang <ybliang8@gmail.com> | 2016-08-12 10:06:17 -0700 |
commit | bbae20ade14e50541e4403ca7b45bf6c11695d15 (patch) | |
tree | 41d0da76679d36b07252e040078be071a41aea23 /docs/streaming-flume-integration.md | |
parent | 79e2caa1328843457841d71642b60be919ebb1e0 (diff) | |
[SPARK-17033][ML][MLLIB] GaussianMixture should use treeAggregate to improve performance
## What changes were proposed in this pull request?
```GaussianMixture``` should use ```treeAggregate``` rather than ```aggregate``` to improve performance and scalability. In my test on a dataset with 200 features and 1M instances, I saw a 20% performance improvement.
BTW, we should destroy the broadcast variable ```compute``` at the end of each iteration.
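To illustrate why moving from ```aggregate``` to ```treeAggregate``` can help, here is a minimal plain-Python model of the two merge strategies. It is not Spark code and not the code in this patch: the names `seq_op`, `comb_op`, `flat_aggregate`, `tree_aggregate` and the `fan_in` parameter are hypothetical, and the accumulator is a simple `(sum, count)` pair standing in for the sufficient statistics that ```GaussianMixture``` collects. The sketch only shows that both strategies produce the same result while the tree variant leaves far fewer partial results to merge at the final step.

```python
from functools import reduce

ZERO = (0.0, 0)  # (running sum, record count); stand-in for real statistics


def seq_op(acc, x):
    # Fold one record into a partition-local accumulator.
    return (acc[0] + x, acc[1] + 1)


def comb_op(a, b):
    # Merge two partial accumulators.
    return (a[0] + b[0], a[1] + b[1])


def partition_partials(partitions):
    # One partial result per partition, computed independently.
    return [reduce(seq_op, part, ZERO) for part in partitions]


def flat_aggregate(partitions):
    """Merge every partial at a single point.

    Returns (result, number of partials merged at the final step).
    """
    partials = partition_partials(partitions)
    return reduce(comb_op, partials, ZERO), len(partials)


def tree_aggregate(partitions, fan_in=2):
    """Merge partials level by level in groups of `fan_in`.

    Returns (result, number of partials merged at the final step).
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    partials = partition_partials(partitions)
    # Keep combining groups until only a few partials remain.
    while len(partials) > fan_in:
        partials = [
            reduce(comb_op, partials[i:i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]
    return reduce(comb_op, partials, ZERO), len(partials)


# Eight partitions holding the values 0.0 .. 999.0.
partitions = [
    [float(i) for i in range(p * 125, (p + 1) * 125)] for p in range(8)
]

flat_result, flat_final_merges = flat_aggregate(partitions)
tree_result, tree_final_merges = tree_aggregate(partitions)
```

With eight partitions, the flat version merges all eight partials in one place, while the tree version reduces them 8 → 4 → 2 and merges only two at the end. In a distributed setting the intermediate merges would run on workers rather than on the driver, which is the effect the patch relies on.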
## How was this patch tested?
Existing tests.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #14621 from yanboliang/spark-17033.