[SPARK-12026][MLLIB] ChiSqTest gets slower and slower over time when number of features is large - spark


author	Yuhao Yang <hhbyyh@gmail.com>	2016-01-13 17:43:27 -0800
committer	Joseph K. Bradley <joseph@databricks.com>	2016-01-13 17:43:27 -0800
commit	021dafc6a05a31dc22c9f9110dedb47a1f913087 (patch)
tree	bd2f61d86f90a8b0d9147f26b104e65550f49e0c /docs/mllib-clustering.md
parent	cd81fc9e8652c07b84f0887a24d67381b4e605fa (diff)

[SPARK-12026][MLLIB] ChiSqTest gets slower and slower over time when number of features is large

jira: https://issues.apache.org/jira/browse/SPARK-12026

The issue is valid: features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger.

I tested locally. The change improves performance, and the running time stays stable.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #10146 from hhbyyh/chiSq.
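The slowdown described in the commit message can be illustrated with a minimal sketch. It is not the patch itself (no diff is shown on this page), and the object and method names below are hypothetical. It contrasts the quoted pattern, which zips a lazy view with indices and then slices it, with reading the same columns directly by index, which is one plausible way to keep the cost independent of startCol.

```scala
object SliceSketch {
  // Pattern quoted in the commit message: zip lazily with indices, then slice.
  // Per the commit message, this gets slower as startCol grows.
  def viaViewSlice(features: Array[Double], startCol: Int, endCol: Int): List[(Double, Int)] =
    features.view.zipWithIndex.slice(startCol, endCol).toList

  // Assumed alternative (not taken from the patch): index the array directly,
  // so the work depends only on the number of columns requested.
  // Assumes 0 <= startCol.
  def viaDirectIndex(features: Array[Double], startCol: Int, endCol: Int): List[(Double, Int)] =
    (startCol until math.min(endCol, features.length))
      .map(col => (features(col), col))
      .toList

  def main(args: Array[String]): Unit = {
    val features = Array.tabulate(10000)(_.toDouble)
    // Both approaches return the same (value, column index) pairs.
    assert(viaViewSlice(features, 9000, 9005) == viaDirectIndex(features, 9000, 9005))
    println(viaDirectIndex(features, 9000, 9005))
  }
}
```

Both helpers return identical pairs; they differ only in how the requested columns are reached.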

Diffstat (limited to 'docs/mllib-clustering.md')

0 files changed, 0 insertions, 0 deletions

