field | value | timestamp |
---|---|---|
author | Syrux <pokcyril@hotmail.com> | 2017-04-13 09:44:33 +0100 |
committer | Sean Owen <sowen@cloudera.com> | 2017-04-13 09:44:33 +0100 |
commit | 095d1cb3aa0021c9078a6e910967b9189ddfa177 (patch) | |
tree | 50803658690156142ed3766207d7956a29d63496 /LICENSE | |
parent | ec68d8f8cfdede8a0de1d56476205158544cc4eb (diff) | |
[SPARK-20265][MLLIB] Improve PrefixSpan pre-processing efficiency
## What changes were proposed in this pull request?
Improve PrefixSpan pre-processing efficiency by no longer keeping sequences made up only of zeros (i.e. sequences left with no frequent items after cleaning) in the cleaned database.
The efficiency gain is shown in the following graph: https://postimg.org/image/9x6ireuvn/
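The pre-processing idea can be sketched as follows. This is a minimal, hedged illustration in plain Python, not the actual MLlib Scala code. It assumes an internal format in which `0` delimits itemsets and each frequent item is mapped to a 1-based integer, so a sequence that loses all its items during cleaning would otherwise be encoded as just `[0]`. The function name `to_internal_repr` and the `item_to_int` mapping are hypothetical names chosen for this sketch.

```python
from typing import Dict, Hashable, List, Sequence


def to_internal_repr(
    sequences: Sequence[Sequence[Sequence[Hashable]]],
    item_to_int: Dict[Hashable, int],
) -> List[List[int]]:
    """Hypothetical sketch: encode sequences of itemsets, dropping infrequent items.

    Assumed format: each kept sequence looks like [0, i1, i2, 0, i3, 0], where 0
    delimits itemsets and frequent items are mapped to 1-based ints, sorted
    within each itemset. Sequences with no frequent items left (which would
    otherwise be encoded as just [0]) are dropped from the cleaned database.
    """
    cleaned: List[List[int]] = []
    for sequence in sequences:
        encoded = [0]  # leading delimiter
        contains_freq_items = False
        for itemset in sequence:
            # Keep only frequent items, shifted to 1-based ints and sorted.
            items = sorted(item_to_int[item] + 1 for item in itemset if item in item_to_int)
            if items:
                contains_freq_items = True
                encoded.extend(items)
                encoded.append(0)  # close this itemset
        # The optimization: skip sequences that would be nothing but delimiters.
        if contains_freq_items:
            cleaned.append(encoded)
    return cleaned


if __name__ == "__main__":
    # Hypothetical example: only "a" and "b" are frequent.
    freq = {"a": 0, "b": 1}
    db = [[["a", "c"], ["b"]], [["c"], ["d"]], [["b", "a"]]]
    print(to_internal_repr(db, freq))  # [[0, 1, 0, 2, 0], [0, 1, 2, 0]]
```

In this example the second sequence contains only infrequent items, so it is dropped rather than being carried through later stages as a zero-only entry.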
## How was this patch tested?
Using MLlib's existing PrefixSpan tests, plus my own tests on the 8 datasets shown in the graph. All results obtained were strictly identical to those of the original implementation (without this change).
dev/run-tests was also run, and no errors were found.
Author: Cyril de Vogelaere <cyril.devogelaere@gmail.com>
Author: Syrux <pokcyril@hotmail.com>
Closes #17575 from Syrux/SPARK-20265.
Diffstat (limited to 'LICENSE')
0 files changed, 0 insertions, 0 deletions