author | Jacky Li <jacky.likun@huawei.com> | 2015-02-01 20:07:25 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-02-01 20:07:25 -0800 |
commit | 859f7249a614c86fc1691cc3116463f85f33f153 (patch) | |
tree | 7f16495e4023248f5620b5454f582070b4bdf68f /.gitignore | |
parent | d85cd4eb1479f8d37dab360530dc2c71216b4a8d (diff) | |
[SPARK-4001][MLlib] adding parallel FP-Growth algorithm for frequent pattern mining in MLlib
Apriori is the classic algorithm for frequent item set mining in a transactional data set. It would be useful to add the Apriori algorithm to MLlib in Spark. This PR adds an implementation of it.
There is one point where I am not sure whether the approach is the most efficient. To filter out the eligible frequent item sets, I currently use a cartesian operation on two RDDs to calculate the support of each item set; I am not sure whether it would be better to use a broadcast variable to achieve the same.
I will add an example that uses this algorithm if required.
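The trade-off raised above, a cartesian pairing versus a broadcast candidate set, can be sketched in plain Python. This is only an illustration of the two counting strategies, not the code in this PR and not Spark API usage: the function names, the sample transactions, and the list-of-lists standing in for RDD partitions are all hypothetical.

```python
from itertools import product


def support_by_cartesian(transactions, candidates):
    """Count support by pairing every candidate with every transaction.

    Mirrors the shape of a cartesian product of two collections: the
    number of pairs examined is len(candidates) * len(transactions).
    """
    counts = {c: 0 for c in candidates}
    for candidate, transaction in product(candidates, transactions):
        if candidate <= transaction:  # candidate is a subset of the transaction
            counts[candidate] += 1
    return counts


def support_by_broadcast(partitions, candidates):
    """Count support per partition against one shared candidate set.

    `shared` stands in for a broadcast variable (an assumption of this
    sketch): each partition reads it locally, and only the small
    per-partition count tables are merged at the end.
    """
    shared = frozenset(candidates)
    total = {c: 0 for c in shared}
    for partition in partitions:
        local = {}
        for transaction in partition:
            for candidate in shared:
                if candidate <= transaction:
                    local[candidate] = local.get(candidate, 0) + 1
        for candidate, n in local.items():
            total[candidate] += n
    return total


# Hypothetical sample data: four transactions split into two partitions.
transactions = [
    frozenset(t)
    for t in (["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c"])
]
candidates = [
    frozenset(["a", "b"]),
    frozenset(["b", "c"]),
    frozenset(["a", "b", "c"]),
]
partitions = [transactions[:2], transactions[2:]]

cartesian_counts = support_by_cartesian(transactions, candidates)
broadcast_counts = support_by_broadcast(partitions, candidates)
```

Both functions do the same number of subset checks and return the same counts. The difference in a distributed setting would lie in data movement: the cartesian form materializes candidate/transaction pairs across the cluster, while the broadcast form ships the candidate set once to each worker, which tends to be preferable when the candidate set fits in memory.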
Author: Jacky Li <jacky.likun@huawei.com>
Author: Jacky Li <jackylk@users.noreply.github.com>
Author: Xiangrui Meng <meng@databricks.com>
Closes #2847 from jackylk/apriori and squashes the following commits:
bee3093 [Jacky Li] Merge pull request #1 from mengxr/SPARK-4001
7e69725 [Xiangrui Meng] simplify FPTree and update FPGrowth
ec21f7d [Jacky Li] fix scalastyle
93f3280 [Jacky Li] create FPTree class
d110ab2 [Jacky Li] change test case to use MLlibTestSparkContext
a6c5081 [Jacky Li] Add Parallel FPGrowth algorithm
eb3e4ca [Jacky Li] add FPGrowth
03df2b6 [Jacky Li] refactory according to comments
7b77ad7 [Jacky Li] fix scalastyle check
f68a0bd [Jacky Li] add 2 apriori implemenation and fp-growth implementation
889b33f [Jacky Li] modify per scalastyle check
da2cba7 [Jacky Li] adding apriori algorithm for frequent item set mining in Spark
Diffstat (limited to '.gitignore')
0 files changed, 0 insertions, 0 deletions