author | Alexander Ulanov <nashb@yandex.ru> | 2015-07-31 11:22:40 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-07-31 11:23:30 -0700 |
commit | 6add4eddb39e7748a87da3e921ea3c7881d30a82 | |
tree | 92fecca0d3008e5e537a78dcf87349b89a32a8dc /streaming | |
parent | 0024da9157ba12ec84883a78441fa6835c1d0042 | |
[SPARK-9471] [ML] Multilayer Perceptron
This pull request adds the following feature to ML:
- Multilayer Perceptron classifier
This implementation is based on our initial pull request with bgreeven (https://github.com/apache/spark/pull/1290) and was inspired by very insightful suggestions from mengxr and witgo; I would also like to thank everyone else in that thread for the useful discussions. The original code was extensively tested and benchmarked. Since then, I have addressed the two main requirements that prevented the code from being merged into the main branch:
- Extensible interface, so it will be easy to implement new types of networks
  - The main building blocks are the traits `Layer` and `LayerModel`, which are used to construct the layers of an ANN. New layers can be added by extending these traits. The traits are private in this release to leave room to improve them based on community feedback
  - Back propagation is implemented in a general form, so there is no need to change it (or the optimization algorithm) when new layers are implemented
- Speed and scalability: this implementation has to be comparable in speed to state-of-the-art single-node implementations.
  - The benchmark developed for large ANNs shows that the proposed code is on par with a C++ CPU implementation and scales well with the number of workers. Details can be found here: https://github.com/avulanov/ann-benchmark
- DBN and RBM by witgo https://github.com/witgo/spark/tree/ann-interface-gemm-dbn
- Dropout https://github.com/avulanov/spark/tree/ann-interface-gemm
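The extensibility idea above can be sketched in a few lines. This is a minimal illustration in plain Python, not the Spark code: the class names `Layer`, `AffineLayer` and `SigmoidLayer`, the functions `forward_all`, `loss`, `backprop` and `train_step`, and the squared-error loss are all assumptions made for the example, loosely standing in for the private `Layer`/`LayerModel` traits. The point it tries to show is that back propagation only calls each layer's interface, so a new layer type can be added without touching it.

```python
import math
import random


class Layer:
    """Hypothetical interface (not the Spark API): one layer of the network."""

    def forward(self, x):
        raise NotImplementedError

    def backward(self, x, grad_out):
        # Returns (gradient w.r.t. the input, gradient w.r.t. parameters or None).
        raise NotImplementedError

    def update(self, param_grads, lr):
        # Layers without parameters inherit this no-op.
        pass


class AffineLayer(Layer):
    """y = W x + b, with W stored as a list of rows."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]

    def backward(self, x, grad_out):
        grad_w = [[g * xi for xi in x] for g in grad_out]
        grad_in = [sum(row[i] * g for row, g in zip(self.w, grad_out))
                   for i in range(len(x))]
        return grad_in, (grad_w, list(grad_out))

    def update(self, param_grads, lr):
        grad_w, grad_b = param_grads
        for j, row in enumerate(self.w):
            for i in range(len(row)):
                row[i] -= lr * grad_w[j][i]
            self.b[j] -= lr * grad_b[j]


class SigmoidLayer(Layer):
    """Element-wise logistic activation; has no parameters."""

    def forward(self, x):
        return [1.0 / (1.0 + math.exp(-v)) for v in x]

    def backward(self, x, grad_out):
        s = self.forward(x)
        return [g * si * (1.0 - si) for g, si in zip(grad_out, s)], None


def forward_all(layers, x):
    # Keep every intermediate activation; backward needs each layer's input.
    acts = [x]
    for layer in layers:
        acts.append(layer.forward(acts[-1]))
    return acts


def loss(layers, x, target):
    # Squared error, chosen here only to keep the example short.
    y = forward_all(layers, x)[-1]
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def backprop(layers, x, target):
    # Generic back propagation: it never inspects the concrete layer type.
    acts = forward_all(layers, x)
    grad = [yi - ti for yi, ti in zip(acts[-1], target)]
    param_grads = [None] * len(layers)
    for k in reversed(range(len(layers))):
        grad, param_grads[k] = layers[k].backward(acts[k], grad)
    return param_grads


def train_step(layers, x, target, lr):
    # One plain gradient-descent step on a single example.
    for layer, g in zip(layers, backprop(layers, x, target)):
        layer.update(g, lr)
```

A network is then just a list such as `[AffineLayer(2, 3, rng), SigmoidLayer(), AffineLayer(3, 1, rng), SigmoidLayer()]` with `rng = random.Random(0)`. Adding, say, a dropout layer would mean writing one more `Layer` subclass while `backprop` stays unchanged.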
mengxr and dbtsai kindly agreed to perform code review.
Author: Alexander Ulanov <nashb@yandex.ru>
Author: Bert Greevenbosch <opensrc@bertgreevenbosch.nl>
Closes #7621 from avulanov/SPARK-2352-ann and squashes the following commits:
4806b6f [Alexander Ulanov] Addressing reviewers comments.
a7e7951 [Alexander Ulanov] Default blockSize: 100. Added documentation to blockSize parameter and DataStacker class
f69bb3d [Alexander Ulanov] Addressing reviewers comments.
374bea6 [Alexander Ulanov] Moving ANN to ML package. GradientDescent constructor is now spark private.
43b0ae2 [Alexander Ulanov] Addressing reviewers comments. Adding multiclass test.
9d18469 [Alexander Ulanov] Addressing reviewers comments: unnecessary copy of data in predict
35125ab [Alexander Ulanov] Style fix in tests
e191301 [Alexander Ulanov] Apache header
a226133 [Alexander Ulanov] Multilayer Perceptron regressor and classifier
Diffstat (limited to 'streaming')
0 files changed, 0 insertions, 0 deletions