path: root/streaming
author: Alexander Ulanov <nashb@yandex.ru> 2015-07-31 11:22:40 -0700
committer: Xiangrui Meng <meng@databricks.com> 2015-07-31 11:23:30 -0700
commit: 6add4eddb39e7748a87da3e921ea3c7881d30a82
tree: 92fecca0d3008e5e537a78dcf87349b89a32a8dc /streaming
parent: 0024da9157ba12ec84883a78441fa6835c1d0042
[SPARK-9471] [ML] Multilayer Perceptron

This pull request contains the following feature for ML:
- Multilayer Perceptron classifier

This implementation is based on our initial pull request with bgreeven: https://github.com/apache/spark/pull/1290 and inspired by very insightful suggestions from mengxr and witgo (I would like to thank all other people from the mentioned thread for useful discussions). The original code was extensively tested and benchmarked. Since then, I've addressed two main requirements that prevented the code from being merged into the main branch:
- Extensible interface, so it will be easy to implement new types of networks
  - Main building blocks are traits `Layer` and `LayerModel`. They are used for constructing layers of ANN. New layers can be added by extending the `Layer` and `LayerModel` traits. These traits are private in this release to leave room to improve them based on community feedback
  - Back propagation is implemented in general form, so there is no need to change it (optimization algorithm) when new layers are implemented
- Speed and scalability: this implementation has to be comparable in terms of speed to the state-of-the-art single-node implementations.
  - The developed benchmark for large ANN shows that the proposed code is on par with a C++ CPU implementation and scales nicely with the number of workers. Details can be found here: https://github.com/avulanov/ann-benchmark

- DBN and RBM by witgo https://github.com/witgo/spark/tree/ann-interface-gemm-dbn
- Dropout https://github.com/avulanov/spark/tree/ann-interface-gemm

mengxr and dbtsai kindly agreed to perform code review.

Author: Alexander Ulanov <nashb@yandex.ru>
Author: Bert Greevenbosch <opensrc@bertgreevenbosch.nl>

Closes #7621 from avulanov/SPARK-2352-ann and squashes the following commits:

4806b6f [Alexander Ulanov] Addressing reviewers comments.
a7e7951 [Alexander Ulanov] Default blockSize: 100. Added documentation to blockSize parameter and DataStacker class
f69bb3d [Alexander Ulanov] Addressing reviewers comments.
374bea6 [Alexander Ulanov] Moving ANN to ML package. GradientDescent constructor is now spark private.
43b0ae2 [Alexander Ulanov] Addressing reviewers comments. Adding multiclass test.
9d18469 [Alexander Ulanov] Addressing reviewers comments: unnecessary copy of data in predict
35125ab [Alexander Ulanov] Style fix in tests
e191301 [Alexander Ulanov] Apache header
a226133 [Alexander Ulanov] Multilayer Perceptron regressor and classifier
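
The extensibility claim in the commit message (layers are pluggable `Layer`/`LayerModel` pairs, and back propagation is written once in general form) can be illustrated with a small sketch. This is not the Spark code: the real traits are Scala, private, and their signatures are not shown on this page. The sketch below uses Python for brevity, and every name other than `Layer` and `LayerModel` (`init_model`, `forward`, `backward`, `update`, `Affine`, `Sigmoid`, `train_step`) is a hypothetical choice made for illustration, as is the use of plain SGD with a squared-error loss.

```python
import math
import random
from abc import ABC, abstractmethod


class LayerModel(ABC):
    """Hypothetical stand-in for the `LayerModel` trait: one layer's parameters and math."""

    @abstractmethod
    def forward(self, x):
        """Map an input vector to an output vector."""

    @abstractmethod
    def backward(self, x, y, delta):
        """Given input x, output y and dLoss/dy, return (dLoss/dx, parameter gradient)."""

    def update(self, grad, lr):
        """Apply a gradient step; parameter-free layers keep this no-op."""


class Layer(ABC):
    """Hypothetical stand-in for the `Layer` trait: a description that builds a model."""

    @abstractmethod
    def init_model(self, rng):
        """Create a LayerModel with freshly initialised parameters."""


class AffineModel(LayerModel):
    def __init__(self, w, b):
        self.w, self.b = w, b  # w[j][i]: weight from input i to output j

    def forward(self, x):
        return [sum(wji * xi for wji, xi in zip(wj, x)) + bj
                for wj, bj in zip(self.w, self.b)]

    def backward(self, x, y, delta):
        prev = [sum(self.w[j][i] * delta[j] for j in range(len(delta)))
                for i in range(len(x))]
        grad_w = [[d * xi for xi in x] for d in delta]
        return prev, (grad_w, list(delta))

    def update(self, grad, lr):
        grad_w, grad_b = grad
        for j, row in enumerate(grad_w):
            for i, g in enumerate(row):
                self.w[j][i] -= lr * g
            self.b[j] -= lr * grad_b[j]


class Affine(Layer):
    def __init__(self, n_in, n_out):
        self.n_in, self.n_out = n_in, n_out

    def init_model(self, rng):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(self.n_in)]
             for _ in range(self.n_out)]
        return AffineModel(w, [0.0] * self.n_out)


class SigmoidModel(LayerModel):
    def forward(self, x):
        return [1.0 / (1.0 + math.exp(-v)) for v in x]

    def backward(self, x, y, delta):
        # The sigmoid derivative is y * (1 - y); there are no parameters to update.
        return [d * o * (1.0 - o) for d, o in zip(delta, y)], None


class Sigmoid(Layer):
    def init_model(self, rng):
        return SigmoidModel()


def train_step(models, x, target, lr):
    """One SGD step with squared error; never inspects concrete layer types."""
    acts = [x]
    for m in models:
        acts.append(m.forward(acts[-1]))
    out = acts[-1]
    loss = 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
    delta = [o - t for o, t in zip(out, target)]
    for k in range(len(models) - 1, -1, -1):
        # The delta for the previous layer is computed before this layer's weights change.
        delta, grad = models[k].backward(acts[k], acts[k + 1], delta)
        models[k].update(grad, lr)
    return loss


rng = random.Random(0)
topology = [Affine(2, 3), Sigmoid(), Affine(3, 1), Sigmoid()]
models = [layer.init_model(rng) for layer in topology]
# Toy data: logical OR of two inputs.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
first = sum(train_step(models, x, t, 0.5) for x, t in data)
for _ in range(500):
    last = sum(train_step(models, x, t, 0.5) for x, t in data)
```

Under these assumptions, adding a new layer type means writing one more `Layer`/`LayerModel` pair; `train_step` stays unchanged because it only calls `forward`, `backward`, and `update`.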
Diffstat (limited to 'streaming')
0 files changed, 0 insertions, 0 deletions