field | value | date |
---|---|---|
author | Yanbo Liang <ybliang8@gmail.com> | 2016-01-28 14:29:47 -0800 |
committer | Xiangrui Meng <meng@databricks.com> | 2016-01-28 14:29:47 -0800 |
commit | df78a934a07a4ce5af43243be9ba5fe60b91eee6 | |
tree | ded777a2fddc1d9798d92424e0c23a4a39f2074f /docker-integration-tests | |
parent | cc18a7199240bf3b03410c1ba6704fe7ce6ae38e | |
[SPARK-9835][ML] Implement IterativelyReweightedLeastSquares solver
Implement the ```IterativelyReweightedLeastSquares``` (IRLS) solver for GLMs. I consider it a solver rather than an estimator; it is only used internally, so I keep it ```private[ml]```.
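For readers new to IRLS, the idea can be sketched in a few lines of plain Python. This is not the Spark code (which is Scala and ```private[ml]```); it assumes a binomial family with the canonical logit link, and ```solve``` and ```irls_logistic``` are hypothetical helpers. Each iteration computes working weights and a working response from the current linear predictor, then solves one weighted least-squares problem.

```python
import math


def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def irls_logistic(X, y, max_iter=25, tol=1e-8):
    """Fit a binomial GLM with logit link by IRLS (hypothetical helper).

    X: list of feature rows (prepend 1.0 for an intercept); y: responses in [0, 1].
    Returns the coefficient vector.
    """
    p = len(X[0])
    beta = [0.0] * p  # start at eta = 0, i.e. mu = 0.5 for every row
    for _ in range(max_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for row, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)      # working weight (dmu/deta)^2 / Var(mu) for the logit link
            z = eta + (yi - mu) / w  # working response eta + (y - mu) * deta/dmu
            for j in range(p):
                xtwz[j] += w * row[j] * z
                for k in range(p):
                    xtwx[j][k] += w * row[j] * row[k]
        new_beta = solve(xtwx, xtwz)  # weighted least-squares step
        step = max(abs(nb - b) for nb, b in zip(new_beta, beta))
        beta = new_beta
        if step < tol:
            break
    return beta
```

With the canonical link each IRLS step is exactly a Newton step on the log-likelihood. A production solver would also guard against perfect separation, where the weights collapse toward zero, and would usually test convergence on the deviance; R's ```glm.fit```, for instance, stops on the relative change in deviance.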
There are two limitations in the current implementation compared with R:
* It cannot accept a ```Tuple``` as the response for the ```Binomial``` family, as in the following R code:
```r
glm(cbind(using, notUsing) ~ age + education + wantsMore, family = binomial)
```
* It does not support ```offset```.
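Both limitations concern how the data reach the solver rather than IRLS itself. A hedged sketch in plain Python, with hypothetical helper names, of the usual treatment: a grouped response (successes, failures) becomes a proportion whose prior weight is the number of trials, and an offset is a known term added to the linear predictor with its coefficient fixed at 1.

```python
import math


def grouped_to_weighted(successes, failures):
    """Map an R-style cbind(successes, failures) response to (proportion, prior weight)."""
    n = successes + failures
    if n <= 0:
        raise ValueError("a group needs at least one trial")
    return successes / n, n


def linear_predictor(row, beta, offset=0.0):
    """eta = x . beta + offset; the offset enters with a fixed coefficient of 1."""
    return sum(b * v for b, v in zip(beta, row)) + offset


def weighted_binomial_loglik(y, weight, eta):
    """Log-likelihood contribution under the logit link, without the binomial coefficient."""
    mu = 1.0 / (1.0 + math.exp(-eta))
    return weight * (y * math.log(mu) + (1.0 - y) * math.log(1.0 - mu))
```

For example, a group with 3 successes and 1 failure becomes ```y = 0.75``` with weight 4, and its weighted log-likelihood equals the sum over the four expanded 0/1 rows. Given these transformations, an IRLS loop that already accepts per-row weights needs little else: the working weights are multiplied by the prior weights, and the offset is subtracted from eta when forming the working response.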
Because ```RFormula``` supports neither a ```Tuple``` label nor the ```offset``` keyword, I simplified the implementation accordingly. Adding support for these two features is not hard, and I can do it in a follow-up PR if necessary. We can also add an R-like statistical summary for IRLS.
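An R-like summary mostly needs standard errors, which come almost for free from IRLS. A hedged sketch for the binomial/logit case only, with hypothetical helper names: at convergence the coefficient covariance is the inverse of X^T W X built from the final working weights, and since the binomial dispersion is fixed at 1, R's ```summary.glm``` reports z statistics with two-sided normal p-values.

```python
import math


def invert(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]  # scale pivot row so the pivot becomes 1
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]  # right half now holds the inverse


def logistic_summary(X, beta):
    """(estimate, std. error, z value, Pr(>|z|)) per coefficient of a fitted logit GLM."""
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]  # Fisher information X^T W X
    for row in X:
        eta = sum(b * v for b, v in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)  # final working weight under the logit link
        for j in range(p):
            for k in range(p):
                info[j][k] += w * row[j] * row[k]
    cov = invert(info)
    table = []
    for j in range(p):
        se = math.sqrt(cov[j][j])
        z = beta[j] / se
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
        table.append((beta[j], se, z, p_value))
    return table
```

For families with an estimated dispersion (e.g. Gaussian or Gamma), the covariance would instead be scaled by the dispersion estimate and t statistics reported.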
The implementation draws on R, [statsmodels](https://github.com/statsmodels/statsmodels) and [sparkGLM](https://github.com/AlteryxLabs/sparkGLM).
Please focus on the main structure and overlook minor issues and docs, which I will update later. Any comments and opinions are appreciated.
cc mengxr jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10639 from yanboliang/spark-9835.