author | sethah <seth.hendrickson16@gmail.com> | 2016-11-20 01:42:37 +0000 |
---|---|---|
committer | DB Tsai <dbtsai@dbtsai.com> | 2016-11-20 01:42:37 +0000 |
commit | 856e0042007c789dda4539fb19a5d4580999fbf4 (patch) | |
tree | 25c67679bce2bec591dd0f739ba265660a29c5af /assembly | |
parent | ea77c81ec0db27ea4709f71dc080d00167505a7d (diff) | |
[SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training
## What changes were proposed in this pull request?
This is a follow-up to some of the discussion [here](https://github.com/apache/spark/pull/15593). During LogisticRegression training, we store the coefficients, combined with the intercepts, as a flat vector, but a matrix is the more natural abstraction. Here, we refactor the code to use a matrix where possible, which makes the code more readable and greatly simplifies the indexing.
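To illustrate the indexing point, here is a minimal sketch that computes per-class margins twice: once with hand-written offsets into a flat array, and once through a small matrix view. Everything in it is an assumption made for illustration (the `CoefficientLayout` object, the `MatrixView` class, the column-major layout with intercepts in the last column); it is not the code changed by this patch and does not use Spark's or Breeze's matrix types.

```scala
// Hypothetical sketch: names and layout are illustrative, not Spark internals.
object CoefficientLayout {

  /** Minimal column-major matrix view over a flat array (assumed layout). */
  final case class MatrixView(numRows: Int, numCols: Int, values: Array[Double]) {
    require(values.length == numRows * numCols, "values must have numRows * numCols entries")
    def apply(i: Int, j: Int): Double = values(j * numRows + i)
  }

  /** Margins computed with manual offsets into the flat array. */
  def marginsFlat(flat: Array[Double], numClasses: Int, features: Array[Double]): Array[Double] = {
    val numFeatures = features.length
    Array.tabulate(numClasses) { k =>
      // The intercept for class k lives in the last column of the assumed layout.
      var sum = flat(numFeatures * numClasses + k)
      var j = 0
      while (j < numFeatures) {
        sum += flat(j * numClasses + k) * features(j)
        j += 1
      }
      sum
    }
  }

  /** The same margins, with the offset arithmetic hidden behind (row, col) access. */
  def marginsMatrix(coef: MatrixView, features: Array[Double]): Array[Double] = {
    val numFeatures = features.length
    Array.tabulate(coef.numRows) { k =>
      var sum = coef(k, numFeatures) // intercept column
      var j = 0
      while (j < numFeatures) {
        sum += coef(k, j) * features(j)
        j += 1
      }
      sum
    }
  }
}
```

Both functions read the same underlying array; the matrix version only moves the `j * numRows + i` arithmetic into one place, which is the kind of simplification described above.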
Note: We do not use a Breeze matrix for the cost function, as was mentioned in the linked PR. This is because LBFGS/OWLQN require an implicit `MutableInnerProductModule[DenseMatrix[Double], Double]`, which is not natively defined in Breeze; we would need to extend Breeze in Spark to define it ourselves. We also do not modify `regParamL1Fun`, because OWLQN in Breeze requires a `MutableEnumeratedCoordinateField[(Int, Int), DenseVector[Double]]` (since we still use a dense vector for coefficients). Here again, we would have to extend Breeze inside Spark.
## How was this patch tested?
This is an internal code refactoring that adds no functionality. The existing unit tests still pass, which shows that the change did not break anything.
Author: sethah <seth.hendrickson16@gmail.com>
Closes #15893 from sethah/logreg_refactor.
Diffstat (limited to 'assembly')
0 files changed, 0 insertions, 0 deletions