author: sethah <seth.hendrickson16@gmail.com> 2016-11-20 01:42:37 +0000
committer: DB Tsai <dbtsai@dbtsai.com> 2016-11-20 01:42:37 +0000
commit: 856e0042007c789dda4539fb19a5d4580999fbf4
tree: 25c67679bce2bec591dd0f739ba265660a29c5af /assembly
parent: ea77c81ec0db27ea4709f71dc080d00167505a7d
[SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training
## What changes were proposed in this pull request?

This is a follow up to some of the discussion [here](https://github.com/apache/spark/pull/15593). During LogisticRegression training, we store the coefficients combined with intercepts as a flat vector, but a more natural abstraction is a matrix. Here, we refactor the code to use matrix where possible, which makes the code more readable and greatly simplifies the indexing.

Note: We do not use a Breeze matrix for the cost function as was mentioned in the linked PR. This is because LBFGS/OWLQN require an implicit `MutableInnerProductModule[DenseMatrix[Double], Double]` which is not natively defined in Breeze. We would need to extend Breeze in Spark to define it ourselves. Also, we do not modify the `regParamL1Fun` because OWLQN in Breeze requires a `MutableEnumeratedCoordinateField[(Int, Int), DenseVector[Double]]` (since we still use a dense vector for coefficients). Here again we would have to extend Breeze inside Spark.

## How was this patch tested?

This is internal code refactoring - the current unit tests passing show us that the change did not break anything. No added functionality in this patch.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #15893 from sethah/logreg_refactor.
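To illustrate why a matrix view simplifies the indexing described above, here is a minimal sketch in plain Python. It is not the Spark code: the column-major layout, the placement of intercepts at the end of the flat vector, and every function name are assumptions made only for this example.

```python
# Hypothetical layout (an assumption, not taken from the patch):
# the flat vector holds a num_classes x num_features coefficient
# matrix in column-major order, followed by one intercept per class.

def flat_index(class_idx, feature_idx, num_classes):
    """Position of coefficient (class, feature) in the flat vector."""
    return feature_idx * num_classes + class_idx

def margins_flat(flat, x, num_classes):
    """Per-class margins computed with manual index arithmetic."""
    num_features = len(x)
    intercept_offset = num_classes * num_features
    margins = []
    for c in range(num_classes):
        total = flat[intercept_offset + c]
        for f in range(num_features):
            total += flat[flat_index(c, f, num_classes)] * x[f]
        margins.append(total)
    return margins

def to_matrix(flat, num_classes, num_features):
    """Split the flat vector into a row-per-class matrix and intercepts."""
    matrix = [
        [flat[flat_index(c, f, num_classes)] for f in range(num_features)]
        for c in range(num_classes)
    ]
    intercepts = list(flat[num_classes * num_features:])
    return matrix, intercepts

def margins_matrix(matrix, intercepts, x):
    """Same margins, with no index arithmetic at the call site."""
    return [
        b + sum(w * xi for w, xi in zip(row, x))
        for row, b in zip(matrix, intercepts)
    ]
```

Both functions return the same margins; the matrix version just keeps the layout details in one place (`to_matrix`) instead of repeating them wherever a coefficient is read.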
Diffstat (limited to 'assembly')
0 files changed, 0 insertions, 0 deletions