path: root/docs/streaming-custom-receivers.md
author: sethah <seth.hendrickson16@gmail.com> 2016-08-08 00:00:15 -0700
committer: DB Tsai <dbt@netflix.com> 2016-08-08 00:00:15 -0700
commit: 1db1c6567bae0c80fdc522f2cbb65557cd62263f
tree: 493f86413f3e7fe5248b95fb270aee5a7739be32 /docs/streaming-custom-receivers.md
parent: e076fb05ac83a3ed6995e29bb03ea07ea05e39db
[SPARK-16404][ML] LeastSquaresAggregators serializes unnecessary data
## What changes were proposed in this pull request?

Similar to `LogisticAggregator`, the `LeastSquaresAggregator` used for linear regression ends up serializing the coefficients and the feature standard deviations, which is unnecessary and can cause performance issues for high-dimensional data. This patch removes this serialization.

In https://github.com/apache/spark/pull/13729 the approach was to pass these values directly to the `add` method. The approach used here, initially, is to mark these fields as transient instead, which keeps the signature of the `add` method simple and interpretable. The downside is that it requires `transient lazy val`s, which are difficult to reason about for anyone not familiar with serialization in Scala/Spark.

## How was this patch tested?

**MLlib**

![image](https://cloud.githubusercontent.com/assets/7275795/16703660/436f79fa-4524-11e6-9022-ef00058ec718.png)

**ML without patch**

![image](https://cloud.githubusercontent.com/assets/7275795/16703831/c4d50b9e-4525-11e6-80cb-9b58c850cd41.png)

**ML with patch**

![image](https://cloud.githubusercontent.com/assets/7275795/16703675/63e0cf40-4524-11e6-9120-1f512a70e083.png)

Author: sethah <seth.hendrickson16@gmail.com>

Closes #14109 from sethah/LIR_serialize.
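
For readers unfamiliar with the `transient lazy val` pattern mentioned in the description, here is a minimal sketch of how it behaves under plain Java serialization. It is not the code from this patch: `EagerAggregator`, `LazyAggregator`, `scaled`, and the `serialize`/`deserialize` helpers are hypothetical names invented for illustration, and the derived array is recomputed from a small `dim` field rather than from the coefficients and standard deviations the real aggregator uses.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object TransientLazyValSketch {

  // Hypothetical eager variant: `scaled` is an ordinary field, so the whole
  // array is written out every time the aggregator is serialized.
  class EagerAggregator(val dim: Int) extends Serializable {
    val scaled: Array[Double] = Array.tabulate(dim)(i => i * 0.5)
    def add(x: Array[Double]): Double =
      x.zip(scaled).map { case (a, b) => a * b }.sum
  }

  // Hypothetical lazy variant: `scaled` is skipped by serialization and
  // recomputed on first access, including after deserialization.
  class LazyAggregator(val dim: Int) extends Serializable {
    @transient private lazy val scaled: Array[Double] =
      Array.tabulate(dim)(i => i * 0.5)
    def add(x: Array[Double]): Double =
      x.zip(scaled).map { case (a, b) => a * b }.sum
  }

  // Serialize an object to bytes with standard Java serialization.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.toByteArray
  }

  // Read an object back from bytes produced by `serialize`.
  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val dim = 10000
    val x = Array.fill(dim)(1.0)
    val eager = new EagerAggregator(dim)
    val lzy = new LazyAggregator(dim)
    lzy.add(x) // force the lazy val before serializing

    println(s"eager bytes: ${serialize(eager).length}")
    println(s"lazy bytes:  ${serialize(lzy).length}")

    // The restored copy rebuilds `scaled` on first use.
    val restored = deserialize[LazyAggregator](serialize(lzy))
    println(s"same result after round trip: ${restored.add(x) == lzy.add(x)}")
  }
}
```

One caveat when adapting this: any constructor parameter referenced from the lazy val body becomes a regular field and is still serialized, so the pattern only saves space when the value can be rebuilt from something small or from a handle such as a broadcast variable.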
Diffstat (limited to 'docs/streaming-custom-receivers.md')
0 files changed, 0 insertions, 0 deletions