author | Vyacheslav Baranov <slavik.baranov@gmail.com> | 2015-08-27 18:56:18 +0100 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2015-08-27 18:56:18 +0100 |
commit | fdd466bed7a7151dd066d732ef98d225f4acda4a (patch) | |
tree | 4d8291b830846f72ca89c59d1f04cb1a9ee8bb79 | |
parent | e1f4de4a7d15d4ca4b5c64ff929ac3980f5d706f (diff) | |
download | spark-fdd466bed7a7151dd066d732ef98d225f4acda4a.tar.gz spark-fdd466bed7a7151dd066d732ef98d225f4acda4a.tar.bz2 spark-fdd466bed7a7151dd066d732ef98d225f4acda4a.zip |
[SPARK-10182] [MLLIB] GeneralizedLinearModel doesn't unpersist cached data
`GeneralizedLinearModel` creates a cached RDD when building a model. This is inconvenient: when several models are built in a row, these RDDs fill up the cache, so useful data might get evicted.
The proposed solution is to always cache the dataset & remove the warning. There's a caveat though: the input dataset gets evaluated twice, first at line 270 when fitting `StandardScaler`, and a second time when running the optimizer. So it might be worth restoring the removed warning.
Another possible solution is to disable caching entirely & restore the removed warning. I don't really know which approach is better.
Author: Vyacheslav Baranov <slavik.baranov@gmail.com>
Closes #8395 from SlavikBaranov/SPARK-10182.
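The double-evaluation caveat in the description can be observed in isolation. The following is a minimal sketch, not code from this patch: it assumes Spark 2.x or later (for `longAccumulator`) and a local master, and the object name, the method `mapCalls`, and the sample data are hypothetical. Two `count()` calls stand in for the scaler fit and the optimizer run.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; not part of the patch.
object DoubleEvaluationSketch {
  /** Runs two actions over a mapped RDD and returns how often the map function ran. */
  def mapCalls(sc: SparkContext, cacheFirst: Boolean): Long = {
    // Accumulator incremented once per invocation of the map function.
    val calls = sc.longAccumulator("map calls")
    val mapped = sc.parallelize(Seq(1.0, 2.0, 3.0)).map { x => calls.add(1L); x }

    if (cacheFirst) mapped.cache()
    mapped.count() // first pass, standing in for fitting the scaler
    mapped.count() // second pass, standing in for the optimizer run
    if (cacheFirst) mapped.unpersist(false)

    calls.value
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("double-evaluation-sketch")
    val sc = new SparkContext(conf)
    try {
      println(s"map calls without caching: ${mapCalls(sc, cacheFirst = false)}")
      println(s"map calls with caching:    ${mapCalls(sc, cacheFirst = true)}")
    } finally {
      sc.stop()
    }
  }
}
```

Without caching, each action recomputes the lineage, so the map function should run once per element per action. With caching, the second action can read the stored partitions, provided they fit in memory.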
-rw-r--r-- | mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala
index 7e3b4d5648..8f657bfb9c 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala
@@ -359,6 +359,11 @@ abstract class GeneralizedLinearAlgorithm[M <: GeneralizedLinearModel]
         + " parent RDDs are also uncached.")
     }
 
+    // Unpersist cached data
+    if (data.getStorageLevel != StorageLevel.NONE) {
+      data.unpersist(false)
+    }
+
     createModel(weights, intercept)
   }
 }
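The guard added in the hunk above follows a general pattern: cache a derived RDD for the duration of a computation, then release it only if a storage level was actually set. A minimal standalone sketch of that pattern is shown below. It is not the code path of `GeneralizedLinearAlgorithm`: the object `UnpersistSketch`, the helper `sumOfSquares`, and the sample data are hypothetical, and the `try`/`finally` is an addition here (the patch unpersists only on the normal return path).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper; the name and signature are illustrative only.
object UnpersistSketch {
  /** Sums squared values over a privately cached RDD, then releases the cache. */
  def sumOfSquares(input: RDD[Double]): Double = {
    // Derived RDD cached internally, much as the algorithm caches its transformed data.
    val data = input.map(x => x * x).cache()
    try {
      data.sum()
    } finally {
      // Same guard as the patch: unpersist only if a storage level was set.
      if (data.getStorageLevel != StorageLevel.NONE) {
        data.unpersist(false) // false = do not block until the blocks are removed
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("unpersist-sketch")
    val sc = new SparkContext(conf)
    try {
      val input = sc.parallelize(Seq(1.0, 2.0, 3.0))
      println(sumOfSquares(input))
      // The helper never persisted the caller's RDD, so its storage level is untouched.
      println(input.getStorageLevel == StorageLevel.NONE)
    } finally {
      sc.stop()
    }
  }
}
```

Because only the internally derived RDD is cached and released, the caller's own persistence decisions for `input` are left alone, which is the property the commit message is concerned with when several models are built in a row.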