author Vyacheslav Baranov <slavik.baranov@gmail.com> 2015-08-27 18:56:18 +0100
committer Sean Owen <sowen@cloudera.com> 2015-08-27 18:56:18 +0100
commit fdd466bed7a7151dd066d732ef98d225f4acda4a
tree 4d8291b830846f72ca89c59d1f04cb1a9ee8bb79 /mllib
parent e1f4de4a7d15d4ca4b5c64ff929ac3980f5d706f
[SPARK-10182] [MLLIB] GeneralizedLinearModel doesn't unpersist cached data
`GeneralizedLinearModel` creates a cached RDD when building a model. This is inconvenient: when several models are built in a row, these RDDs flood the memory, so useful data might get evicted from the cache.

The proposed solution is to always cache the dataset and remove the warning. There is a caveat, though: the input dataset gets evaluated twice, first in line 270 when fitting `StandardScaler`, and a second time when running the optimizer. So it might be worth restoring the removed warning.

Another possible solution is to disable caching entirely and restore the removed warning. I don't really know which approach is better.

Author: Vyacheslav Baranov <slavik.baranov@gmail.com>

Closes #8395 from SlavikBaranov/SPARK-10182.
Diffstat (limited to 'mllib')
-rw-r--r-- mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala | 5
1 files changed, 5 insertions, 0 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala
index 7e3b4d5648..8f657bfb9c 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala
@@ -359,6 +359,11 @@ abstract class GeneralizedLinearAlgorithm[M <: GeneralizedLinearModel]
         + " parent RDDs are also uncached.")
     }
 
+    // Unpersist cached data
+    if (data.getStorageLevel != StorageLevel.NONE) {
+      data.unpersist(false)
+    }
+
     createModel(weights, intercept)
   }
 }
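
The guard added in the hunk above can be tried outside MLlib. What follows is only a minimal sketch, not the code in `GeneralizedLinearAlgorithm`: the object `UnpersistSketch` and the helper `sumOfSquares` are hypothetical names made up for illustration, and it assumes `spark-core` is on the classpath with a local master available. It caches a derived RDD, runs one action on it, and then releases the cache with the same `getStorageLevel` check and non-blocking `unpersist(false)` call used in the patch.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical illustration object; not part of Spark.
object UnpersistSketch {

  // Caches an intermediate RDD, uses it once, then releases it.
  // Returns the computed sum and the RDD's storage level after cleanup.
  def sumOfSquares(input: RDD[Double]): (Double, StorageLevel) = {
    // Derived dataset cached by this method, so this method owns its cleanup.
    val data = input.map(x => x * x).cache()

    // An action that materializes (and therefore caches) `data`.
    val result = data.reduce(_ + _)

    // Same guard as in the patch: only unpersist if a storage level was set.
    // `false` means the call does not block until the blocks are deleted.
    if (data.getStorageLevel != StorageLevel.NONE) {
      data.unpersist(false)
    }

    (result, data.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a single-threaded local master is enough for the sketch.
    val conf = new SparkConf().setMaster("local[1]").setAppName("unpersist-sketch")
    val sc = new SparkContext(conf)
    try {
      val (total, levelAfter) = sumOfSquares(sc.parallelize(Seq(1.0, 2.0, 3.0)))
      println(s"total=$total, storage level after cleanup=$levelAfter")
    } finally {
      sc.stop()
    }
  }
}
```

Because the helper only unpersists the RDD it cached itself, a caller that cached `input` beforehand keeps that cache.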