path: root/make-distribution.sh
author: Peter Rudenko <petro.rudenko@gmail.com> 2015-02-16 00:07:23 -0800
committer: Xiangrui Meng <meng@databricks.com> 2015-02-16 00:07:23 -0800
commit: d51d6ba1547ae75ac76c9e6d8ea99e937eb7d09f
tree: 5c33800c77fad824ad451a988cf4f8ee706dbb43 /make-distribution.sh
parent: c78a12c4cc4d4312c4ee1069d3b218882d32d678
[Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop
On a big dataset, explicitly unpersisting the training and validation folds allows more data to be loaded into memory in the next loop iteration. In my environment (single node, 8 GB worker RAM, 2 GB dataset file, 3 folds for cross-validation), this saved more than 5 minutes.

Author: Peter Rudenko <petro.rudenko@gmail.com>

Closes #4595 from petro-rudenko/patch-2 and squashes the following commits:

66a7cfb [Peter Rudenko] Move validationDataset cache to declaration
c5f3265 [Peter Rudenko] [Ml] SPARK-5804 Explicitly manage cache in Crossvalidator k-fold loop
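
The idea in the commit message can be illustrated with a minimal sketch. This is not the Spark code touched by the commit: `FoldCache`, `k_fold_splits`, and `cross_validate` are hypothetical names invented here, and the cache is a plain in-memory stand-in rather than Spark's storage layer. The sketch only shows why releasing each fold at the end of its iteration keeps the number of resident datasets constant instead of letting it grow with every fold.

```python
class FoldCache:
    """Hypothetical stand-in for a dataset cache (not the Spark API).

    Tracks which named datasets are resident and the peak number held at once.
    """

    def __init__(self):
        self.resident = set()
        self.peak = 0

    def cache(self, name):
        self.resident.add(name)
        self.peak = max(self.peak, len(self.resident))

    def unpersist(self, name):
        self.resident.discard(name)


def k_fold_splits(rows, k):
    """Yield (training, validation) pairs; fold i serves as the validation set."""
    folds = [rows[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, validation


def cross_validate(rows, k, cache, explicit_unpersist):
    """Run the k-fold loop and return the peak number of cached datasets."""
    for i, (training, validation) in enumerate(k_fold_splits(rows, k)):
        cache.cache(f"train-{i}")
        cache.cache(f"validation-{i}")
        # Model fitting on `training` and evaluation on `validation` would go here.
        if explicit_unpersist:
            # Release both folds before the next iteration caches its own.
            cache.unpersist(f"train-{i}")
            cache.unpersist(f"validation-{i}")
    return cache.peak
```

With 3 folds, `cross_validate(list(range(12)), 3, FoldCache(), True)` reports a peak of 2 cached datasets, while passing `False` reports 6. In a real cluster the cache manager may evict stale data on its own under memory pressure, so the actual benefit depends on the environment, as the timing note in the commit message suggests.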
Diffstat (limited to 'make-distribution.sh')
0 files changed, 0 insertions, 0 deletions