author: Xiangrui Meng <meng@databricks.com> 2014-07-26 22:56:07 -0700
committer: Reynold Xin <rxin@apache.org> 2014-07-26 22:56:07 -0700
commit: aaf2b735fddbebccd28012006ee4647af3b3624f
tree: eb132ba2fa45cddaf7730628403e836afecb34e3 /python
parent: b547f69bdb5f4a6d5f471a2d998c2df6fb2a9347
[SPARK-2361][MLLIB] Use broadcast instead of serializing data directly into task closure
We saw task serialization problems with large feature dimensions, which can be avoided if we use broadcast variables instead of serializing data directly into the task closure. This PR uses broadcast in both training and prediction and adds tests to make sure the task size is small.

Author: Xiangrui Meng <meng@databricks.com>

Closes #1427 from mengxr/broadcast-new and squashes the following commits:

b9a1228 [Xiangrui Meng] style update
b97c184 [Xiangrui Meng] minimal change to LBFGS
9ebadcc [Xiangrui Meng] add task size test to RowMatrix
9427bf0 [Xiangrui Meng] add task size tests to linear methods
e0a5cf2 [Xiangrui Meng] add task size test to GD
28a8411 [Xiangrui Meng] add test for NaiveBayes
380778c [Xiangrui Meng] update KMeans test
bccab92 [Xiangrui Meng] add task size test to LBFGS
02103ba [Xiangrui Meng] remove print
e73d68e [Xiangrui Meng] update tests for k-means
174cb15 [Xiangrui Meng] use local-cluster for test with a small akka.frameSize
1928a5a [Xiangrui Meng] add test for KMeans task size
e00c2da [Xiangrui Meng] use broadcast in GD, KMeans
010d076 [Xiangrui Meng] modify NaiveBayesModel and GLM to use broadcast
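
The effect described in the commit message can be illustrated with a toy model. The sketch below does not use Spark and is not the code from this PR: `_BROADCAST_STORE`, `broadcast`, `serialized_size`, and `run_dot` are hypothetical names, and `pickle` stands in for task serialization. It only shows why a task that carries a small handle stays small while a task that carries the weight vector grows with the feature dimension.

```python
import pickle

# Hypothetical stand-in for a broadcast store: each value is registered once
# and looked up later by a small integer id. This is NOT Spark's API.
_BROADCAST_STORE = {}


def broadcast(value):
    """Register a value once and return a small integer id (a toy handle)."""
    bid = len(_BROADCAST_STORE)
    _BROADCAST_STORE[bid] = value
    return bid


def serialized_size(payload):
    """Size in bytes of the pickled task payload (toy 'task size')."""
    return len(pickle.dumps(payload))


def run_dot(payload, features):
    """Run the toy task: dot product of the weights with a feature vector."""
    kind, data = payload
    # Resolve the handle on the "executor" side when the task was broadcast.
    weights = _BROADCAST_STORE[data] if kind == "broadcast" else data
    return sum(w * x for w, x in zip(weights, features))


# A large weight vector, standing in for a model with many features.
weights = [float(i) for i in range(100_000)]

# Variant 1: the weights travel inside every serialized task.
inline_payload = ("inline", weights)

# Variant 2: only the handle travels with the task.
broadcast_payload = ("broadcast", broadcast(weights))

inline_size = serialized_size(inline_payload)
broadcast_size = serialized_size(broadcast_payload)
print("inline task bytes:", inline_size)
print("broadcast task bytes:", broadcast_size)
```

Both payloads compute the same result; only the amount of data serialized per task differs. In this toy model the stored value never leaves the process, whereas a real broadcast mechanism still has to ship the value to each worker once.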
Diffstat (limited to 'python')
0 files changed, 0 insertions, 0 deletions