author    Timothy Hunter <timhunter@databricks.com>    2016-04-28 22:42:48 -0700
committer Xiangrui Meng <meng@databricks.com>          2016-04-28 22:42:48 -0700
commit    769a909d1357766a441ff69e6e98c22c51b12c93 (patch)
tree      d176f05a13eec69224cf9e084706dd4fac9da1e8 /R/pkg/inst/tests/testthat/test_context.R
parent    4607f6e7f7b174c62700f1fe542f77af3203b096 (diff)
[SPARK-7264][ML] Parallel lapply for sparkR
## What changes were proposed in this pull request?
This PR adds a new function in SparkR called `sparkLapply(list, function)`, which implements a distributed version of `lapply`, using Spark as the backend.
TODO:
- [x] check documentation
- [ ] check tests
Trivial example in SparkR:
```R
sparkLapply(1:5, function(x) { 2 * x })
```
Output:
```
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 8
[[5]]
[1] 10
```
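For reference, the call mirrors base R's `lapply`; running the same transform locally produces an identical list:

```r
# Local equivalent of the distributed call above, using base R's lapply.
# Each element of 1:5 is doubled and the results are returned as a list.
lapply(1:5, function(x) { 2 * x })
```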
Here is a slightly more complex example to perform distributed training of multiple models. Under the hood, Spark broadcasts the dataset.
```R
library("MASS")
data(menarche)
families <- c("gaussian", "poisson")
train <- function(family) { glm(Menarche ~ Age, family = family, data = menarche) }
results <- sparkLapply(families, train)
```
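Since the result is an ordinary R list of fitted models, it can be inspected with base-R tools after the distributed call returns. A hypothetical follow-up (assuming the `results` list from the example above) might compare the fitted coefficients:

```r
# Hypothetical follow-up: 'results' is assumed to be the list of glm fits
# returned by sparkLapply above. coef() runs locally on each element.
coefficients <- lapply(results, coef)
names(coefficients) <- families
```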
## How was this patch tested?
This PR was tested in SparkR. I am unfamiliar with R and SparkR, so any feedback on style, testing, etc. will be much appreciated.
cc falaki davies
Author: Timothy Hunter <timhunter@databricks.com>
Closes #12426 from thunterdb/7264.
Diffstat (limited to 'R/pkg/inst/tests/testthat/test_context.R')
-rw-r--r-- | R/pkg/inst/tests/testthat/test_context.R | 6
1 file changed, 6 insertions, 0 deletions
```diff
diff --git a/R/pkg/inst/tests/testthat/test_context.R b/R/pkg/inst/tests/testthat/test_context.R
index ffa067eb5e..ca04342cd5 100644
--- a/R/pkg/inst/tests/testthat/test_context.R
+++ b/R/pkg/inst/tests/testthat/test_context.R
@@ -141,3 +141,9 @@ test_that("sparkJars sparkPackages as comma-separated strings", {
   expect_that(processSparkJars(f), not(gives_warning()))
   expect_match(processSparkJars(f), f)
 })
+
+test_that("spark.lapply should perform simple transforms", {
+  sc <- sparkR.init()
+  doubled <- spark.lapply(sc, 1:10, function(x) { 2 * x })
+  expect_equal(doubled, as.list(2 * 1:10))
+})
```