author | Xiangrui Meng <meng@databricks.com> | 2015-07-22 21:40:23 -0700 |
---|---|---|
committer | Shivaram Venkataraman <shivaram@cs.berkeley.edu> | 2015-07-22 21:40:23 -0700 |
commit | 2f5cbd860e487e7339e627dd7e2c9baa5116b819 (patch) | |
tree | cc6d860bcf45f3541b60f46e19cc64103cb7acb8 /R/pkg/inst/tests/test_sparkSQL.R | |
parent | b217230f2a96c6d5a0554c593bdf1d1374878688 (diff) | |
[SPARK-8364] [SPARKR] Add crosstab to SparkR DataFrames
Add `crosstab` to SparkR DataFrames. It takes two column names and returns a local R data.frame. This is similar to `table` in R. However, `table` in SparkR is used for loading SQL tables as DataFrames. The return type of `crosstab` is data.frame instead of table, to be compatible with Scala/Python.
I couldn't run the R tests successfully on my local machine; many unit tests failed. So let's try Jenkins.
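For illustration only, here is a minimal base-R sketch of the counts that the new test expects, computed locally with `table` instead of through SparkR. It does not call `crosstab` and is not the implementation in this commit; the variable names (`counts`, `wide`, `local_ct`) are made up for this example, and the `a_b` column name is taken from the test below.

```r
# Local sketch only: reproduce the pair counts from the test with base R.
# These are the same (a, b) pairs the test builds for x in 0:3.
x <- 0:3
a <- paste0("a", x %% 3)   # "a0" "a1" "a2" "a0"
b <- paste0("b", x %% 2)   # "b0" "b1" "b0" "b1"

# Contingency table of how often each (a, b) pair occurs.
counts <- table(a, b)

# Spread into one row per distinct a value and one column per distinct b value.
wide <- as.data.frame.matrix(counts)

# Move the row labels into a leading "a_b" column, as in the test's expected frame.
local_ct <- data.frame(a_b = rownames(wide), wide, stringsAsFactors = FALSE)
rownames(local_ct) <- NULL

print(local_ct)
```

Note that `table` yields integer counts, whereas the test's expected data.frame uses numeric (double) columns, so this sketch matches the test in values but not necessarily in column type.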
Author: Xiangrui Meng <meng@databricks.com>
Closes #7318 from mengxr/SPARK-8364 and squashes the following commits:
d75e894 [Xiangrui Meng] fix tests
53f6ddd [Xiangrui Meng] fix tests
f1348d6 [Xiangrui Meng] update test
47cb088 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8364
5621262 [Xiangrui Meng] first version without test
Diffstat (limited to 'R/pkg/inst/tests/test_sparkSQL.R')
-rw-r--r-- | R/pkg/inst/tests/test_sparkSQL.R | 13 |
1 files changed, 13 insertions, 0 deletions
diff --git a/R/pkg/inst/tests/test_sparkSQL.R b/R/pkg/inst/tests/test_sparkSQL.R
index a3039d36c9..62fe48a5d6 100644
--- a/R/pkg/inst/tests/test_sparkSQL.R
+++ b/R/pkg/inst/tests/test_sparkSQL.R
@@ -987,6 +987,19 @@ test_that("fillna() on a DataFrame", {
   expect_identical(expected, actual)
 })
 
+test_that("crosstab() on a DataFrame", {
+  rdd <- lapply(parallelize(sc, 0:3), function(x) {
+    list(paste0("a", x %% 3), paste0("b", x %% 2))
+  })
+  df <- toDF(rdd, list("a", "b"))
+  ct <- crosstab(df, "a", "b")
+  ordered <- ct[order(ct$a_b),]
+  row.names(ordered) <- NULL
+  expected <- data.frame("a_b" = c("a0", "a1", "a2"), "b0" = c(1, 0, 1), "b1" = c(1, 1, 0),
+                         stringsAsFactors = FALSE, row.names = NULL)
+  expect_identical(expected, ordered)
+})
+
 unlink(parquetPath)
 unlink(jsonPath)
 unlink(jsonPathNa)