[SPARK-16429][SQL] Include `StringType` columns in `describe()`

## What changes were proposed in this pull request?

Currently, Spark's `describe` supports `StringType` columns when they are named explicitly. However, `describe()` called without arguments returns a dataset covering only the numeric columns. This PR includes `StringType` columns in the no-argument form of `describe()` as well.

**Background**
```scala
scala> spark.read.json("examples/src/main/resources/people.json").describe("age", "name").show()
+-------+------------------+-------+
|summary|               age|   name|
+-------+------------------+-------+
|  count|                 2|      3|
|   mean|              24.5|   null|
| stddev|7.7781745930520225|   null|
|    min|                19|   Andy|
|    max|                30|Michael|
+-------+------------------+-------+
```

**Before**
```scala
scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
+-------+------------------+
|summary|               age|
+-------+------------------+
|  count|                 2|
|   mean|              24.5|
| stddev|7.7781745930520225|
|    min|                19|
|    max|                30|
+-------+------------------+
```

**After**
```scala
scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
+-------+------------------+-------+
|summary|               age|   name|
+-------+------------------+-------+
|  count|                 2|      3|
|   mean|              24.5|   null|
| stddev|7.7781745930520225|   null|
|    min|                19|   Andy|
|    max|                30|Michael|
+-------+------------------+-------+
```

## How was this patch tested?

Passes the Jenkins tests with an updated test case.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #14095 from dongjoon-hyun/SPARK-16429.
author: Dongjoon Hyun <dongjoon@apache.org> 2016-07-08 14:36:50 -0700
committer: Reynold Xin <rxin@databricks.com> 2016-07-08 14:36:50 -0700
commit: 142df4834bc33dc7b84b626c6ee3508ab1abe015 (patch)
tree: 04eab461749ee26103eec7869e4f91eefd4d1b44 /R/pkg/inst/tests/testthat/test_sparkSQL.R
parent: 67e085ef6dd62774095f3187844c091db1a6a72c (diff)
download: spark-142df4834bc33dc7b84b626c6ee3508ab1abe015.tar.gz
spark-142df4834bc33dc7b84b626c6ee3508ab1abe015.tar.bz2
spark-142df4834bc33dc7b84b626c6ee3508ab1abe015.zip
1 files changed, 2 insertions, 2 deletions
diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R
index e2a1da0f1e..fdd6020db9 100644
--- a/R/pkg/inst/tests/testthat/test_sparkSQL.R
+++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -1824,11 +1824,11 @@ test_that("describe() and summarize() on a DataFrame", {
   expect_equal(collect(stats)[2, "age"], "24.5")
   expect_equal(collect(stats)[3, "age"], "7.7781745930520225")
   stats <- describe(df)
-  expect_equal(collect(stats)[4, "name"], NULL)
+  expect_equal(collect(stats)[4, "name"], "Andy")
   expect_equal(collect(stats)[5, "age"], "30")
 
   stats2 <- summary(df)
-  expect_equal(collect(stats2)[4, "name"], NULL)
+  expect_equal(collect(stats2)[4, "name"], "Andy")
   expect_equal(collect(stats2)[5, "age"], "30")
 
   # SPARK-16425: SparkR summary() fails on column of type logical