diff options
author | hyukjinkwon <gurwls223@gmail.com> | 2016-11-06 14:11:37 +0000 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-11-06 14:11:37 +0000 |
commit | 340f09d100cb669bc6795f085aac6fa05630a076 (patch) | |
tree | 9c55ecdcea33432ed7ca9a76ca0be33d6be26268 /sql/core/src/test/resources/sql-tests/results/random.sql.out | |
parent | 23ce0d1e91076d90c1a87d698a94d283d08cf899 (diff) | |
download | spark-340f09d100cb669bc6795f085aac6fa05630a076.tar.gz spark-340f09d100cb669bc6795f085aac6fa05630a076.tar.bz2 spark-340f09d100cb669bc6795f085aac6fa05630a076.zip |
[SPARK-17854][SQL] rand/randn allows null/long as input seed
## What changes were proposed in this pull request?
This PR proposes `rand`/`randn` accept `null` as input in Scala/SQL and `LongType` as input in SQL. In this case, it treats the values as `0`.
So, this PR includes both changes below:
- `null` support
It seems MySQL also accepts this.
``` sql
mysql> select rand(0);
+---------------------+
| rand(0) |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)
mysql> select rand(NULL);
+---------------------+
| rand(NULL) |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)
```
and also Hive does according to [HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694)
So the codes below:
``` scala
spark.range(1).selectExpr("rand(null)").show()
```
prints..
**Before**
```
Input argument to rand must be an integer literal.;; line 1 pos 0
org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:444)
```
**After**
```
+-----------------------+
|rand(CAST(NULL AS INT))|
+-----------------------+
| 0.13385709732307427|
+-----------------------+
```
- `LongType` support in SQL.
In addition, it make the function allows to take `LongType` consistently within Scala/SQL.
In more details, the codes below:
``` scala
spark.range(1).select(rand(1), rand(1L)).show()
spark.range(1).selectExpr("rand(1)", "rand(1L)").show()
```
prints..
**Before**
```
+------------------+------------------+
| rand(1)| rand(1)|
+------------------+------------------+
|0.2630967864682161|0.2630967864682161|
+------------------+------------------+
Input argument to rand must be an integer literal.;; line 1 pos 0
org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
at
```
**After**
```
+------------------+------------------+
| rand(1)| rand(1)|
+------------------+------------------+
|0.2630967864682161|0.2630967864682161|
+------------------+------------------+
+------------------+------------------+
| rand(1)| rand(1)|
+------------------+------------------+
|0.2630967864682161|0.2630967864682161|
+------------------+------------------+
```
## How was this patch tested?
Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #15432 from HyukjinKwon/SPARK-17854.
Diffstat (limited to 'sql/core/src/test/resources/sql-tests/results/random.sql.out')
-rw-r--r-- | sql/core/src/test/resources/sql-tests/results/random.sql.out | 84 |
1 files changed, 84 insertions, 0 deletions
diff --git a/sql/core/src/test/resources/sql-tests/results/random.sql.out b/sql/core/src/test/resources/sql-tests/results/random.sql.out new file mode 100644 index 0000000000..bca67320fe --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/results/random.sql.out @@ -0,0 +1,84 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 10 + + +-- !query 0 +SELECT rand(0) +-- !query 0 schema +struct<rand(0):double> +-- !query 0 output +0.8446490682263027 + + +-- !query 1 +SELECT rand(cast(3 / 7 AS int)) +-- !query 1 schema +struct<rand(CAST((CAST(3 AS DOUBLE) / CAST(7 AS DOUBLE)) AS INT)):double> +-- !query 1 output +0.8446490682263027 + + +-- !query 2 +SELECT rand(NULL) +-- !query 2 schema +struct<rand(CAST(NULL AS INT)):double> +-- !query 2 output +0.8446490682263027 + + +-- !query 3 +SELECT rand(cast(NULL AS int)) +-- !query 3 schema +struct<rand(CAST(NULL AS INT)):double> +-- !query 3 output +0.8446490682263027 + + +-- !query 4 +SELECT rand(1.0) +-- !query 4 schema +struct<> +-- !query 4 output +org.apache.spark.sql.AnalysisException +cannot resolve 'rand(1.0BD)' due to data type mismatch: argument 1 requires (int or bigint) type, however, '1.0BD' is of decimal(2,1) type.; line 1 pos 7 + + +-- !query 5 +SELECT randn(0L) +-- !query 5 schema +struct<randn(0):double> +-- !query 5 output +1.1164209726833079 + + +-- !query 6 +SELECT randn(cast(3 / 7 AS long)) +-- !query 6 schema +struct<randn(CAST((CAST(3 AS DOUBLE) / CAST(7 AS DOUBLE)) AS BIGINT)):double> +-- !query 6 output +1.1164209726833079 + + +-- !query 7 +SELECT randn(NULL) +-- !query 7 schema +struct<randn(CAST(NULL AS INT)):double> +-- !query 7 output +1.1164209726833079 + + +-- !query 8 +SELECT randn(cast(NULL AS long)) +-- !query 8 schema +struct<randn(CAST(NULL AS BIGINT)):double> +-- !query 8 output +1.1164209726833079 + + +-- !query 9 +SELECT rand('1') +-- !query 9 schema +struct<> +-- !query 9 output +org.apache.spark.sql.AnalysisException +cannot resolve 'rand('1')' due to data type mismatch: argument 1 requires (int or bigint) type, however, ''1'' is of string type.; line 1 pos 7 |