diff options
author | Yadong Qi <qiyadong2010@gmail.com> | 2016-05-28 10:19:29 -0700 |
---|---|---|
committer | Wenchen Fan <wenchen@databricks.com> | 2016-05-28 10:19:29 -0700 |
commit | b4c32c4952f7af2733258aa4e27f21e8832c8a3a (patch) | |
tree | f99524ff00497fa744b8adfba7bbb79c74e04e30 /R/pkg/inst/tests | |
parent | f1b220eeeed1d4d12121fe0b3b175da44488da68 (diff) | |
download | spark-b4c32c4952f7af2733258aa4e27f21e8832c8a3a.tar.gz spark-b4c32c4952f7af2733258aa4e27f21e8832c8a3a.tar.bz2 spark-b4c32c4952f7af2733258aa4e27f21e8832c8a3a.zip |
[SPARK-15549][SQL] Disable bucketing when the output doesn't contain all bucketing columns
## What changes were proposed in this pull request?
I create a bucketed table bucketed_table with bucket column i,
```scala
case class Data(i: Int, j: Int, k: Int)
sc.makeRDD(Array((1, 2, 3))).map(x => Data(x._1, x._2, x._3)).toDF.write.bucketBy(2, "i").saveAsTable("bucketed_table")
```
and I run the following SQLs:
```sql
SELECT j FROM bucketed_table;
Error in query: bucket column i not found in existing columns (j);
SELECT j, MAX(k) FROM bucketed_table GROUP BY j;
Error in query: bucket column i not found in existing columns (j, k);
```
I think we should add a check that, we only enable bucketing when it satisfies all conditions below:
1. the conf is enabled
2. the relation is bucketed
3. the output contains all bucketing columns
## How was this patch tested?
Updated test cases to reflect the changes.
Author: Yadong Qi <qiyadong2010@gmail.com>
Closes #13321 from watermen/SPARK-15549.
Diffstat (limited to 'R/pkg/inst/tests')
0 files changed, 0 insertions, 0 deletions