author | gatorsmile <gatorsmile@gmail.com> | 2016-05-03 23:20:18 +0200 |
---|---|---|
committer | Herman van Hovell <hvanhovell@questtec.nl> | 2016-05-03 23:20:18 +0200 |
commit | 71296c041e59159bd7c5836cf652c02843974077 (patch) | |
tree | b57b74bac7083bdf1cb352840d9da609051a2e46 /docs/css | |
parent | 2e2a6211c4391d67edb2a252f26647fb059bc18b (diff) | |
[SPARK-15056][SQL] Parse Unsupported Sampling Syntax and Issue Better Exceptions
#### What changes were proposed in this pull request?
Compared with the current Spark parser, Hive supports two additional sampling syntaxes:
- In `ON` clauses, `rand()` indicates sampling on the entire row instead of on an individual column. For example,
```SQL
SELECT * FROM source TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s;
```
- Users can specify the total length to be read. For example,
```SQL
SELECT * FROM source TABLESAMPLE(100M) s;
```
For reference, see the Hive language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
This PR parses and captures these two additional syntaxes and issues a clearer error message for each, since Spark does not support them.
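The idea of recognizing the two Hive-only forms and failing with a specific message can be sketched as follows. This is only an illustration under stated assumptions: it is not the actual change, which lives in Spark's SQL parser. The names `UnsupportedSamplingError` and `check_sample_clause`, the regular expressions, the byte-length suffix set, and the message wording are all hypothetical.

```python
import re


class UnsupportedSamplingError(ValueError):
    """Hypothetical error type used only for this sketch."""


# Hive-only form 1: BUCKET x OUT OF y ON <expression>, e.g. ON rand()
_BUCKET_ON = re.compile(
    r"BUCKET\s+\d+\s+OUT\s+OF\s+\d+\s+ON\s+(.+)", re.IGNORECASE
)

# Hive-only form 2: a byte-length literal such as 100M
# (the suffix set B/K/M/G is an assumption made for this sketch)
_BYTE_LENGTH = re.compile(r"\d+[BKMG]", re.IGNORECASE)


def check_sample_clause(body: str) -> None:
    """Raise a descriptive error if the text inside TABLESAMPLE(...)
    uses one of the two Hive-only forms; otherwise return None."""
    text = body.strip()

    match = _BUCKET_ON.fullmatch(text)
    if match:
        expr = match.group(1).strip()
        raise UnsupportedSamplingError(
            f"TABLESAMPLE(BUCKET x OUT OF y ON {expr}) is not supported"
        )

    if _BYTE_LENGTH.fullmatch(text):
        raise UnsupportedSamplingError(
            f"TABLESAMPLE({text}) with a byte length is not supported"
        )


# Usage: these forms pass the check, the Hive-only forms raise.
check_sample_clause("BUCKET 3 OUT OF 32")
check_sample_clause("10 PERCENT")
try:
    check_sample_clause("100M")
except UnsupportedSamplingError as err:
    print(err)
```

The point of the sketch is that the unsupported forms are matched explicitly, so the user sees which construct is rejected instead of a generic syntax error.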
#### How was this patch tested?
Added test cases to verify the thrown exceptions.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #12838 from gatorsmile/bucketOnRand.
Diffstat (limited to 'docs/css')
0 files changed, 0 insertions, 0 deletions