author | Herman van Hovell <hvanhovell@databricks.com> | 2016-10-05 16:05:30 -0700 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2016-10-05 16:05:30 -0700 |
commit | 5fd54b994e2078dbf0794932b4e0ffa9a9eda0c3 (patch) | |
tree | c3578544cb4d4b4431a5debd0ce6ea7ff4334e0b /docs/_includes/nav-left-wrapper-ml.html | |
parent | 221b418b1c9db7b04c600b6300d18b034a4f444e (diff) | |
download | spark-5fd54b994e2078dbf0794932b4e0ffa9a9eda0c3.tar.gz spark-5fd54b994e2078dbf0794932b4e0ffa9a9eda0c3.tar.bz2 spark-5fd54b994e2078dbf0794932b4e0ffa9a9eda0c3.zip |
[SPARK-17758][SQL] Last returns wrong result in case of empty partition
## What changes were proposed in this pull request?
The result of the `Last` function can be wrong when the last partition processed is empty. It can return `null` instead of the expected value. For example, this can happen when we process partitions in the following order:
```
- Partition 1 [Row1, Row2]
- Partition 2 [Row3]
- Partition 3 []
```
In this case the `Last` function currently returns `null` instead of the value of `Row3`.
This PR fixes this by adding a `valueSet` flag to the `Last` function.
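The failure mode and the role of the flag can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not the Catalyst implementation: the function names `last_without_flag` and `last_with_flag` are hypothetical, partitions are modeled as plain Python lists, and the merge rule shown (keep the earlier buffer unless the later partition actually set a value) is an assumed reading of what a `valueSet` flag makes possible.

```python
from typing import Any, List, Optional, Tuple


def last_without_flag(partitions: List[List[Any]]) -> Optional[Any]:
    """Hypothetical model of the flawed behavior: merging always takes the
    buffer of the most recently processed partition, even if it saw no rows."""
    merged: Optional[Any] = None
    for partition in partitions:
        partial: Optional[Any] = None  # initial buffer for this partition
        for row in partition:
            partial = row  # update step: remember the latest row
        merged = partial  # merge step: the later buffer wins unconditionally
    return merged


def last_with_flag(partitions: List[List[Any]]) -> Optional[Any]:
    """Hypothetical model of the fix: each buffer carries a value_set flag,
    and merging ignores buffers that never saw a row."""
    merged: Tuple[Optional[Any], bool] = (None, False)
    for partition in partitions:
        partial: Tuple[Optional[Any], bool] = (None, False)
        for row in partition:
            partial = (row, True)  # update step: store the row, mark it as set
        if partial[1]:
            merged = partial  # merge step: only a buffer with a set value wins
    return merged[0]


# The partition order from the description above.
partitions = [["Row1", "Row2"], ["Row3"], []]
buggy_result = last_without_flag(partitions)
fixed_result = last_with_flag(partitions)
```

With the partition order above, the version without the flag ends up with `None` because the empty third partition overwrites the merged buffer, while the flagged version keeps `Row3`. The flag also distinguishes "no rows seen" from "the last row was a genuine null", which a null check on the value alone could not do.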
## How was this patch tested?
Previously, `DeclarativeAggregateFunction`s were only covered by end-to-end tests. I have added an evaluator for these functions so we can test them in Catalyst. I have also added a `LastTestSuite` to test the `Last` aggregate function.
Author: Herman van Hovell <hvanhovell@databricks.com>
Closes #15348 from hvanhovell/SPARK-17758.