author | Qifan Pu <qifan.pu@gmail.com> | 2016-08-10 14:45:13 -0700 |
---|---|---|
committer | Davies Liu <davies.liu@gmail.com> | 2016-08-10 14:45:13 -0700 |
commit | bf5cb8af4a649e0c7ac565891427484eab9ee5d9 (patch) | |
tree | 06b431129766ecb19c5d1c36999e3d3825383cda /docs/running-on-yarn.md | |
parent | 214ba66a030bc3a718c567a742b0db44bf911d61 (diff) | |
[SPARK-16928] [SQL] Recursive call of ColumnVector::getInt() breaks JIT inlining
## What changes were proposed in this pull request?
In both `OnHeapColumnVector` and `OffHeapColumnVector`, we implemented `getInt()` with the following code pattern:
```
public int getInt(int rowId) {
  if (dictionary == null) {
    return intData[rowId];
  } else {
    return dictionary.decodeToInt(dictionaryIds.getInt(rowId));
  }
}
```
Because `dictionaryIds` is itself a `ColumnVector`, the dictionary branch calls `getInt()` recursively, which prevents the JIT compiler from inlining `getInt()`.
We fix this by adding a separate method, `getDictId()`, used specifically for reading from `dictionaryIds`.
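The shape of the fix can be illustrated with a minimal, self-contained sketch. It is not the actual Spark code: `SketchColumnVector`, `setDictionary`, and the use of `IntUnaryOperator` as a stand-in for the dictionary decoder are assumptions made for illustration. Only the names `getInt()`, `getDictId()`, `dictionary`, `dictionaryIds`, and `intData` come from the description above.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical, simplified stand-in for a column vector (not Spark's class).
final class SketchColumnVector {
  private final int[] intData;               // plain values, or dictionary ids
  private IntUnaryOperator dictionary;       // assumed stand-in for the dictionary decoder
  private SketchColumnVector dictionaryIds;  // ids into the dictionary, also a vector

  SketchColumnVector(int[] intData) {
    this.intData = intData;
  }

  // Hypothetical helper that attaches a dictionary and its id vector.
  void setDictionary(IntUnaryOperator dictionary, SketchColumnVector dictionaryIds) {
    this.dictionary = dictionary;
    this.dictionaryIds = dictionaryIds;
  }

  // Dedicated accessor for the id vector: a plain array read, no dictionary check.
  int getDictId(int rowId) {
    return intData[rowId];
  }

  // getInt() no longer calls getInt() on another vector, so it is not recursive.
  int getInt(int rowId) {
    if (dictionary == null) {
      return intData[rowId];
    } else {
      return dictionary.applyAsInt(dictionaryIds.getDictId(rowId));
    }
  }

  public static void main(String[] args) {
    int[] table = {10, 20, 30};
    SketchColumnVector ids = new SketchColumnVector(new int[] {2, 0, 1});
    SketchColumnVector encoded = new SketchColumnVector(new int[0]);
    encoded.setDictionary(id -> table[id], ids);
    System.out.println(encoded.getInt(0)); // prints 30
  }
}
```

In this sketch the dictionary branch reaches the ids through `getDictId()`, so the body of `getInt()` contains no call back into `getInt()`.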
## How was this patch tested?
We measured the difference with the following aggregate query on a TPC-DS dataset (scale factor 5):
```
select
  max(ss_sold_date_sk) as max_ss_sold_date_sk
from store_sales
```
The query runtime improved from 202 ms (before) to 159 ms (after), a reduction of about 21%.
Author: Qifan Pu <qifan.pu@gmail.com>
Closes #14513 from ooq/SPARK-16928.