author | hyukjinkwon <gurwls223@gmail.com> | 2016-09-01 15:32:07 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2016-09-01 15:32:07 -0700 |
commit | d314677cfd9cb4140005765938841bae9dc48a2d (patch) | |
tree | cf7e637bc71fad5c04f3706645a46415d037129e /licenses | |
parent | e388bd54499cb4c26a0e14efd47af0c684ca250f (diff) | |
download | spark-d314677cfd9cb4140005765938841bae9dc48a2d.tar.gz spark-d314677cfd9cb4140005765938841bae9dc48a2d.tar.bz2 spark-d314677cfd9cb4140005765938841bae9dc48a2d.zip |
[SPARK-16461][SQL] Support partition batch pruning with `<=>` predicate in InMemoryTableScanExec
## What changes were proposed in this pull request?
It seems the `EqualNullSafe` filter was missed when pruning partition batches in cached tables.
Supporting it appears to make queries that use this predicate roughly 5 times faster.
Running the code below:
```scala
test("Null-safe equal comparison") {
  val N = 20000000
  val df = spark.range(N).repartition(20)
  val benchmark = new Benchmark("Null-safe equal comparison", N)
  df.createOrReplaceTempView("t")
  spark.catalog.cacheTable("t")
  sql("select id from t where id <=> 1").collect()

  benchmark.addCase("Null-safe equal comparison", 10) { _ =>
    sql("select id from t where id <=> 1").collect()
  }
  benchmark.run()
}
```
produces the results below:
**Before:**
```
Running benchmark: Null-safe equal comparison
Running case: Null-safe equal comparison
Stopped after 10 iterations, 2098 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
Intel(R) Core(TM) i7-4850HQ CPU 2.30GHz
Null-safe equal comparison: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
Null-safe equal comparison 204 / 210 98.1 10.2 1.0X
```
**After:**
```
Running benchmark: Null-safe equal comparison
Running case: Null-safe equal comparison
Stopped after 10 iterations, 478 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
Intel(R) Core(TM) i7-4850HQ CPU 2.30GHz
Null-safe equal comparison: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
Null-safe equal comparison 42 / 48 474.1 2.1 1.0X
```
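To illustrate why pruning helps here, the following is a minimal sketch of min/max-based batch pruning for a `<=>` predicate. It is not the code from this patch: `BatchStats`, `NullSafePruningSketch` and `mightContain` are hypothetical names, the statistics are simplified to `Long` bounds plus a null count, and the branch for a NULL literal is an added assumption that the description above does not cover.

```scala
// Hypothetical per-batch statistics; illustrative only, not Spark's internal API.
final case class BatchStats(lowerBound: Long, upperBound: Long, nullCount: Int)

object NullSafePruningSketch {
  // Returns true when a batch might contain a row satisfying `column <=> literal`
  // and so must be scanned; false means the whole batch can be skipped.
  def mightContain(stats: BatchStats, literal: Option[Long]): Boolean = literal match {
    // Non-null literal: same range check one would use for plain equality.
    case Some(v) => stats.lowerBound <= v && v <= stats.upperBound
    // Assumption: NULL <=> NULL is true, so only batches holding nulls can match.
    case None => stats.nullCount > 0
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(
      BatchStats(0L, 9L, 0),
      BatchStats(10L, 19L, 0),
      BatchStats(20L, 29L, 2))
    // For `id <=> 1`, only batches whose [lowerBound, upperBound] covers 1 are kept.
    val toScan = batches.filter(mightContain(_, Some(1L)))
    println(s"scanning ${toScan.size} of ${batches.size} batches")
  }
}
```

With a selective predicate such as `id <=> 1`, most batches fall outside the range and are never scanned, which is consistent with the kind of speedup reported in the benchmark above.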
## How was this patch tested?
Unit tests in `PartitionBatchPruningSuite`.
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #14117 from HyukjinKwon/SPARK-16461.