author | Nong Li <nong@databricks.com> | 2016-01-15 17:40:26 -0800 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2016-01-15 17:40:26 -0800 |
commit | 9039333c0a0ce4bea32f012b81c1e82e31246fc1 | |
tree | 6910f4dc9febb8edc68575c24c4f3496cd4b8d7c /core | |
parent | 3b5ccb12b8d33d99df0f206fecf00f51c2b88fdb | |
[SPARK-12644][SQL] Update parquet reader to be vectorized.
This inlines a few of the Parquet decoders and adds vectorized APIs to support decoding in batch.
Several properties of the Parquet encodings make this much more efficient. In
particular, RLE encodings are very well suited to batch decoding. The Parquet 2.0 encodings are
also well suited to this.
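To illustrate why run-length encoding lends itself to batch decoding, here is a minimal sketch. It assumes a simplified encoding in which each run is a `(count, value)` pair; this is not Parquet's actual RLE/bit-packed hybrid layout, and `Run`, `RleSketch`, `decodeOneByOne`, and `decodeBatch` are hypothetical names that do not appear in this patch.

```scala
// Simplified run-length encoding: each run is (count, value).
// Assumption for illustration only; Parquet's real on-disk format differs.
final case class Run(count: Int, value: Int)

object RleSketch {
  // Value-at-a-time decoding: one iterator step per decoded value.
  def decodeOneByOne(runs: Seq[Run]): Array[Int] =
    runs.iterator.flatMap(r => Iterator.fill(r.count)(r.value)).toArray

  // Batch decoding: each run becomes a single bulk fill of the output array.
  def decodeBatch(runs: Seq[Run]): Array[Int] = {
    val out = new Array[Int](runs.map(_.count).sum)
    var pos = 0
    for (r <- runs) {
      java.util.Arrays.fill(out, pos, pos + r.count, r.value)
      pos += r.count
    }
    out
  }
}
```

Both functions produce the same array; the batch version does per-run rather than per-value bookkeeping, which is the kind of saving a vectorized reader can exploit when runs are long.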
This is a work in progress and does not affect the current execution. In subsequent patches, we will
support more encodings and types before enabling this.
Simple benchmarks indicate this can decode single ints more than 3x faster.
Author: Nong Li <nong@databricks.com>
Author: Nong <nongli@gmail.com>
Closes #10593 from nongli/spark-12644.
Diffstat (limited to 'core')
-rw-r--r-- | core/src/main/scala/org/apache/spark/util/Benchmark.scala | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/core/src/main/scala/org/apache/spark/util/Benchmark.scala b/core/src/main/scala/org/apache/spark/util/Benchmark.scala
index 457a1a05a1..d484cec7ae 100644
--- a/core/src/main/scala/org/apache/spark/util/Benchmark.scala
+++ b/core/src/main/scala/org/apache/spark/util/Benchmark.scala
@@ -62,10 +62,10 @@ private[spark] class Benchmark(
     val firstRate = results.head.avgRate
     // The results are going to be processor specific so it is useful to include that.
     println(Benchmark.getProcessorName())
-    printf("%-24s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
-    println("-------------------------------------------------------------------------")
+    printf("%-30s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
+    println("-------------------------------------------------------------------------------")
     results.zip(benchmarks).foreach { r =>
-      printf("%-24s %16s %16s %14s\n",
+      printf("%-30s %16s %16s %14s\n",
         r._2.name,
         "%10.2f" format r._1.avgMs,
         "%10.2f" format r._1.avgRate,
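The separator line grows together with the name column because its length has to match the total row width. The following sketch checks that arithmetic; `RowWidth`, `width`, and `header` are hypothetical helpers written for illustration and are not part of `Benchmark.scala`.

```scala
object RowWidth {
  // Total printed width of a row: the name column, three right-aligned
  // columns of 16, 16 and 14 characters, and three single-space separators.
  def width(nameCol: Int): Int = nameCol + 16 + 16 + 14 + 3

  // Builds the header row with a left-justified name column of the given width.
  // The result has exactly width(nameCol) characters as long as name + ":"
  // fits inside the name column.
  def header(nameCol: Int, name: String): String =
    s"%-${nameCol}s %16s %16s %14s"
      .format(name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
}
```

Under these assumptions, `RowWidth.width(24)` is 73 and `RowWidth.width(30)` is 79, so widening the name column by six characters calls for six more dashes in the separator.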