author Nong Li <nong@databricks.com> 2016-01-15 17:40:26 -0800
committer Reynold Xin <rxin@databricks.com> 2016-01-15 17:40:26 -0800
commit 9039333c0a0ce4bea32f012b81c1e82e31246fc1
tree 6910f4dc9febb8edc68575c24c4f3496cd4b8d7c /core
parent 3b5ccb12b8d33d99df0f206fecf00f51c2b88fdb
[SPARK-12644][SQL] Update parquet reader to be vectorized.
This inlines a few of the Parquet decoders and adds vectorized APIs to support decoding in batch. A few particulars of the Parquet encodings make this much more efficient: RLE encodings in particular are very well suited to batch decoding, and the Parquet 2.0 encodings are also well suited to it.

This is a work in progress and does not affect the current execution. Subsequent patches will add support for more encodings and types before this is enabled. Simple benchmarks indicate this can decode single ints more than 3x faster.

Author: Nong Li <nong@databricks.com>
Author: Nong <nongli@gmail.com>

Closes #10593 from nongli/spark-12644.
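Why RLE suits batch decoding can be shown with a minimal Scala sketch. It assumes the runs have already been parsed into a hypothetical `RleRun(count, value)` form; the real Parquet RLE/bit-packing hybrid format also contains bit-packed runs, which are omitted here, and this is not the reader added by this patch. The point is that a run of `count` identical values is expanded with one bulk `java.util.Arrays.fill` call instead of `count` separate per-value steps.

```scala
import java.util.Arrays

// Hypothetical, already-parsed representation of one RLE run: `count` copies of `value`.
final case class RleRun(count: Int, value: Int)

object RleBatchSketch {
  // Row-at-a-time style: one loop iteration (one "decode") per value.
  def decodeOneByOne(runs: Seq[RleRun]): Array[Int] = {
    val out = new Array[Int](runs.map(_.count).sum)
    var i = 0
    for (run <- runs; _ <- 0 until run.count) {
      out(i) = run.value
      i += 1
    }
    out
  }

  // Batch style: each run is written into `out` with a single bulk fill.
  // `out` must have room for every value starting at `offset`, otherwise
  // Arrays.fill throws ArrayIndexOutOfBoundsException.
  // Returns the number of values written.
  def decodeBatch(runs: Seq[RleRun], out: Array[Int], offset: Int): Int = {
    var pos = offset
    for (run <- runs) {
      Arrays.fill(out, pos, pos + run.count, run.value)
      pos += run.count
    }
    pos - offset
  }

  def main(args: Array[String]): Unit = {
    val runs = Seq(RleRun(4, 7), RleRun(2, 0), RleRun(3, 42))
    val out = new Array[Int](9)
    val n = decodeBatch(runs, out, 0)
    println(s"$n values: ${out.mkString(",")}")
  }
}
```

Both functions produce the same values; the batch version simply does per-run rather than per-value work, which is where long runs pay off.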
Diffstat (limited to 'core')
-rw-r--r-- core/src/main/scala/org/apache/spark/util/Benchmark.scala | 6
1 file changed, 3 insertions, 3 deletions
diff --git a/core/src/main/scala/org/apache/spark/util/Benchmark.scala b/core/src/main/scala/org/apache/spark/util/Benchmark.scala
index 457a1a05a1..d484cec7ae 100644
--- a/core/src/main/scala/org/apache/spark/util/Benchmark.scala
+++ b/core/src/main/scala/org/apache/spark/util/Benchmark.scala
@@ -62,10 +62,10 @@ private[spark] class Benchmark(
val firstRate = results.head.avgRate
// The results are going to be processor specific so it is useful to include that.
println(Benchmark.getProcessorName())
- printf("%-24s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
- println("-------------------------------------------------------------------------")
+ printf("%-30s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
+ println("-------------------------------------------------------------------------------")
results.zip(benchmarks).foreach { r =>
- printf("%-24s %16s %16s %14s\n",
+ printf("%-30s %16s %16s %14s\n",
r._2.name,
"%10.2f" format r._1.avgMs,
"%10.2f" format r._1.avgRate,
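The benchmark-output change widens the name column from `%-24s` to `%-30s`. A `%-Ns` specifier pads to at least N characters but never truncates, so a case name longer than the column pushes every following column to the right. The minimal sketch below reproduces the patched layout; the 29-character case name and the row values are hypothetical placeholders, not real benchmark results.

```scala
object BenchmarkLayoutSketch {
  // Builds one line with the same layout as the patched printf calls:
  // a left-aligned name column `nameWidth` wide, then right-aligned 16/16/14-wide columns.
  def line(nameWidth: Int, cols: String*): String =
    s"%-${nameWidth}s %16s %16s %14s".format(cols: _*)

  def main(args: Array[String]): Unit = {
    // Hypothetical 29-character case name: too long for 24 columns, fits in 30.
    val name = "Vectorized Parquet int decode"
    for (width <- Seq(24, 30)) {
      println(s"name column width = $width")
      println(line(width, name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate"))
      // Placeholder values only.
      println(line(width, name, "%10.2f".format(12.5), "%10.2f".format(80.0), "1.0X"))
      println()
    }
  }
}
```

At width 30, the header and the row both come out 79 characters wide (30 + 1 + 16 + 1 + 16 + 1 + 14), so the columns line up. At width 24, the 30-character header label and the 29-character row name overflow by different amounts, and the right-hand columns drift apart. The dashed separator line in the patch is lengthened along with the column.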