[SPARK-12644][SQL] Update parquet reader to be vectorized.

This inlines a few of the Parquet decoders and adds vectorized APIs to support decoding in batch. A few particulars of the Parquet encodings make this much more efficient: RLE encodings are very well suited for batch decoding, and the Parquet 2.0 encodings are as well.

This is a work in progress and does not affect current execution. Subsequent patches will add support for more encodings and types before enabling it. Simple benchmarks indicate this can decode single ints more than 3x faster.

Author: Nong Li <nong@databricks.com>
Author: Nong <nongli@gmail.com>

Closes #10593 from nongli/spark-12644.
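The batch-decoding property the message relies on can be illustrated with a minimal Scala sketch. It assumes a simplified input made of (count, value) runs. Parquet's actual RLE/bit-packing hybrid encoding also contains bit-packed groups, which this sketch ignores. `Run`, `RowDecoder`, and `decodeBatch` are hypothetical names, not Spark or Parquet APIs. The row decoder pays its bookkeeping on every call, while the batch decoder writes a whole run with a single `java.util.Arrays.fill`.

```scala
import java.util.Arrays

// Hypothetical simplified run: `count` repetitions of `value`.
// Parquet's real RLE/bit-packing hybrid encoding also has bit-packed groups.
final case class Run(count: Int, value: Int)

object RleSketch {
  // Value-at-a-time style: one call per value, with per-call bookkeeping.
  final class RowDecoder(runs: IndexedSeq[Run]) {
    private var runIdx = 0
    private var remaining = if (runs.nonEmpty) runs(0).count else 0

    // Throws IndexOutOfBoundsException once all runs are consumed.
    def readInt(): Int = {
      while (remaining == 0) {
        runIdx += 1
        remaining = runs(runIdx).count
      }
      remaining -= 1
      runs(runIdx).value
    }
  }

  // Batch style: each run is written into `out` with a single fill.
  // Returns the number of values written (at most out.length).
  def decodeBatch(runs: Seq[Run], out: Array[Int]): Int = {
    var pos = 0
    for (r <- runs) {
      val n = math.min(r.count, out.length - pos)
      Arrays.fill(out, pos, pos + n, r.value)
      pos += n
    }
    pos
  }
}
```

The batch path does constant work per run rather than per value, so long runs of repeated values decode with little overhead. Whether that becomes a measurable speedup depends on the data and on the real decoders; the sketch only shows the shape of the two styles.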
author: Nong Li <nong@databricks.com> 2016-01-15 17:40:26 -0800
committer: Reynold Xin <rxin@databricks.com> 2016-01-15 17:40:26 -0800
commit: 9039333c0a0ce4bea32f012b81c1e82e31246fc1 (patch)
tree: 6910f4dc9febb8edc68575c24c4f3496cd4b8d7c /core
parent: 3b5ccb12b8d33d99df0f206fecf00f51c2b88fdb (diff)
1 file changed, 3 insertions, 3 deletions
diff --git a/core/src/main/scala/org/apache/spark/util/Benchmark.scala b/core/src/main/scala/org/apache/spark/util/Benchmark.scala
index 457a1a05a1..d484cec7ae 100644
--- a/core/src/main/scala/org/apache/spark/util/Benchmark.scala
+++ b/core/src/main/scala/org/apache/spark/util/Benchmark.scala
@@ -62,10 +62,10 @@ private[spark] class Benchmark(
     val firstRate = results.head.avgRate
     // The results are going to be processor specific so it is useful to include that.
     println(Benchmark.getProcessorName())
-    printf("%-24s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
-    println("-------------------------------------------------------------------------")
+    printf("%-30s %16s %16s %14s\n", name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")
+    println("-------------------------------------------------------------------------------")
     results.zip(benchmarks).foreach { r =>
-      printf("%-24s %16s %16s %14s\n",
+      printf("%-30s %16s %16s %14s\n",
         r._2.name,
         "%10.2f" format r._1.avgMs,
         "%10.2f" format r._1.avgRate,
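The diff widens the name column of the results table from 24 to 30 characters and lengthens the separator line. A small standalone sketch shows the effect. It reuses the two format strings from the diff; `BenchmarkTableSketch`, `header`, `row`, and the sample benchmark name are hypothetical, and the relative-rate formatting is an assumption because that line is not part of the hunk.

```scala
object BenchmarkTableSketch {
  // Name column width before and after the change in the diff.
  val legacyFmt = "%-24s %16s %16s %14s"
  val newFmt = "%-30s %16s %16s %14s"

  def header(fmt: String, name: String): String =
    fmt.format(name + ":", "Avg Time(ms)", "Avg Rate(M/s)", "Relative Rate")

  // Assumed relative-rate formatting ("%6.2f X"); not shown in the hunk above.
  def row(fmt: String, name: String, avgMs: Double, avgRate: Double, relative: Double): String =
    fmt.format(name, "%10.2f".format(avgMs), "%10.2f".format(avgRate), "%6.2f X".format(relative))
}
```

With the old `%-24s` column, a 25-character case name produces a 74-character row instead of 73, shifting its numeric columns one character to the right of the header. With `%-30s`, any name up to 30 characters yields rows of the same width as the header.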