author | Yash Datta <Yash.Datta@guavus.com> | 2014-10-30 17:17:24 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-10-30 17:17:31 -0700 |
commit | 2e35e24294ad8a5e76c89ea888fe330052dabd5a (patch) | |
tree | 4a04c807efa3e346e07aeba52593a20a745284a7 /pom.xml | |
parent | 9b6ebe33db27be38c3036ffeda17096043fb0fb9 (diff) | |
[SPARK-3968][SQL] Use parquet-mr filter2 api
The parquet-mr project has introduced a new filter API, filter2 (https://github.com/apache/incubator-parquet-mr/pull/4), along with several fixes. It can also eliminate entire RowGroups based on column statistics such as min/max.
We can leverage that to further improve performance of queries with filters.
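To illustrate why min/max statistics allow whole RowGroups to be skipped, here is a minimal, self-contained sketch. It does not use parquet-mr; `RowGroupStats`, `canDropForGreaterThan`, and `survivingRowGroups` are hypothetical names invented for this example, and the statistics are hard-coded instead of being read from a Parquet footer.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a toy model of statistics-based row group pruning, not the parquet-mr API. */
public class RowGroupPruningSketch {

    /** Hypothetical stand-in for the min/max statistics of one column in one row group. */
    static final class RowGroupStats {
        final int id;
        final long min;
        final long max;

        RowGroupStats(int id, long min, long max) {
            this.id = id;
            this.min = min;
            this.max = max;
        }
    }

    /** For the predicate `column > threshold`: if even the maximum fails, no row can match. */
    static boolean canDropForGreaterThan(RowGroupStats stats, long threshold) {
        return stats.max <= threshold;
    }

    /** Returns the ids of the row groups that still have to be read for `column > threshold`. */
    static List<Integer> survivingRowGroups(List<RowGroupStats> groups, long threshold) {
        List<Integer> survivors = new ArrayList<>();
        for (RowGroupStats stats : groups) {
            if (!canDropForGreaterThan(stats, threshold)) {
                survivors.add(stats.id);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = new ArrayList<>();
        groups.add(new RowGroupStats(0, 1, 100));
        groups.add(new RowGroupStats(1, 101, 200));
        groups.add(new RowGroupStats(2, 201, 300));

        // Row group 0 has max = 100, so `column > 150` can never match and it is skipped.
        System.out.println(survivingRowGroups(groups, 150)); // prints [1, 2]
    }
}
```

Row group 1 is kept even though only some of its values can match; statistics can rule a group out, but rows in the surviving groups still need to be checked individually.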
The filter2 API also introduces the ability to create custom filters. We can create a custom filter for the optimized In clause (InSet), so that elimination happens in the ParquetRecordReader itself.
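A custom InSet filter of the kind described above would need two checks: a per-record test and a per-RowGroup test against min/max statistics. The sketch below shows that split in plain Java. It is loosely modeled on the keep/canDrop shape of filter2's user-defined predicates, but `InSetFilterSketch` and its methods are hypothetical names for this example and are not parquet-mr or Spark classes.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Illustrative only: a toy InSet filter with a record-level and a row-group-level check. */
public class InSetFilterSketch {

    // Sorted set so that the row-group check can find the nearest candidate value quickly.
    private final TreeSet<Long> allowed;

    InSetFilterSketch(Long... values) {
        this.allowed = new TreeSet<>(Arrays.asList(values));
    }

    /** Record-level check: keep the record only if its value is in the set. */
    boolean keep(long value) {
        return allowed.contains(value);
    }

    /** Row-group-level check: drop the group if no set member falls inside [min, max]. */
    boolean canDrop(long min, long max) {
        // Smallest allowed value that is >= min, or null if there is none.
        Long candidate = allowed.ceiling(min);
        return candidate == null || candidate > max;
    }

    public static void main(String[] args) {
        InSetFilterSketch filter = new InSetFilterSketch(5L, 42L, 250L);

        System.out.println(filter.canDrop(100, 200)); // prints true: no set member in [100, 200]
        System.out.println(filter.canDrop(1, 50));    // prints false: 5 and 42 are in range
        System.out.println(filter.keep(42));          // prints true
        System.out.println(filter.keep(43));          // prints false
    }
}
```

Using a sorted set keeps the row-group check to a single lookup, which matters when the In list is large and the check runs once per RowGroup.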
Author: Yash Datta <Yash.Datta@guavus.com>
Closes #2841 from saucam/master and squashes the following commits:
8282ba0 [Yash Datta] SPARK-3968: fix scala code style and add some more tests for filtering on optional columns
515df1c [Yash Datta] SPARK-3968: Add a test case for filter pushdown on optional column
5f4530e [Yash Datta] SPARK-3968: Fix scala code style
f304667 [Yash Datta] SPARK-3968: Using task metadata strategy for row group filtering
ec53e92 [Yash Datta] SPARK-3968: No push down should result in case we are unable to create a record filter
48163c3 [Yash Datta] SPARK-3968: Code cleanup
cc7b596 [Yash Datta] SPARK-3968: 1. Fix RowGroupFiltering not working 2. Use the serialization/deserialization from Parquet library for filter pushdown
caed851 [Yash Datta] Revert "SPARK-3968: Not pushing the filters in case of OPTIONAL columns" since filtering on optional columns is now supported in filter2 api
49703c9 [Yash Datta] SPARK-3968: Not pushing the filters in case of OPTIONAL columns
9d09741 [Yash Datta] SPARK-3968: Change parquet filter pushdown to use filter2 api of parquet-mr
Diffstat (limited to 'pom.xml')
-rw-r--r-- | pom.xml | 2 |
1 files changed, 1 insertions, 1 deletions
@@ -133,7 +133,7 @@
 <!-- Version used for internal directory structure -->
 <hive.version.short>0.13.1</hive.version.short>
 <derby.version>10.10.1.1</derby.version>
-<parquet.version>1.4.3</parquet.version>
+<parquet.version>1.6.0rc3</parquet.version>
 <jblas.version>1.2.3</jblas.version>
 <jetty.version>8.1.14.v20131031</jetty.version>
 <chill.version>0.3.6</chill.version>