path: root/streaming/src/test
author Sameer Agarwal <sameer@databricks.com> 2016-02-02 22:22:50 -0800
committer Michael Armbrust <michael@databricks.com> 2016-02-02 22:22:50 -0800
commit 138c300f97d29cb0d04a70bea98a8a0c0548318a (patch)
tree c9271588249560b21cccf5dc92a618360d8a66be /streaming/src/test
parent e86f8f63bfa3c15659b94e831b853b1bc9ddae32 (diff)
download spark-138c300f97d29cb0d04a70bea98a8a0c0548318a.tar.gz
spark-138c300f97d29cb0d04a70bea98a8a0c0548318a.tar.bz2
spark-138c300f97d29cb0d04a70bea98a8a0c0548318a.zip
[SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL
Based on the semantics of a query, we can derive a number of data constraints on the output of each (logical or physical) operator. For instance, if a filter defines `'a > 10`, we know that the output data of this filter satisfies two constraints:

1. `'a > 10`
2. `isNotNull('a)`

This PR proposes a possible way of keeping track of these constraints and propagating them through the logical plan, which can then help us build more advanced optimizations (such as pruning redundant filters and optimizing joins, among others).

We define constraints as a set of (implicitly conjunctive) expressions. For example, if a filter operator has constraints = `Set('a > 10, 'b < 100)`, it is implied that the outputs satisfy both individual constraints (i.e., `'a > 10` AND `'b < 100`).

Design Document: https://docs.google.com/a/databricks.com/document/d/1WQRgDurUBV9Y6CWOBS75PQIqJwT-6WftVa18xzm7nCo/edit?usp=sharing

Author: Sameer Agarwal <sameer@databricks.com>

Closes #10844 from sameeragarwal/constraints.
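To make the idea concrete, here is a minimal, self-contained Scala sketch of constraint propagation through a filter. The `Expr`, `LogicalPlan`, `Relation`, and `Filter` types below are hypothetical stand-ins for illustration only, not Catalyst's actual classes; they simply show how a filter's predicate and the inferred non-nullability of its referenced attributes can be accumulated as a conjunctive constraint set.

```scala
// Hypothetical, simplified types for illustration; not Catalyst's real API.
sealed trait Expr
case class Attr(name: String) extends Expr
case class GreaterThan(attr: Attr, value: Int) extends Expr
case class IsNotNull(attr: Attr) extends Expr

sealed trait LogicalPlan {
  // Constraints: an implicitly conjunctive set of expressions that hold
  // for every output row of this operator.
  def constraints: Set[Expr]
}

case class Relation(attrs: Seq[Attr]) extends LogicalPlan {
  def constraints: Set[Expr] = Set.empty
}

case class Filter(condition: Expr, child: LogicalPlan) extends LogicalPlan {
  // A filter adds its own predicate to the child's constraints, plus the
  // inferred non-nullability of every attribute the predicate references.
  def constraints: Set[Expr] =
    child.constraints + condition ++ references(condition).map(a => IsNotNull(a): Expr)

  private def references(e: Expr): Set[Attr] = e match {
    case a: Attr           => Set(a)
    case GreaterThan(a, _) => Set(a)
    case IsNotNull(a)      => Set(a)
  }
}

object ConstraintDemo extends App {
  val a = Attr("a")
  val plan = Filter(GreaterThan(a, 10), Relation(Seq(a)))
  // Prints a set containing GreaterThan(Attr(a),10) and IsNotNull(Attr(a))
  println(plan.constraints)
}
```

Downstream optimizations can then consult such a constraint set, for example to drop a filter whose predicate is already implied by its child's constraints.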
Diffstat (limited to 'streaming/src/test')
0 files changed, 0 insertions, 0 deletions