path: root/streaming/src/test
author Sameer Agarwal <sameer@databricks.com> 2016-02-02 22:22:50 -0800
committer Michael Armbrust <michael@databricks.com> 2016-02-02 22:22:50 -0800
commit 138c300f97d29cb0d04a70bea98a8a0c0548318a (patch)
tree c9271588249560b21cccf5dc92a618360d8a66be /streaming/src/test
parent e86f8f63bfa3c15659b94e831b853b1bc9ddae32 (diff)
download spark-138c300f97d29cb0d04a70bea98a8a0c0548318a.tar.gz
spark-138c300f97d29cb0d04a70bea98a8a0c0548318a.tar.bz2
spark-138c300f97d29cb0d04a70bea98a8a0c0548318a.zip
[SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL
Based on the semantics of a query, we can derive a number of data constraints on the output of each (logical or physical) operator. For instance, if a filter defines `'a > 10`, we know that the output data of this filter satisfies two constraints:

1. `'a > 10`
2. `isNotNull('a)`

This PR proposes a possible way of keeping track of these constraints and propagating them through the logical plan, which can then help us build more advanced optimizations (such as pruning redundant filters and optimizing joins, among others).

We define constraints as a set of (implicitly conjunctive) expressions. For example, if a filter operator has constraints = `Set('a > 10, 'b < 100)`, it is implied that the outputs satisfy both individual constraints (i.e., `'a > 10` AND `'b < 100`).

Design Document: https://docs.google.com/a/databricks.com/document/d/1WQRgDurUBV9Y6CWOBS75PQIqJwT-6WftVa18xzm7nCo/edit?usp=sharing

Author: Sameer Agarwal <sameer@databricks.com>

Closes #10844 from sameeragarwal/constraints.
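To make the idea concrete, here is a minimal, self-contained Scala sketch of constraint propagation through a filter. The `Expr`, `LogicalPlan`, `Relation`, and `Filter` types below are hypothetical stand-ins for illustration only, not Catalyst's actual classes; they simply show how a filter's predicate and the inferred non-nullability of its referenced attributes can be accumulated as a conjunctive constraint set.

```scala
// Hypothetical, simplified types for illustration; not Catalyst's real API.
sealed trait Expr
case class Attr(name: String) extends Expr
case class GreaterThan(attr: Attr, value: Int) extends Expr
case class IsNotNull(attr: Attr) extends Expr

sealed trait LogicalPlan {
  // Constraints: an implicitly conjunctive set of expressions that hold
  // for every output row of this operator.
  def constraints: Set[Expr]
}

case class Relation(attrs: Seq[Attr]) extends LogicalPlan {
  def constraints: Set[Expr] = Set.empty
}

case class Filter(condition: Expr, child: LogicalPlan) extends LogicalPlan {
  // A filter adds its own predicate to the child's constraints, plus the
  // inferred non-nullability of every attribute the predicate references.
  def constraints: Set[Expr] =
    child.constraints + condition ++ references(condition).map(a => IsNotNull(a): Expr)

  private def references(e: Expr): Set[Attr] = e match {
    case a: Attr           => Set(a)
    case GreaterThan(a, _) => Set(a)
    case IsNotNull(a)      => Set(a)
  }
}

object ConstraintDemo extends App {
  val a = Attr("a")
  val plan = Filter(GreaterThan(a, 10), Relation(Seq(a)))
  // Prints a set containing GreaterThan(Attr(a),10) and IsNotNull(Attr(a))
  println(plan.constraints)
}
```

Downstream optimizations can then consult such a constraint set, for example to drop a filter whose predicate is already implied by its child's constraints.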
Diffstat (limited to 'streaming/src/test')
0 files changed, 0 insertions, 0 deletions