author | Sameer Agarwal <sameer@databricks.com> | 2014-06-11 12:01:04 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-06-11 12:01:04 -0700 |
commit | 4107cce58c41160a0dc20339621eacdf8a8b1191 (patch) | |
tree | ce7fce598c61190f702e8baed2517d7efc873e0e /bin/spark-class.cmd | |
parent | 4d5c12aa1c54c49377a4bafe3bcc4993d5e1a552 (diff) | |
[SPARK-2042] Prevent unnecessary shuffle triggered by take()
This PR implements `take()` on a `SchemaRDD` by inserting a logical limit followed by a `collect()`. It also adds a Catalyst optimizer rule that collapses adjacent limits. Doing so prevents the unnecessary shuffle that `take()` sometimes triggers.
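To illustrate the idea, here is a minimal, self-contained toy model in Python. It is not the Spark or Catalyst code from this PR: the names `Scan`, `Limit`, `combine_limits`, `collect`, and `take` are hypothetical stand-ins, and the sketch assumes that two adjacent limits collapse to a single limit with the smaller of the two counts.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    """Leaf node: hypothetical stand-in for reading rows from a source."""
    rows: tuple


@dataclass(frozen=True)
class Limit:
    """Logical limit: keep at most `n` rows of `child`."""
    n: int
    child: "Plan"


Plan = Union[Scan, Limit]


def combine_limits(plan):
    """Collapse adjacent limits: Limit(a, Limit(b, c)) becomes Limit(min(a, b), c)."""
    if isinstance(plan, Limit):
        # Optimize the subtree first so chains of three or more limits also fold.
        child = combine_limits(plan.child)
        if isinstance(child, Limit):
            return Limit(min(plan.n, child.n), child.child)
        return Limit(plan.n, child)
    return plan


def collect(plan):
    """Evaluate the toy plan and return its rows as a list."""
    if isinstance(plan, Limit):
        return collect(plan.child)[:plan.n]
    return list(plan.rows)


def take(plan, n):
    """Model take(n) as a logical limit, optimized, then collected."""
    return collect(combine_limits(Limit(n, plan)))
```

In this model, calling `take(Limit(10, scan), 4)` first builds `Limit(4, Limit(10, scan))`, which the rule folds into `Limit(4, scan)` before anything is evaluated. The sketch only shows the plan rewrite; it does not model partitions or shuffles.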
Author: Sameer Agarwal <sameer@databricks.com>
Closes #1048 from sameeragarwal/master and squashes the following commits:
3eeb848 [Sameer Agarwal] Fixing Tests
1b76ff1 [Sameer Agarwal] Deprecating limit(limitExpr: Expression) in v1.1.0
b723ac4 [Sameer Agarwal] Added limit folding tests
a0ff7c4 [Sameer Agarwal] Adding catalyst rule to fold two consecutive limits
8d42d03 [Sameer Agarwal] Implement trigger() as limit() followed by collect()