author     Sameer Agarwal <sameer@databricks.com>  2014-06-11 12:01:04 -0700
committer  Michael Armbrust <michael@databricks.com>  2014-06-11 12:01:04 -0700
commit     4107cce58c41160a0dc20339621eacdf8a8b1191
tree       ce7fce598c61190f702e8baed2517d7efc873e0e /tools
parent     4d5c12aa1c54c49377a4bafe3bcc4993d5e1a552
[SPARK-2042] Prevent unnecessary shuffle triggered by take()
This PR implements `take()` on a `SchemaRDD` by inserting a logical limit that is followed by a `collect()`. This is also accompanied by adding a catalyst optimizer rule for collapsing adjacent limits. Doing so prevents an unnecessary shuffle that is sometimes triggered by `take()`.

Author: Sameer Agarwal <sameer@databricks.com>

Closes #1048 from sameeragarwal/master and squashes the following commits:

3eeb848 [Sameer Agarwal] Fixing Tests
1b76ff1 [Sameer Agarwal] Deprecating limit(limitExpr: Expression) in v1.1.0
b723ac4 [Sameer Agarwal] Added limit folding tests
a0ff7c4 [Sameer Agarwal] Adding catalyst rule to fold two consecutive limits
8d42d03 [Sameer Agarwal] Implement trigger() as limit() followed by collect()
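The two ideas in this change, expressing `take(n)` as a logical limit followed by a collect and folding two consecutive limits into one, can be illustrated with a small toy model. This is a hedged sketch in Python, not Spark's or Catalyst's code: the classes `Scan` and `Limit` and the functions `fold_limits`, `execute`, `take`, and `count_limits` are all hypothetical names chosen for illustration. The model runs in a single process and preserves row order, so it does not show shuffles or distributed execution. It only shows the logical rewrite, where `Limit(a, Limit(b, x))` becomes `Limit(min(a, b), x)` and leaves a single limit operator in the plan.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Union


# Hypothetical, minimal stand-ins for logical plan nodes (not Catalyst classes).
@dataclass(frozen=True)
class Scan:
    rows: tuple  # the data this leaf node "reads"


@dataclass(frozen=True)
class Limit:
    n: int
    child: Plan


Plan = Union[Scan, Limit]


def fold_limits(plan: Plan) -> Plan:
    """Rewrite Limit(a, Limit(b, x)) into Limit(min(a, b), x), bottom-up."""
    if isinstance(plan, Limit):
        child = fold_limits(plan.child)
        if isinstance(child, Limit):
            # Two adjacent limits: only the smaller bound can take effect.
            return Limit(min(plan.n, child.n), child.child)
        return Limit(plan.n, child)
    return plan


def count_limits(plan: Plan) -> int:
    """Count Limit operators along the (linear) plan."""
    return 0 if isinstance(plan, Scan) else 1 + count_limits(plan.child)


def execute(plan: Plan) -> List:
    """Naive single-process evaluation of the toy plan, preserving row order."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    return execute(plan.child)[: plan.n]


def take(plan: Plan, n: int) -> List:
    """Toy take(n): wrap the plan in a logical limit, optimize, then collect."""
    return execute(fold_limits(Limit(n, plan)))
```

For example, with `base = Limit(10, Scan(tuple(range(20))))`, calling `take(base, 3)` first builds `Limit(3, Limit(10, ...))`, which the folding rule collapses to a single `Limit(3, ...)` before execution. Because the rule recurses into the child first, a longer chain of limits also collapses to one operator carrying the smallest bound.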
Diffstat (limited to 'tools')
0 files changed, 0 insertions, 0 deletions