author | Cheng Lian <lian.cs.zju@gmail.com> | 2014-09-03 18:57:20 -0700 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-09-03 18:57:20 -0700 |
commit | f48420fde58d554480cc8830d2f8c4d17618f283 (patch) | |
tree | bd793ba5bc1e9917f34a7f40daf697346c447393 /sql/catalyst/src/main/scala/org/apache | |
parent | 4bba10c41acaf84a1c4a8e2db467c22f5ab7cbb9 (diff) | |
[SPARK-2973][SQL] Lightweight SQL commands without distributed jobs when calling .collect()
By overriding `executeCollect()` in the physical plan classes of all commands, we can avoid kicking off a distributed job when collecting the result of a SQL command, e.g. `sql("SET").collect()`.
Previously, `Command.sideEffectResult` returned a `Seq[Any]`, and the `execute()` method in subclasses of `Command` typically converted that to a `Seq[Row]` and then parallelized it into an RDD. With this PR, `sideEffectResult` is required to return a `Seq[Row]` directly, so that `executeCollect()` can use it directly and be factored into the `Command` parent class.
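The idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: `Row` is modelled as a plain `Seq[Any]`, and `FakeRDD`, `SparkPlanSketch`, `CommandSketch`, and `SetCommandSketch` are hypothetical stand-ins for the Spark classes, used only to show how a command can answer `executeCollect()` from `sideEffectResult` without going through an RDD.

```scala
// Simplified model of the pattern. All names here are hypothetical
// stand-ins and do not match Spark's real class definitions.
object LightweightCommandSketch {
  // Assumption: a row is modelled as a plain sequence of values.
  type Row = Seq[Any]

  // Stand-in for an RDD that counts how often a "distributed job" is launched.
  final class FakeRDD(data: Seq[Row]) {
    def collect(): Array[Row] = {
      FakeRDD.jobsLaunched += 1
      data.toArray
    }
  }
  object FakeRDD {
    var jobsLaunched: Int = 0
  }

  trait SparkPlanSketch {
    def execute(): FakeRDD
    // Default path: collecting goes through the RDD and launches a job.
    def executeCollect(): Array[Row] = execute().collect()
  }

  trait CommandSketch extends SparkPlanSketch {
    // Subclasses supply the rows directly, already typed as Seq[Row].
    protected def sideEffectResult: Seq[Row]

    // Both methods live in the parent trait; only executeCollect()
    // bypasses the RDD.
    override def execute(): FakeRDD = new FakeRDD(sideEffectResult)
    override def executeCollect(): Array[Row] = sideEffectResult.toArray
  }

  final class SetCommandSketch(settings: Map[String, String]) extends CommandSketch {
    // lazy val so the side effect is evaluated at most once.
    protected lazy val sideEffectResult: Seq[Row] =
      settings.toSeq.map { case (k, v) => Seq(s"$k=$v") }
  }
}
```

In this sketch, calling `executeCollect()` on a `SetCommandSketch` leaves `FakeRDD.jobsLaunched` unchanged, while `execute().collect()` increments it, which mirrors the difference between the lightweight path and the job-launching path.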
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2215 from liancheng/lightweight-commands and squashes the following commits:
3fbef60 [Cheng Lian] Factored execute() method of physical commands to parent class Command
5a0e16c [Cheng Lian] Passes test suites
e0e12e9 [Cheng Lian] Refactored Command.sideEffectResult and Command.executeCollect
995bdd8 [Cheng Lian] Cleaned up DescribeHiveTableCommand
542977c [Cheng Lian] Avoids confusion between logical and physical plan by adding package prefixes
55b2aa5 [Cheng Lian] Avoids distributed jobs when execution SQL commands
Diffstat (limited to 'sql/catalyst/src/main/scala/org/apache')
0 files changed, 0 insertions, 0 deletions