[SPARK-2973][SQL] Lightweight SQL commands without distributed jobs when calling .collect() - spark


author	Cheng Lian <lian.cs.zju@gmail.com>	2014-09-03 18:57:20 -0700
committer	Michael Armbrust <michael@databricks.com>	2014-09-03 18:57:20 -0700
commit	f48420fde58d554480cc8830d2f8c4d17618f283 (patch)
tree	bd793ba5bc1e9917f34a7f40daf697346c447393 /sql/catalyst/src/main/scala/org/apache
parent	4bba10c41acaf84a1c4a8e2db467c22f5ab7cbb9 (diff)

[SPARK-2973][SQL] Lightweight SQL commands without distributed jobs when calling .collect()

By overriding `executeCollect()` in the physical plan classes of all commands, we can avoid kicking off a distributed job when collecting the result of a SQL command, e.g. `sql("SET").collect()`.

Previously, `Command.sideEffectResult` returned a `Seq[Any]`, and the `execute()` method in subclasses of `Command` typically converted that to a `Seq[Row]` and then parallelized it into an RDD. With this PR, `sideEffectResult` is required to return a `Seq[Row]` directly, so that `executeCollect()` can use it as-is and be factored into the `Command` parent class.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #2215 from liancheng/lightweight-commands and squashes the following commits:

3fbef60 [Cheng Lian] Factored execute() method of physical commands to parent class Command
5a0e16c [Cheng Lian] Passes test suites
e0e12e9 [Cheng Lian] Refactored Command.sideEffectResult and Command.executeCollect
995bdd8 [Cheng Lian] Cleaned up DescribeHiveTableCommand
542977c [Cheng Lian] Avoids confusion between logical and physical plan by adding package prefixes
55b2aa5 [Cheng Lian] Avoids distributed jobs when execution SQL commands
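
The idea described in this commit message can be illustrated with a minimal, self-contained sketch. It does not use Spark: `Row`, `ToyRDD`, `ToyCluster`, `PhysicalPlan`, and the toy `SetCommand` below are hypothetical stand-ins invented for illustration, and only the names `Command`, `sideEffectResult`, `execute()`, and `executeCollect()` come from the message itself. The sketch assumes that collecting through an RDD costs one job, and shows how a command whose result is already a `Seq[Row]` can skip that path.

```scala
object LightweightCommandSketch {
  // Hypothetical stand-in for a SQL row; not Spark's Row class.
  final case class Row(values: Any*)

  // Toy "cluster" that only counts how many jobs were launched.
  object ToyCluster {
    var jobsLaunched: Int = 0
    def parallelize(rows: Seq[Row]): ToyRDD = new ToyRDD(rows)
  }

  // Hypothetical stand-in for an RDD: collecting it costs one job.
  final class ToyRDD(rows: Seq[Row]) {
    def collect(): Array[Row] = {
      ToyCluster.jobsLaunched += 1
      rows.toArray
    }
  }

  // Simplified physical plan: by default, collecting goes through the RDD.
  trait PhysicalPlan {
    def execute(): ToyRDD
    def executeCollect(): Array[Row] = execute().collect()
  }

  // Commands already hold their result as rows on the driver, so both
  // methods can live in the parent trait and executeCollect() needs no job.
  trait Command extends PhysicalPlan {
    protected def sideEffectResult: Seq[Row]
    override def executeCollect(): Array[Row] = sideEffectResult.toArray
    override def execute(): ToyRDD = ToyCluster.parallelize(sideEffectResult)
  }

  // Toy command listing settings as "key=value" rows (illustrative only).
  final class SetCommand(settings: Map[String, String]) extends Command {
    // lazy val: the side effect is evaluated at most once.
    protected lazy val sideEffectResult: Seq[Row] =
      settings.toSeq.sorted.map { case (k, v) => Row(s"$k=$v") }
  }
}
```

In this sketch, calling `executeCollect()` on a `SetCommand` leaves `ToyCluster.jobsLaunched` unchanged, while `execute().collect()` increments it. That is the distinction the commit relies on, under the assumptions stated above.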

Diffstat (limited to 'sql/catalyst/src/main/scala/org/apache')

0 files changed, 0 insertions, 0 deletions

