author Herman van Hovell <hvanhovell@questtec.nl> 2016-01-06 11:16:53 -0800
committer Reynold Xin <rxin@databricks.com> 2016-01-06 11:16:53 -0800
commit ea489f14f11b2fdfb44c86634d2e2c2167b6ea18
tree 8b29d929015ed8337a8c485f7d88a81dbbfe0166 /project
parent 3aa3488225af12a77da3ba807906bc6a461ef11c
[SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst
This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to using this parser for all of our SQL parsing.

The following key changes have been made:

The ANTLR Parser & supporting classes have been moved to the Catalyst project. They are now part of the ```org.apache.spark.sql.catalyst.parser``` package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the ```ASTNode``` class and to improve the error handling.

The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:
- ```CatalystQl```: This implements Query and Expression parsing functionality.
- ```SparkQl```: This is a subclass of ```CatalystQl``` and provides SQL/Core-only functionality such as Explain and Describe.
- ```HiveQl```: This is a subclass of ```SparkQl``` and adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.

cc rxin

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #10583 from hvanhovell/SPARK-12575.
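
The three-level layering described in the commit message can be pictured with a small sketch. This is only an illustration under stated assumptions: the class names `CatalystQlSketch`, `SparkQlSketch` and `HiveQlSketch`, the `Plan` types, and the string-prefix matching are all hypothetical stand-ins, not the real Spark classes or their signatures (the real classes convert an ANTLR AST into a LogicalPlan). It shows only the idea that each subclass handles its own commands and delegates everything else to its parent.

```scala
// Hypothetical, simplified plan types; NOT Spark's LogicalPlan.
sealed trait Plan
case class Query(sql: String) extends Plan
case class Explain(child: Plan) extends Plan
case class Analyze(table: String) extends Plan

// Base layer (stand-in for CatalystQl): treats any input as a plain query.
class CatalystQlSketch {
  def parse(sql: String): Plan = Query(sql.trim)
}

// SQL/Core layer (stand-in for SparkQl): adds EXPLAIN, delegates the rest.
class SparkQlSketch extends CatalystQlSketch {
  override def parse(sql: String): Plan = {
    val s = sql.trim
    val prefix = "EXPLAIN "
    if (s.toUpperCase.startsWith(prefix)) Explain(parse(s.drop(prefix.length)))
    else super.parse(s)
  }
}

// Hive layer (stand-in for HiveQl): adds ANALYZE TABLE, delegates the rest.
class HiveQlSketch extends SparkQlSketch {
  override def parse(sql: String): Plan = {
    val s = sql.trim
    val prefix = "ANALYZE TABLE "
    if (s.toUpperCase.startsWith(prefix)) Analyze(s.drop(prefix.length).trim)
    else super.parse(s)
  }
}
```

With this layering, a command that only the Hive layer knows is handled there, while a command unknown to a lower layer falls through to the base query parsing in that layer.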
Diffstat (limited to 'project')
-rw-r--r-- project/SparkBuild.scala 104
1 files changed, 56 insertions, 48 deletions
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index af1d36c6ea..5d4f19ab14 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -247,6 +247,9 @@ object SparkBuild extends PomBuild {
/* Enable unidoc only for the root spark project */
enable(Unidoc.settings)(spark)
+ /* Catalyst ANTLR generation settings */
+ enable(Catalyst.settings)(catalyst)
+
/* Spark SQL Core console settings */
enable(SQL.settings)(sql)
@@ -357,6 +360,58 @@ object OldDeps {
)
}
+object Catalyst {
+ lazy val settings = Seq(
+ // ANTLR code-generation step.
+ //
+ // This has been heavily inspired by com.github.stefri.sbt-antlr (0.5.3). It fixes a number of
+ // build errors in the current plugin.
+ // Create Parser from ANTLR grammar files.
+ sourceGenerators in Compile += Def.task {
+ val log = streams.value.log
+
+ val grammarFileNames = Seq(
+ "SparkSqlLexer.g",
+ "SparkSqlParser.g")
+ val sourceDir = (sourceDirectory in Compile).value / "antlr3"
+ val targetDir = (sourceManaged in Compile).value
+
+ // Create default ANTLR Tool.
+ val antlr = new org.antlr.Tool
+
+ // Setup input and output directories.
+ antlr.setInputDirectory(sourceDir.getPath)
+ antlr.setOutputDirectory(targetDir.getPath)
+ antlr.setForceRelativeOutput(true)
+ antlr.setMake(true)
+
+ // Add grammar files.
+ grammarFileNames.flatMap(gFileName => (sourceDir ** gFileName).get).foreach { gFilePath =>
+ val relGFilePath = (gFilePath relativeTo sourceDir).get.getPath
+ log.info("ANTLR: Grammar file '%s' detected.".format(relGFilePath))
+ antlr.addGrammarFile(relGFilePath)
+ // We will set library directory multiple times here. However, only the
+ // last one has effect. Because the grammar files are located under the same directory,
+ // We assume there is only one library directory.
+ antlr.setLibDirectory(gFilePath.getParent)
+ }
+
+ // Generate the parser.
+ antlr.process
+ if (antlr.getNumErrors > 0) {
+ log.error("ANTLR: Caught %d build errors.".format(antlr.getNumErrors))
+ }
+
+ // Return all generated java files.
+ (targetDir ** "*.java").get.toSeq
+ }.taskValue,
+ // Include ANTLR tokens files.
+ resourceGenerators in Compile += Def.task {
+ ((sourceManaged in Compile).value ** "*.tokens").get.toSeq
+ }.taskValue
+ )
+}
+
object SQL {
lazy val settings = Seq(
initialCommands in console :=
@@ -414,54 +469,7 @@ object Hive {
// Some of our log4j jars make it impossible to submit jobs from this JVM to Hive Map/Reduce
// in order to generate golden files. This is only required for developers who are adding new
// new query tests.
- fullClasspath in Test := (fullClasspath in Test).value.filterNot { f => f.toString.contains("jcl-over") },
- // ANTLR code-generation step.
- //
- // This has been heavily inspired by com.github.stefri.sbt-antlr (0.5.3). It fixes a number of
- // build errors in the current plugin.
- // Create Parser from ANTLR grammar files.
- sourceGenerators in Compile += Def.task {
- val log = streams.value.log
-
- val grammarFileNames = Seq(
- "SparkSqlLexer.g",
- "SparkSqlParser.g")
- val sourceDir = (sourceDirectory in Compile).value / "antlr3"
- val targetDir = (sourceManaged in Compile).value
-
- // Create default ANTLR Tool.
- val antlr = new org.antlr.Tool
-
- // Setup input and output directories.
- antlr.setInputDirectory(sourceDir.getPath)
- antlr.setOutputDirectory(targetDir.getPath)
- antlr.setForceRelativeOutput(true)
- antlr.setMake(true)
-
- // Add grammar files.
- grammarFileNames.flatMap(gFileName => (sourceDir ** gFileName).get).foreach { gFilePath =>
- val relGFilePath = (gFilePath relativeTo sourceDir).get.getPath
- log.info("ANTLR: Grammar file '%s' detected.".format(relGFilePath))
- antlr.addGrammarFile(relGFilePath)
- // We will set library directory multiple times here. However, only the
- // last one has effect. Because the grammar files are located under the same directory,
- // We assume there is only one library directory.
- antlr.setLibDirectory(gFilePath.getParent)
- }
-
- // Generate the parser.
- antlr.process
- if (antlr.getNumErrors > 0) {
- log.error("ANTLR: Caught %d build errors.".format(antlr.getNumErrors))
- }
-
- // Return all generated java files.
- (targetDir ** "*.java").get.toSeq
- }.taskValue,
- // Include ANTLR tokens files.
- resourceGenerators in Compile += Def.task {
- ((sourceManaged in Compile).value ** "*.tokens").get.toSeq
- }.taskValue
+ fullClasspath in Test := (fullClasspath in Test).value.filterNot { f => f.toString.contains("jcl-over") }
)
}