aboutsummaryrefslogtreecommitdiff
path: root/sql/hive
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-14551][SQL] Reduce number of NameNode calls in OrcRelationRajesh Balamohan2016-04-222-11/+108
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? When FileSourceStrategy is used, record reader is created which incurs a NN call internally. Later in OrcRelation.unwrapOrcStructs, it ends ups reading the file information to get the ObjectInspector. This incurs additional NN call. It would be good to avoid this additional NN call (specifically for partitioned datasets). Added OrcRecordReader which is very similar to OrcNewInputFormat.OrcRecordReader with an option of exposing the ObjectInspector. This eliminates the need to look up the file later for generating the object inspector. This would be specifically be useful for partitioned tables/datasets. ## How was this patch tested? Ran tpc-ds queries manually and also verified by running org.apache.spark.sql.hive.orc.OrcSuite,org.apache.spark.sql.hive.orc.OrcQuerySuite,org.apache.spark.sql.hive.orc.OrcPartitionDiscoverySuite,OrcPartitionDiscoverySuite.OrcHadoopFsRelationSuite,org.apache.spark.sql.hive.execution.HiveCompatibilitySuite …SourceStrategy mode Author: Rajesh Balamohan <rbalamohan@apache.org> Closes #12319 from rajeshbalamohan/SPARK-14551.
* [SPARK-14866][SQL] Break SQLQuerySuite out into smaller test suitesReynold Xin2016-04-223-509/+569
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch breaks SQLQuerySuite out into smaller test suites. It was a little bit too large for debugging. ## How was this patch tested? This is a test only change. Author: Reynold Xin <rxin@databricks.com> Closes #12630 from rxin/SPARK-14866.
* [SPARK-14842][SQL] Implement view creation in sql/coreReynold Xin2016-04-225-172/+5
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch re-implements view creation command in sql/core, based on the pre-existing view creation command in the Hive module. This consolidates the view creation logical command and physical command into a single one, called CreateViewCommand. ## How was this patch tested? All the code should've been tested by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #12615 from rxin/SPARK-14842-2.
* [SPARK-14855][SQL] Add "Exec" suffix to physical operatorsReynold Xin2016-04-2215-42/+42
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch adds "Exec" suffix to all physical operators. Before this patch, Spark's physical operators and logical operators are named the same (e.g. Project could be logical.Project or execution.Project), which caused small issues in code review and bigger issues in code refactoring. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #12617 from rxin/exec-node.
* [SPARK-14841][SQL] Move SQLBuilder into sql/coreReynold Xin2016-04-227-546/+12
| | | | | | | | | | | | | ## What changes were proposed in this pull request? This patch moves SQLBuilder into sql/core so we can in the future move view generation also into sql/core. ## How was this patch tested? Also moved unit tests. Author: Reynold Xin <rxin@databricks.com> Author: Wenchen Fan <wenchen@databricks.com> Closes #12602 from rxin/SPARK-14841.
* [SPARK-14609][SQL] Native support for LOAD DATA DDL commandLiang-Chi Hsieh2016-04-223-1/+193
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Add the native support for LOAD DATA DDL command that loads data into Hive table/partition. ## How was this patch tested? `HiveDDLCommandSuite` and `HiveQuerySuite`. Besides, few Hive tests (`WindowQuerySuite`, `HiveTableScanSuite` and `HiveSerDeSuite`) also use `LOAD DATA` command. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Closes #12412 from viirya/ddl-load-data.
* [SPARK-14826][SQL] Remove HiveQueryExecutionReynold Xin2016-04-228-194/+39
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch removes HiveQueryExecution. As part of this, I consolidated all the describe commands into DescribeTableCommand. ## How was this patch tested? Should be covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #12588 from rxin/SPARK-14826.
* [SPARK-14835][SQL] Remove MetastoreRelation dependency from SQLBuilderReynold Xin2016-04-211-4/+6
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch removes SQLBuilder's dependency on MetastoreRelation. We should be able to move SQLBuilder into the sql/core package after this change. ## How was this patch tested? N/A - covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #12594 from rxin/SPARK-14835.
* [SPARK-14369] [SQL] Locality support for FileScanRDDCheng Lian2016-04-211-1/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | (This PR is a rebased version of PR #12153.) ## What changes were proposed in this pull request? This PR adds preliminary locality support for `FileFormat` data sources by overriding `FileScanRDD.preferredLocations()`. The strategy can be divided into two parts: 1. Block location lookup Unlike `HadoopRDD` or `NewHadoopRDD`, `FileScanRDD` doesn't have access to the underlying `InputFormat` or `InputSplit`, and thus can't rely on `InputSplit.getLocations()` to gather locality information. Instead, this PR queries block locations using `FileSystem.getBlockLocations()` after listing all `FileStatus`es in `HDFSFileCatalog` and convert all `FileStatus`es into `LocatedFileStatus`es. Note that although S3/S3A/S3N file systems don't provide valid locality information, their `getLocatedStatus()` implementations don't actually issue remote calls either. So there's no need to special case these file systems. 2. Selecting preferred locations For each `FilePartition`, we pick up top 3 locations that containing the most data to be retrieved. This isn't necessarily the best algorithm out there. Further improvements may be brought up in follow-up PRs. ## How was this patch tested? Tested by overriding default `FileSystem` implementation for `file:///` with a mocked one, which returns mocked block locations. Author: Cheng Lian <lian@databricks.com> Closes #12527 from liancheng/spark-14369-locality-rebased.
* [SPARK-14824][SQL] Rename HiveContext object to HiveUtilsAndrew Or2016-04-2115-44/+44
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Just a rename so we can get rid of `HiveContext.scala`. Note that this will conflict with #12585. ## How was this patch tested? No change in functionality. Author: Andrew Or <andrew@databricks.com> Closes #12586 from andrewor14/rename-hc-object.
* [SPARK-14821][SQL] Implement AnalyzeTable in sql/core and remove ↵Reynold Xin2016-04-217-216/+37
| | | | | | | | | | | | | | | | HiveSqlAstBuilder ## What changes were proposed in this pull request? This patch moves analyze table parsing into SparkSqlAstBuilder and removes HiveSqlAstBuilder. In order to avoid extensive refactoring, I created a common trait for CatalogRelation and MetastoreRelation, and match on that. In the future we should probably just consolidate the two into a single thing so we don't need this common trait. ## How was this patch tested? Updated unit tests. Author: Reynold Xin <rxin@databricks.com> Closes #12584 from rxin/SPARK-14821.
* [SPARK-14798][SQL] Move native command and script transformation parsing ↵Reynold Xin2016-04-2111-185/+33
| | | | | | | | | | | | | | into SparkSqlAstBuilder ## What changes were proposed in this pull request? This patch moves native command and script transformation into SparkSqlAstBuilder. This builds on #12561. See the last commit for diff. ## How was this patch tested? Updated test cases to reflect this. Author: Reynold Xin <rxin@databricks.com> Closes #12564 from rxin/SPARK-14798.
* [SPARK-14801][SQL] Move MetastoreRelation to its own fileReynold Xin2016-04-212-205/+232
| | | | | | | | | | | | ## What changes were proposed in this pull request? This class is currently in HiveMetastoreCatalog.scala, which is a large file that makes refactoring and searching of usage difficult. Moving it out so I can then do SPARK-14799 and make the review of that simpler. ## How was this patch tested? N/A - this is a straightforward move and should be covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #12567 from rxin/SPARK-14801.
* [SPARK-14795][SQL] Remove the use of Hive's variable substitutionReynold Xin2016-04-213-11/+8
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch builds on #12556 and completely removes the use of Hive's variable substitution. ## How was this patch tested? Covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #12561 from rxin/SPARK-14795.
* [SPARK-14799][SQL] Remove MetastoreRelation dependency from AnalyzeTable - ↵Reynold Xin2016-04-211-26/+23
| | | | | | | | | | | | | | part 1 ## What changes were proposed in this pull request? This patch isolates AnalyzeTable's dependency on MetastoreRelation into a single line. After this we can work on converging MetastoreRelation and CatalogTable. ## How was this patch tested? Covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #12566 from rxin/SPARK-14799.
* [SPARK-14783] Preserve full exception stacktrace in IsolatedClientLoaderJosh Rosen2016-04-211-1/+1
| | | | | | | | In IsolatedClientLoader, we have a`catch` block which throws an exception without wrapping the original exception, causing the full exception stacktrace and any nested exceptions to be lost. This patch fixes this, improving the usefulness of classloading error messages. Author: Josh Rosen <joshrosen@databricks.com> Closes #12548 from JoshRosen/improve-logging-for-hive-classloader-issues.
* [SPARK-14794][SQL] Don't pass analyze command into HiveReynold Xin2016-04-212-6/+8
| | | | | | | | | | | | ## What changes were proposed in this pull request? We shouldn't pass analyze command to Hive because some of those would require running MapReduce jobs. For now, let's just always run the no scan analyze. ## How was this patch tested? Updated test case to reflect this change. Author: Reynold Xin <rxin@databricks.com> Closes #12558 from rxin/parser-analyze.
* [HOTFIX] Disable flaky testsReynold Xin2016-04-211-2/+2
|
* [SPARK-14792][SQL] Move as many parsing rules as possible into SQL parserReynold Xin2016-04-215-457/+14
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch moves as many parsing rules as possible into SQL parser. There are only three more left after this patch: (1) run native command, (2) analyze, and (3) script IO. These 3 will be dealt with in a follow-up PR. ## How was this patch tested? No test change. This simply moves code around. Author: Reynold Xin <rxin@databricks.com> Closes #12556 from rxin/SPARK-14792.
* [SPARK-14786] Remove hive-cli dependency from hive subprojectJosh Rosen2016-04-203-7/+33
| | | | | | | | | | | | The `hive` subproject currently depends on `hive-cli` in order to perform a check to see whether a `SessionState` is an instance of `org.apache.hadoop.hive.cli.CliSessionState` (see #9589). The introduction of this `hive-cli` dependency has caused problems for users whose Hive metastore JAR classpaths don't include the `hive-cli` classes (such as in #11495). This patch removes this dependency on `hive-cli` and replaces the `isInstanceOf` check by reflection. I added a Maven Enforcer rule to ban `hive-cli` from the `hive` subproject in order to make sure that this dependency is not accidentally reintroduced. /cc rxin yhuai adrian-wang preecet Author: Josh Rosen <joshrosen@databricks.com> Closes #12551 from JoshRosen/remove-hive-cli-dep-from-hive-subproject.
* [SPARK-14782][SPARK-14778][SQL] Remove HiveConf dependency from ↵Reynold Xin2016-04-203-39/+26
| | | | | | | | | | | | | | | | HiveSqlAstBuilder ## What changes were proposed in this pull request? The patch removes HiveConf dependency from HiveSqlAstBuilder. This is required in order to merge HiveSqlParser and SparkSqlAstBuilder, which would require getting rid of the Hive specific dependencies in HiveSqlParser. This patch also accomplishes [SPARK-14778] Remove HiveSessionState.substitutor. ## How was this patch tested? This should be covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #12550 from rxin/SPARK-14782.
* [SPARK-14775][SQL] Remove TestHiveSparkSession.rewritePathsReynold Xin2016-04-204-22/+17
| | | | | | | | | | | | ## What changes were proposed in this pull request? The path rewrite in TestHiveSparkSession is pretty hacky. I think we can remove those complexity and just do a string replacement when we read the query files in. This would remove the overloading of runNativeSql in TestHive, which will simplify the removal of Hive specific variable substitution. ## How was this patch tested? This is a small test refactoring to simplify test infrastructure. Author: Reynold Xin <rxin@databricks.com> Closes #12543 from rxin/SPARK-14775.
* [SPARK-14770][SQL] Remove unused queries in hive module test resourcesReynold Xin2016-04-20690-5352/+0
| | | | | | | | | | | | ## What changes were proposed in this pull request? We currently have five folders in queries: clientcompare, clientnegative, clientpositive, negative, and positive. Only clientpositive is used. We can remove the rest. ## How was this patch tested? N/A - removing unused test resources. Author: Reynold Xin <rxin@databricks.com> Closes #12540 from rxin/SPARK-14770.
* [SPARK-14720][SPARK-13643] Move Hive-specific methods into HiveSessionState ↵Andrew Or2016-04-2034-511/+601
| | | | | | | | | | | | | | | | | | | and Create a SparkSession class ## What changes were proposed in this pull request? This PR has two main changes. 1. Move Hive-specific methods from HiveContext to HiveSessionState, which help the work of removing HiveContext. 2. Create a SparkSession Class, which will later be the entry point of Spark SQL users. ## How was this patch tested? Existing tests This PR is trying to fix test failures of https://github.com/apache/spark/pull/12485. Author: Andrew Or <andrew@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #12522 from yhuai/spark-session.
* [MINOR] [SQL] Re-enable `explode()` and `json_tuple()` testcases in ↵Dongjoon Hyun2016-04-191-4/+2
| | | | | | | | | | | | | | | | ExpressionToSQLSuite ## What changes were proposed in this pull request? Since [SPARK-12719: SQL Generation supports for generators](https://issues.apache.org/jira/browse/SPARK-12719) was resolved, this PR enables the related testcases: `explode()` and `json_tuple()`. ## How was this patch tested? Pass the Jenkins tests (with re-enabled test cases). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12329 from dongjoon-hyun/minor_enable_testcases.
* [SPARK-14600] [SQL] Push predicates through ExpandWenchen Fan2016-04-191-5/+9
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-14600 This PR makes `Expand.output` have different attributes from the grouping attributes produced by the underlying `Project`, as they have different meaning, so that we can safely push down filter through `Expand` ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #12496 from cloud-fan/expand.
* [SPARK-13929] Use Scala reflection for UDTsJoan2016-04-192-2/+2
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Enable ScalaReflection and User Defined Types for plain Scala classes. This involves the move of `schemaFor` from `ScalaReflection` trait (which is Runtime and Compile time (macros) reflection) to the `ScalaReflection` object (runtime reflection only) as I believe this code wouldn't work at compile time anyway as it manipulates `Class`'s that are not compiled yet. ## How was this patch tested? Unit test Author: Joan <joan@goyeau.com> Closes #12149 from joan38/SPARK-13929-Scala-reflection.
* [SPARK-14407][SQL] Hides HadoopFsRelation related data source API into ↵Cheng Lian2016-04-199-14/+12
| | | | | | | | | | | | | | | | | | | execution/datasources package #12178 ## What changes were proposed in this pull request? This PR moves `HadoopFsRelation` related data source API into `execution/datasources` package. Note that to avoid conflicts, this PR is based on #12153. Effective changes for this PR only consist of the last three commits. Will rebase after merging #12153. ## How was this patch tested? Existing tests. Author: Yin Huai <yhuai@databricks.com> Author: Cheng Lian <lian@databricks.com> Closes #12361 from liancheng/spark-14407-hide-hadoop-fs-relation.
* [SPARK-13681][SPARK-14458][SPARK-14566][SQL] Add back once removed ↵Cheng Lian2016-04-196-6/+391
| | | | | | | | | | | | | | | | | | | | | | | | | | | | CommitFailureTestRelationSuite and SimpleTextHadoopFsRelationSuite ## What changes were proposed in this pull request? These test suites were removed while refactoring `HadoopFsRelation` related API. This PR brings them back. This PR also fixes two regressions: - SPARK-14458, which causes runtime error when saving partitioned tables using `FileFormat` data sources that are not able to infer their own schemata. This bug wasn't detected by any built-in data sources because all of them happen to have schema inference feature. - SPARK-14566, which happens to be covered by SPARK-14458 and causes wrong query result or runtime error when - appending a Dataset `ds` to a persisted partitioned data source relation `t`, and - partition columns in `ds` don't all appear after data columns ## How was this patch tested? `CommitFailureTestRelationSuite` uses a testing relation that always fails when committing write tasks to test write job cleanup. `SimpleTextHadoopFsRelationSuite` uses a testing relation to test general `HadoopFsRelation` and `FileFormat` interfaces. The two regressions are both covered by existing test cases. Author: Cheng Lian <lian@databricks.com> Closes #12179 from liancheng/spark-13681-commit-failure-test.
* [SPARK-14674][SQL] Move HiveContext.hiveconf to HiveSessionStateAndrew Or2016-04-1816-66/+55
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This is just cleanup. This allows us to remove HiveContext later without inflating the diff too much. This PR fixes the conflicts of https://github.com/apache/spark/pull/12431. It also removes the `def hiveConf` from `HiveSqlParser`. So, we will pass the HiveConf associated with a session explicitly instead of relying on Hive's `SessionState` to pass `HiveConf`. ## How was this patch tested? Existing tests. Closes #12431 Author: Andrew Or <andrew@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #12449 from yhuai/hiveconf.
* [SPARK-14647][SQL] Group SQLContext/HiveContext state into SharedStateAndrew Or2016-04-185-103/+114
| | | | | | | | | | | | | ## What changes were proposed in this pull request? This patch adds a SharedState that groups state shared across multiple SQLContexts. This is analogous to the SessionState added in SPARK-13526 that groups session-specific state. This cleanup makes the constructors of the contexts simpler and ultimately allows us to remove HiveContext in the near future. ## How was this patch tested? Existing tests. Author: Yin Huai <yhuai@databricks.com> Closes #12463 from yhuai/sharedState.
* Revert "[SPARK-14647][SQL] Group SQLContext/HiveContext state into SharedState"Andrew Or2016-04-175-114/+103
| | | | This reverts commit 5cefecc95a5b8418713516802c416cfde5a94a2d.
* [SPARK-14672][SQL] Move HiveContext analyze logic to AnalyzeTableAndrew Or2016-04-162-78/+81
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Move the implementation of `hiveContext.analyze` to the command of `AnalyzeTable`. ## How was this patch tested? Existing tests. Closes #12429 Author: Yin Huai <yhuai@databricks.com> Author: Andrew Or <andrew@databricks.com> Closes #12448 from yhuai/analyzeTable.
* [SPARK-14647][SQL] Group SQLContext/HiveContext state into SharedStateAndrew Or2016-04-165-103/+114
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This patch adds a SharedState that groups state shared across multiple SQLContexts. This is analogous to the SessionState added in SPARK-13526 that groups session-specific state. This cleanup makes the constructors of the contexts simpler and ultimately allows us to remove HiveContext in the near future. ## How was this patch tested? Existing tests. Closes #12405 Author: Andrew Or <andrew@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #12447 from yhuai/sharedState.
* [MINOR] Remove inappropriate type notation and extra anonymous closure ↵hyukjinkwon2016-04-161-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | within functional transformations ## What changes were proposed in this pull request? This PR removes - Inappropriate type notations For example, from ```scala words.foreachRDD { (rdd: RDD[String], time: Time) => ... ``` to ```scala words.foreachRDD { (rdd, time) => ... ``` - Extra anonymous closure within functional transformations. For example, ```scala .map(item => { ... }) ``` which can be just simply as below: ```scala .map { item => ... } ``` and corrects some obvious style nits. ## How was this patch tested? This was tested after adding rules in `scalastyle-config.xml`, which ended up with not finding all perfectly. The rules applied were below: - For the first correction, ```xml <check customId="NoExtraClosure" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">(?m)\.[a-zA-Z_][a-zA-Z0-9]*\(\s*[^,]+s*=>\s*\{[^\}]+\}\s*\)</parameter></parameters> </check> ``` ```xml <check customId="NoExtraClosure" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">\.[a-zA-Z_][a-zA-Z0-9]*\s*[\{|\(]([^\n>,]+=>)?\s*\{([^()]|(?R))*\}^[,]</parameter></parameters> </check> ``` - For the second correction ```xml <check customId="TypeNotation" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">\.[a-zA-Z_][a-zA-Z0-9]*\s*[\{|\(]\s*\([^):]*:R))*\}^[,]</parameter></parameters> </check> ``` **Those rules were not added** Author: hyukjinkwon <gurwls223@gmail.com> Closes #12413 from HyukjinKwon/SPARK-style.
* [SPARK-14668][SQL] Move CurrentDatabase to CatalystYin Huai2016-04-151-18/+0
| | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR moves `CurrentDatabase` from sql/hive package to sql/catalyst. It also adds the function description, which looks like the following. ``` scala> sqlContext.sql("describe function extended current_database").collect.foreach(println) [Function: current_database] [Class: org.apache.spark.sql.execution.command.CurrentDatabase] [Usage: current_database() - Returns the current database.] [Extended Usage: > SELECT current_database()] ``` ## How was this patch tested? Existing tests Author: Yin Huai <yhuai@databricks.com> Closes #12424 from yhuai/SPARK-14668.
* [SPARK-14447][SQL] Speed up TungstenAggregate w/ keys using VectorizedHashMapSameer Agarwal2016-04-141-21/+26
| | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This patch speeds up group-by aggregates by around 3-5x by leveraging an in-memory `AggregateHashMap` (please see https://github.com/apache/spark/pull/12161), an append-only aggregate hash map that can act as a 'cache' for extremely fast key-value lookups while evaluating aggregates (and fall back to the `BytesToBytesMap` if a given key isn't found). Architecturally, it is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the key-value pairs. The index lookups in the array rely on linear probing (with a small number of maximum tries) and use an inexpensive hash function which makes it really efficient for a majority of lookups. However, using linear probing and an inexpensive hash function also makes it less robust as compared to the `BytesToBytesMap` (especially for a large number of keys or even for certain distribution of keys) and requires us to fall back on the latter for correctness. ## How was this patch tested? Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4 Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz Aggregate w keys: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------- codegen = F 2124 / 2204 9.9 101.3 1.0X codegen = T hashmap = F 1198 / 1364 17.5 57.1 1.8X codegen = T hashmap = T 369 / 600 56.8 17.6 5.8X Author: Sameer Agarwal <sameer@databricks.com> Closes #12345 from sameeragarwal/tungsten-aggregate-integration.
* [SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assemblyMark Grover2016-04-141-1/+1
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Removing references to assembly jar in documentation. Adding an additional (previously undocumented) usage of spark-submit to run examples. ## How was this patch tested? Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit. Author: Mark Grover <mark@apache.org> Closes #12365 from markgrover/spark-14601.
* [SPARK-14592][SQL] Native support for CREATE TABLE LIKE DDL commandLiang-Chi Hsieh2016-04-143-3/+38
| | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? JIRA: https://issues.apache.org/jira/browse/SPARK-14592 This patch adds native support for DDL command `CREATE TABLE LIKE`. The SQL syntax is like: CREATE TABLE table_name LIKE existing_table CREATE TABLE IF NOT EXISTS table_name LIKE existing_table ## How was this patch tested? `HiveDDLCommandSuite`. `HiveQuerySuite` already tests `CREATE TABLE LIKE`. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> This patch had conflicts when merged, resolved by Committer: Andrew Or <andrew@databricks.com> Closes #12362 from viirya/create-table-like.
* [SPARK-14499][SQL][TEST] Drop Partition Does Not Delete Data of External Tablesgatorsmile2016-04-141-0/+67
| | | | | | | | | | | | | | | | | #### What changes were proposed in this pull request? This PR is to add a test to ensure drop partitions of an external table will not delete data. cc yhuai andrewor14 #### How was this patch tested? N/A Author: gatorsmile <gatorsmile@gmail.com> This patch had conflicts when merged, resolved by Committer: Andrew Or <andrew@databricks.com> Closes #12350 from gatorsmile/testDropPartition.
* [SPARK-14125][SQL] Native DDL Support: Alter Viewgatorsmile2016-04-141-0/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | #### What changes were proposed in this pull request? This PR is to provide a native DDL support for the following three Alter View commands: Based on the Hive DDL document: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL ##### 1. ALTER VIEW RENAME **Syntax:** ```SQL ALTER VIEW view_name RENAME TO new_view_name ``` - to change the name of a view to a different name - not allowed to rename a view's name by ALTER TABLE ##### 2. ALTER VIEW SET TBLPROPERTIES **Syntax:** ```SQL ALTER VIEW view_name SET TBLPROPERTIES ('comment' = new_comment); ``` - to add metadata to a view - not allowed to set views' properties by ALTER TABLE - ignore it if trying to set a view's existing property key when the value is the same - overwrite the value if trying to set a view's existing key to a different value ##### 3. ALTER VIEW UNSET TBLPROPERTIES **Syntax:** ```SQL ALTER VIEW view_name UNSET TBLPROPERTIES [IF EXISTS] ('comment', 'key') ``` - to remove metadata from a view - not allowed to unset views' properties by ALTER TABLE - issue an exception if trying to unset a view's non-existent key #### How was this patch tested? Added test cases to verify if it works properly. Author: gatorsmile <gatorsmile@gmail.com> Author: xiaoli <lixiao1983@gmail.com> Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local> Closes #12324 from gatorsmile/alterView.
* [SPARK-14518][SQL] Support Comment in CREATE VIEWgatorsmile2016-04-143-19/+24
| | | | | | | | | | | | | | | | | | | | | | #### What changes were proposed in this pull request? **HQL Syntax**: [Create View](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView ) ```SQL CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ] [COMMENT view_comment] [TBLPROPERTIES (property_name = property_value, ...)] AS SELECT ...; ``` Add a support for the `[COMMENT view_comment]` clause #### How was this patch tested? Modified the existing test cases to verify the correctness. Author: gatorsmile <gatorsmile@gmail.com> Author: xiaoli <lixiao1983@gmail.com> Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local> Closes #12288 from gatorsmile/addCommentInCreateView.
* [MINOR][SQL] Remove extra anonymous closure within functional transformationshyukjinkwon2016-04-141-16/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR removes extra anonymous closure within functional transformations. For example, ```scala .map(item => { ... }) ``` which can be just simply as below: ```scala .map { item => ... } ``` ## How was this patch tested? Related unit tests and `sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12382 from HyukjinKwon/minor-extra-closers.
* [SPARK-14388][SQL] Implement CREATE TABLEAndrew Or2016-04-139-187/+451
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This patch implements the `CREATE TABLE` command using the `SessionCatalog`. Previously we handled only `CTAS` and `CREATE TABLE ... USING`. This requires us to refactor `CatalogTable` to accept various fields (e.g. bucket and skew columns) and pass them to Hive. WIP: Note that I haven't verified whether this actually works yet! But I believe it does. ## How was this patch tested? Tests will come in a future commit. Author: Andrew Or <andrew@databricks.com> Author: Yin Huai <yhuai@databricks.com> Closes #12271 from andrewor14/create-table-ddl.
* [MINOR][SQL] Remove some unused imports in datasources.hyukjinkwon2016-04-131-2/+0
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? It looks several recent commits for datasources (maybe while removing old `HadoopFsRelation` interface) missed removing some unused imports. This PR removes some unused imports in datasources. ## How was this patch tested? `sbt scalastyle` and some unit tests for them. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12326 from HyukjinKwon/minor-imports.
* [SPARK-14414][SQL] improve the error message class hierarchybomeng2016-04-122-4/+0
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Before we are using `AnalysisException`, `ParseException`, `NoSuchFunctionException` etc when a parsing error encounters. I am trying to make it consistent and also **minimum** code impact to the current implementation by changing the class hierarchy. 1. `NoSuchItemException` is removed, since it is an abstract class and it just simply takes a message string. 2. `NoSuchDatabaseException`, `NoSuchTableException`, `NoSuchPartitionException` and `NoSuchFunctionException` now extends `AnalysisException`, as well as `ParseException`, they are all under `AnalysisException` umbrella, but you can also determine how to use them in a granular way. ## How was this patch tested? The existing test cases should cover this patch. Author: bomeng <bmeng@us.ibm.com> Closes #12314 from bomeng/SPARK-14414.
* [SPARK-14488][SPARK-14493][SQL] "CREATE TEMPORARY TABLE ... USING ... AS ↵Cheng Lian2016-04-122-6/+53
| | | | | | | | | | | | | | | | | | SELECT" shouldn't create persisted table ## What changes were proposed in this pull request? When planning logical plan node `CreateTableUsingAsSelect`, we neglected its `temporary` field and always generates a `CreateMetastoreDataSourceAsSelect`. This PR fixes this issue generating `CreateTempTableUsingAsSelect` when `temporary` is true. This PR also fixes SPARK-14493 since the root cause of SPARK-14493 is that we were `CreateMetastoreDataSourceAsSelect` uses default Hive warehouse location when `PATH` data source option is absent. ## How was this patch tested? Added a test case to create a temporary table using the target syntax and check whether it's indeed a temporary table. Author: Cheng Lian <lian@databricks.com> Closes #12303 from liancheng/spark-14488-fix-ctas-using.
* [SPARK-14535][SQL] Remove buildInternalScan from FileFormatWenchen Fan2016-04-111-13/+0
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Now `HadoopFsRelation` with all kinds of file formats can be handled in `FileSourceStrategy`, we can remove the branches for `HadoopFsRelation` in `FileSourceStrategy` and the `buildInternalScan` API from `FileFormat`. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #12300 from cloud-fan/remove.
* [SPARK-14132][SPARK-14133][SQL] Alter table partition DDLsAndrew Or2016-04-115-32/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This implements a few alter table partition commands using the `SessionCatalog`. In particular: ``` ALTER TABLE ... ADD PARTITION ... ALTER TABLE ... DROP PARTITION ... ALTER TABLE ... RENAME PARTITION ... TO ... ``` The following operations are not supported, and an `AnalysisException` with a helpful error message will be thrown if the user tries to use them: ``` ALTER TABLE ... EXCHANGE PARTITION ... ALTER TABLE ... ARCHIVE PARTITION ... ALTER TABLE ... UNARCHIVE PARTITION ... ALTER TABLE ... TOUCH ... ALTER TABLE ... COMPACT ... ALTER TABLE ... CONCATENATE MSCK REPAIR TABLE ... ``` ## How was this patch tested? `DDLSuite`, `DDLCommandSuite` and `HiveDDLCommandSuite` Author: Andrew Or <andrew@databricks.com> Closes #12220 from andrewor14/alter-partition-ddl.
* [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and ↵gatorsmile2016-04-102-3/+1
| | | | | | | | | | | | | | Drop Table #### What changes were proposed in this pull request? This PR is to address the comment: https://github.com/apache/spark/pull/12146#discussion-diff-59092238. It removes the function `isViewSupported` from `SessionCatalog`. After the removal, we still can capture the user errors if users try to drop a table using `DROP VIEW`. #### How was this patch tested? Modified the existing test cases Author: gatorsmile <gatorsmile@gmail.com> Closes #12284 from gatorsmile/followupDropTable.