path: root/sql
Commit message | Author | Date | Files | Lines
* [SQL] Whitelist more Hive tests. | Michael Armbrust | 2014-07-15 | 105 | -0/+163
  Author: Michael Armbrust <michael@databricks.com> Closes #1396 from marmbrus/moreTests and squashes the following commits: 6660b60 [Michael Armbrust] Blacklist a test that requires DFS command. 8b6001c [Michael Armbrust] Add golden files. ccd8f97 [Michael Armbrust] Whitelist more tests. (cherry picked from commit bcd0c30c7eea4c50301cb732c733fdf4d4142060) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2483][SQL] Fix parsing of repeated, nested data access. | Michael Armbrust | 2014-07-15 | 2 | -6/+9
  Author: Michael Armbrust <michael@databricks.com> Closes #1411 from marmbrus/nestedRepeated and squashes the following commits: 044fa09 [Michael Armbrust] Fix parsing of repeated, nested data access. (cherry picked from commit 0f98ef1a2c9ecf328f6c5918808fa5ca486e8afd) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2485][SQL] Lock usage of hive client. | Michael Armbrust | 2014-07-15 | 1 | -2/+3
  Author: Michael Armbrust <michael@databricks.com> Closes #1412 from marmbrus/lockHiveClient and squashes the following commits: 4bc9d5a [Michael Armbrust] protected[hive] 22e9177 [Michael Armbrust] Add comments. 7aa8554 [Michael Armbrust] Don't lock on hive's object. a6edc5f [Michael Armbrust] Lock usage of hive client. (cherry picked from commit c7c7ac83392b10abb011e6aead1bf92e7c73695e) Signed-off-by: Aaron Davidson <aaron@databricks.com>
* [SPARK-2443][SQL] Fix slow read from partitioned tables | Zongheng Yang | 2014-07-14 | 1 | -3/+7
  This fix obtains a performance boost comparable to [PR #1390](https://github.com/apache/spark/pull/1390) by moving an array update and deserializer initialization out of a potentially very long loop. Suggested by yhuai. The results below are updated for this fix.

  ## Benchmarks
  Generated a local text file with 10M rows of simple key-value pairs. The data is loaded as a table through Hive. Results are obtained on my local machine using hive/console.

  Without the fix:

  Type | Non-partitioned | Partitioned (1 part)
  ------------ | ------------ | -------------
  First run | 9.52s end-to-end (1.64s Spark job) | 36.6s (28.3s)
  Stabilized runs | 1.21s (1.18s) | 27.6s (27.5s)

  With this fix:

  Type | Non-partitioned | Partitioned (1 part)
  ------------ | ------------ | -------------
  First run | 9.57s (1.46s) | 11.0s (1.69s)
  Stabilized runs | 1.13s (1.10s) | 1.23s (1.19s)

  Author: Zongheng Yang <zongheng.y@gmail.com> Closes #1408 from concretevitamin/slow-read-2 and squashes the following commits: d86e437 [Zongheng Yang] Move update & initialization out of potentially long loop. (cherry picked from commit d60b09bb60cff106fa0acddebf35714503b20f03) Signed-off-by: Michael Armbrust <michael@databricks.com>
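The pattern behind this fix is plain loop hoisting: per-partition setup must not be repeated per row. A minimal, self-contained Scala sketch of the idea (illustrative names only, not the actual Spark `TableReader` code):

```scala
object LoopHoistingSketch {
  // Hypothetical stand-in for the Hive deserializer used in the real patch.
  final class Deserializer {
    def deserialize(raw: String): Array[String] = raw.split(',')
  }

  // Before (conceptually): the deserializer is rebuilt for every row.
  def scanSlow(rows: Iterator[String]): Iterator[Array[String]] =
    rows.map { raw =>
      val deserializer = new Deserializer // per-row initialization inside the loop
      deserializer.deserialize(raw)
    }

  // After (conceptually): initialize once per partition, reuse inside the loop.
  def scanFast(rows: Iterator[String]): Iterator[Array[String]] = {
    val deserializer = new Deserializer   // hoisted out of the row loop
    rows.map(deserializer.deserialize)
  }

  def main(args: Array[String]): Unit =
    scanFast(Iterator("a,b,c", "d,e,f")).foreach(r => println(r.mkString("|")))
}
```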
* [maven-release-plugin] prepare for next development iteration | Ubuntu | 2014-07-14 | 3 | -3/+3
* [maven-release-plugin] prepare release v1.0.1-rc3 | Ubuntu | 2014-07-14 | 3 | -3/+3
* [SPARK-2405][SQL] Reuse same byte buffers when creating new instance of InMemoryRelation | Michael Armbrust | 2014-07-12 | 2 | -12/+25
  Reuse byte buffers when creating unique attributes for multiple instances of an InMemoryRelation in a single query plan.
  Author: Michael Armbrust <michael@databricks.com> Closes #1332 from marmbrus/doubleCache and squashes the following commits: 4a19609 [Michael Armbrust] Clean up concurrency story by calculating buffers in the constructor. b39c931 [Michael Armbrust] Allocations are kind of a side effect. f67eff7 [Michael Armbrust] Reuse same byte buffers when creating new instance of InMemoryRelation (cherry picked from commit 1a7d7cc85fb24de21f1cde67d04467171b82e845) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2441][SQL] Add more efficient distinct operator. | Michael Armbrust | 2014-07-12 | 2 | -3/+34
  Author: Michael Armbrust <michael@databricks.com> Closes #1366 from marmbrus/partialDistinct and squashes the following commits: 12a31ab [Michael Armbrust] Add more efficient distinct operator. (cherry picked from commit 7e26b57615f6c1d3f9058f9c19c05ec91f017f4c) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2415] [SQL] RowWriteSupport should handle empty ArrayType correctly. | Takuya UESHIN | 2014-07-10 | 3 | -16/+16
  `RowWriteSupport` doesn't write empty `ArrayType` values, so they are read back as `null`. It should write empty `ArrayType` values as they are.
  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1339 from ueshin/issues/SPARK-2415 and squashes the following commits: 32afc87 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2415 2f05196 [Takuya UESHIN] Fix RowWriteSupport to handle empty ArrayType correctly. (cherry picked from commit f5abd271292f5c98eb8b1974c1df31d08ed388dd) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2431][SQL] Refine StringComparison and related codes. | Takuya UESHIN | 2014-07-10 | 2 | -15/+16
  Refine `StringComparison` and related codes as follows:
  - `StringComparison` could be similar to `StringRegexExpression` or `CaseConversionExpression`.
  - Nullability of `StringRegexExpression` could depend on children's nullabilities.
  - Add a case that the like condition includes no wildcard to `LikeSimplification`.

  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1357 from ueshin/issues/SPARK-2431 and squashes the following commits: 77766f5 [Takuya UESHIN] Add a case that the like condition includes no wildcard to LikeSimplification. b9da9d2 [Takuya UESHIN] Fix nullability of StringRegexExpression. 680bb72 [Takuya UESHIN] Refine StringComparison. (cherry picked from commit f62c42728990266d5d5099abe241f699189ba025) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2409] Make SQLConf thread safe. | Reynold Xin | 2014-07-08 | 1 | -5/+5
  Author: Reynold Xin <rxin@apache.org> Closes #1334 from rxin/sqlConfThreadSafetuy and squashes the following commits: c1e0a5a [Reynold Xin] Fixed the duplicate comment. 7614372 [Reynold Xin] [SPARK-2409] Make SQLConf thread safe. (cherry picked from commit 32516f866a32d51bfaa04685ae77ba216b4202d9) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2395][SQL] Optimize common LIKE patterns. | Michael Armbrust | 2014-07-08 | 2 | -0/+74
  Author: Michael Armbrust <michael@databricks.com> Closes #1325 from marmbrus/slowLike and squashes the following commits: 023c3eb [Michael Armbrust] add comment. 8b421c2 [Michael Armbrust] Handle the case where the final % is actually escaped. d34d37e [Michael Armbrust] add periods. 3bbf35f [Michael Armbrust] Roll back changes to SparkBuild 53894b1 [Michael Armbrust] Fix grammar. 4094462 [Michael Armbrust] Fix grammar. 6d3d0a0 [Michael Armbrust] Optimize common LIKE patterns. (cherry picked from commit cc3e0a14daf756ff5c2d4e7916438e175046e5bb) Signed-off-by: Michael Armbrust <michael@databricks.com>
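The optimization targets LIKE patterns whose only wildcard is a leading and/or trailing `%`. A rough, self-contained Scala sketch of the idea (not the actual Catalyst `LikeSimplification` rule; the escaped-`%` case mentioned in the commits above is omitted):

```scala
object LikeSimplificationSketch {
  // Common LIKE shapes can be answered with cheap string operations instead of
  // compiling and running a regular expression for every row.
  def like(value: String, pattern: String): Boolean = pattern match {
    case p if !p.contains("%") && !p.contains("_") =>
      value == p                                           // no wildcard -> equality
    case p if p.endsWith("%") && !p.contains("_") && !p.dropRight(1).contains("%") =>
      value.startsWith(p.dropRight(1))                     // 'abc%'  -> startsWith
    case p if p.startsWith("%") && !p.contains("_") && !p.drop(1).contains("%") =>
      value.endsWith(p.drop(1))                            // '%abc'  -> endsWith
    case p if p.startsWith("%") && p.endsWith("%") && !p.contains("_") &&
              !p.drop(1).dropRight(1).contains("%") =>
      value.contains(p.drop(1).dropRight(1))               // '%abc%' -> contains
    case p =>                                              // anything else -> regex
      value.matches(p.replace("%", ".*").replace("_", "."))
  }

  def main(args: Array[String]): Unit = {
    println(like("spark sql", "spark%")) // true
    println(like("spark sql", "%sql"))   // true
    println(like("spark sql", "%ark%"))  // true
  }
}
```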
* [SPARK-2391][SQL] Custom take() for LIMIT queries. | Michael Armbrust | 2014-07-08 | 1 | -4/+47
  Using Spark's take can result in an entire in-memory partition being shipped in order to retrieve a single row.
  Author: Michael Armbrust <michael@databricks.com> Closes #1318 from marmbrus/takeLimit and squashes the following commits: 77289a5 [Michael Armbrust] Update scala doc 32f0674 [Michael Armbrust] Custom take implementation for LIMIT queries. (cherry picked from commit 5a4063645dd7bb4cd8bda890785235729804ab09) Signed-off-by: Reynold Xin <rxin@apache.org>
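A minimal Scala sketch of the custom-take idea, without Spark types: pull rows partition by partition and stop as soon as the limit is reached (the real operator schedules jobs per partition; only the stopping logic is shown):

```scala
object TakeLimitSketch {
  // Pull rows partition by partition and stop as soon as `limit` rows have been
  // collected, instead of materializing a whole partition to read a few rows.
  def takeLimit[T](partitions: Seq[Iterator[T]], limit: Int): Seq[T] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[T]
    val parts = partitions.iterator
    while (buf.size < limit && parts.hasNext) {
      buf ++= parts.next().take(limit - buf.size) // only the rows still needed
    }
    buf.toSeq
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Iterator(1, 2, 3), Iterator(4, 5), Iterator(6))
    println(takeLimit(parts, 4)) // List(1, 2, 3, 4)
  }
}
```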
* Resolve sbt warnings during build Ⅱ | witgo | 2014-07-08 | 7 | -37/+37
  Author: witgo <witgo@qq.com> Closes #1153 from witgo/expectResult and squashes the following commits: 97541d8 [witgo] merge master ead26e7 [witgo] Resolve sbt warnings during build (cherry picked from commit 3cd5029be709307415f911236472a685e406e763) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2376][SQL] Selecting list values inside nested JSON objects raises java.lang.IllegalArgumentException | Yin Huai | 2014-07-07 | 1 | -15/+30
  JIRA: https://issues.apache.org/jira/browse/SPARK-2376
  Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1320 from yhuai/SPARK-2376 and squashes the following commits: 0107417 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2376 480803d [Yin Huai] Correctly handling JSON arrays in PySpark. (cherry picked from commit 4352a2fdaa64efee7158eabef65703460ff284ec) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2375][SQL] JSON schema inference may not resolve type conflicts correctly for a field inside an array of structs | Yin Huai | 2014-07-07 | 3 | -8/+12
  For example, for

  ```
  {"array": [{"field":214748364700}, {"field":1}]}
  ```

  the type of field is resolved as IntType. While, for

  ```
  {"array": [{"field":1}, {"field":214748364700}]}
  ```

  the type of field is resolved as LongType.
  JIRA: https://issues.apache.org/jira/browse/SPARK-2375
  Author: Yin Huai <huaiyin.thu@gmail.com> Closes #1308 from yhuai/SPARK-2375 and squashes the following commits: 3e2e312 [Yin Huai] Update unit test. 1b2ff9f [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2375 10794eb [Yin Huai] Correctly resolve the type of a field inside an array of structs. (cherry picked from commit f0496ee10847db921a028a34f70385f9b740b3f3) Signed-off-by: Michael Armbrust <michael@databricks.com>
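The underlying issue is order dependence in type reconciliation; merging two candidate types has to be commutative. A toy Scala sketch of that property (illustrative types, not Catalyst's `DataType` hierarchy):

```scala
object JsonTypeMergeSketch {
  sealed trait JsonType
  case object IntType  extends JsonType
  case object LongType extends JsonType

  // Whichever order the records arrive in, Int vs Long must widen to Long.
  def widen(a: JsonType, b: JsonType): JsonType = (a, b) match {
    case (IntType, IntType) => IntType
    case _                  => LongType
  }

  def main(args: Array[String]): Unit = {
    val types = Seq[JsonType](IntType, LongType)
    // reduce(widen) is order independent, so both record orderings give LongType
    assert(types.reduce(widen) == types.reverse.reduce(widen))
    println(types.reduce(widen)) // LongType
  }
}
```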
* [SPARK-2386] [SQL] RowWriteSupport should use the exact types to cast. | Takuya UESHIN | 2014-07-07 | 2 | -3/+41
  When executing `saveAsParquetFile` with a non-primitive type, `RowWriteSupport` uses the wrong type `Int` for `ByteType` and `ShortType`.
  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1315 from ueshin/issues/SPARK-2386 and squashes the following commits: 20d89ec [Takuya UESHIN] Use None instead of null. bd88741 [Takuya UESHIN] Add a test. 323d1d2 [Takuya UESHIN] Modify RowWriteSupport to use the exact types to cast. (cherry picked from commit 4deeed17c4847f212a4fa1a8685cfe8a12179263) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2339][SQL] SQL parser in sql-core is case sensitive, but a table alias is converted to lower case when we create Subquery | Yin Huai | 2014-07-07 | 6 | -30/+149
  Reported by http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Join-throws-exception-td8599.html
  After we get the table from the catalog, because the table has an alias, we will temporarily insert a Subquery. Then, we convert the table alias to lower case no matter if the parser is case sensitive or not. To see the issue ...

  ```
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.createSchemaRDD
  case class Person(name: String, age: Int)
  val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
  people.registerAsTable("people")
  sqlContext.sql("select PEOPLE.name from people PEOPLE")
  ```

  The plan is ...

  ```
  == Query Plan ==
  Project ['PEOPLE.name]
  ExistingRdd [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:176
  ```

  You can find that `PEOPLE.name` is not resolved. This PR introduces three changes.
  1. If a table has an alias, the catalog will not lowercase the alias. If a lowercase alias is needed, the analyzer will do the work.
  2. A catalog has a new val caseSensitive that indicates if this catalog is case sensitive or not. For example, a SimpleCatalog is case sensitive, but
  3. Corresponding unit tests.

  With this PR, case sensitivity of database names and table names is handled by the catalog. Case sensitivity of other identifiers is handled by the analyzer.
  JIRA: https://issues.apache.org/jira/browse/SPARK-2339
  Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1317 from yhuai/SPARK-2339 and squashes the following commits: 12d8006 [Yin Huai] Handling case sensitivity correctly. This patch introduces three changes. 1. If a table has an alias, the catalog will not lowercase the alias. If a lowercase alias is needed, the analyzer will do the work. 2. A catalog has a new val caseSensitive that indicates if this catalog is case sensitive or not. For example, a SimpleCatalog is case sensitive, but 3. Corresponding unit tests. With this patch, case sensitivity of database names and table names is handled by the catalog. Case sensitivity of other identifiers is handled by the analyzer. (cherry picked from commit c0b4cf097de50eb2c4b0f0e67da53ee92efc1f77) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2327] [SQL] Fix nullabilities of Join/Generate/Aggregate. | Takuya UESHIN | 2014-07-05 | 7 | -21/+60
  Fix nullabilities of `Join`/`Generate`/`Aggregate` because:
  - Output attributes of the opposite side of an `OuterJoin` should be nullable.
  - Output attributes of the generator side of `Generate` should be nullable if `join` is `true` and `outer` is `true`.
  - `AttributeReference` of `computedAggregates` of `Aggregate` should be the same as `aggregateExpression`'s.

  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1266 from ueshin/issues/SPARK-2327 and squashes the following commits: 3ace83a [Takuya UESHIN] Add withNullability to Attribute and use it to change nullabilities. df1ae53 [Takuya UESHIN] Modify nullabilize to leave attribute if not resolved. 799ce56 [Takuya UESHIN] Add nullabilization to Generate of SparkPlan. a0fc9bc [Takuya UESHIN] Fix scalastyle errors. 0e31e37 [Takuya UESHIN] Fix Aggregate resultAttribute nullabilities. 09532ec [Takuya UESHIN] Fix Generate output nullabilities. f20f196 [Takuya UESHIN] Fix Join output nullabilities. (cherry picked from commit 9d5ecf8205b924dc8a3c13fed68beb78cc5c7553) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2366] [SQL] Add column pruning for the right side of LeftSemi join. | Takuya UESHIN | 2014-07-05 | 1 | -8/+20
  The right side of a `LeftSemi` join needs only the columns used in the join condition.
  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1301 from ueshin/issues/SPARK-2366 and squashes the following commits: 7677a39 [Takuya UESHIN] Update comments. 786d3a0 [Takuya UESHIN] Rename method name. e0957b1 [Takuya UESHIN] Add column pruning for the right side of LeftSemi join. (cherry picked from commit 3da8df939ec63064692ba64d9188aeea908b305c) Signed-off-by: Michael Armbrust <michael@databricks.com>
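A toy Scala sketch of that pruning idea (illustrative types, not the actual Catalyst column-pruning rule):

```scala
object LeftSemiPruningSketch {
  final case class Relation(columns: Set[String])

  // For a LEFT SEMI join, the right side contributes no output columns, so it
  // only needs to produce the columns referenced by the join condition.
  def pruneRightSide(right: Relation, joinConditionRefs: Set[String]): Relation =
    Relation(right.columns intersect joinConditionRefs)

  def main(args: Array[String]): Unit = {
    val right = Relation(Set("id", "name", "payload"))
    println(pruneRightSide(right, joinConditionRefs = Set("id"))) // Relation(Set(id))
  }
}
```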
* [SPARK-2370][SQL] Decrease metadata retrieved for partitioned hive queries. | Michael Armbrust | 2014-07-04 | 1 | -1/+1
  Author: Michael Armbrust <michael@databricks.com> Closes #1305 from marmbrus/usePrunerPartitions and squashes the following commits: 744aa20 [Michael Armbrust] Use getAllPartitionsForPruner instead of getPartitions, which avoids retrieving auth data (cherry picked from commit 9d006c97371ddf357e0b821d5c6d1535d9b6fe41) Signed-off-by: Reynold Xin <rxin@apache.org>
* [maven-release-plugin] prepare for next development iteration | Ubuntu | 2014-07-04 | 3 | -3/+3
* [maven-release-plugin] prepare release v1.0.1-rc2 (tag: v1.0.1) | Ubuntu | 2014-07-04 | 3 | -3/+3
* [SPARK-2059][SQL] Add analysis checks | Reynold Xin | 2014-07-04 | 2 | -0/+24
  This replaces #1263 with a test case.
  Author: Reynold Xin <rxin@apache.org> Author: Michael Armbrust <michael@databricks.com> Closes #1265 from rxin/sql-analysis-error and squashes the following commits: a639e01 [Reynold Xin] Added a test case for unresolved attribute analysis. 7371e1b [Reynold Xin] Merge pull request #1263 from marmbrus/analysisChecks 448c088 [Michael Armbrust] Add analysis checks (cherry picked from commit b3e768e154bd7175db44c3ffc3d8f783f15ab776) Signed-off-by: Reynold Xin <rxin@apache.org>
* Update SQLConf.scala | baishuo(白硕) | 2014-07-04 | 1 | -6/+3
  use concurrent.ConcurrentHashMap instead of util.Collections.synchronizedMap
  Author: baishuo(白硕) <vc_java@hotmail.com> Closes #1272 from baishuo/master and squashes the following commits: 51ec55d [baishuo(白硕)] Update SQLConf.scala 63da043 [baishuo(白硕)] Update SQLConf.scala 36b6dbd [baishuo(白硕)] Update SQLConf.scala 864faa0 [baishuo(白硕)] Update SQLConf.scala 593096b [baishuo(白硕)] Update SQLConf.scala 7304d9b [baishuo(白硕)] Update SQLConf.scala 843581c [baishuo(白硕)] Update SQLConf.scala 1d3e4a2 [baishuo(白硕)] Update SQLConf.scala 0740f28 [baishuo(白硕)] Update SQLConf.scala (cherry picked from commit 0bbe61223eda3f33bbf8992d2a8f0d47813f4873) Signed-off-by: Reynold Xin <rxin@apache.org>
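A minimal sketch of the swap described above, assuming a simple string-to-string settings map (not the actual `SQLConf` source): `ConcurrentHashMap` allows concurrent gets and puts without the single lock a `Collections.synchronizedMap` wrapper takes on every access.

```scala
import java.util.concurrent.ConcurrentHashMap

object ConfMapSketch {
  // Before: a synchronized wrapper around a plain HashMap.
  // private val settings =
  //   java.util.Collections.synchronizedMap(new java.util.HashMap[String, String]())

  // After: a ConcurrentHashMap, safe for concurrent get/put without external locking.
  private val settings = new ConcurrentHashMap[String, String]()

  def set(key: String, value: String): Unit = settings.put(key, value)

  def get(key: String, default: String): String = {
    val v = settings.get(key)
    if (v == null) default else v
  }

  def main(args: Array[String]): Unit = {
    set("spark.sql.shuffle.partitions", "200")
    println(get("spark.sql.shuffle.partitions", "8")) // 200
  }
}
```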
* [SPARK-2059][SQL] Don't throw TreeNodeException in `execution.ExplainCommand` | Cheng Lian | 2014-07-03 | 1 | -3/+6
  This is a fix for the problem revealed by PR #1265. Currently `HiveComparisonSuite` ignores output of `ExplainCommand` since the Catalyst query plan is quite different from the Hive query plan. But exceptions thrown from `CheckResolution` still break test cases. This PR catches any `TreeNodeException` and reports it as part of the query explanation. After merging this PR, PR #1265 can also be merged safely.
  For a normal query:

  ```
  scala> hql("explain select key from src").foreach(println)
  ...
  [Physical execution plan:]
  [HiveTableScan [key#9], (MetastoreRelation default, src, None), None]
  ```

  For a wrong query with unresolved attribute(s):

  ```
  scala> hql("explain select kay from src").foreach(println)
  ...
  [Error occurred during query planning: ]
  [Unresolved attributes: 'kay, tree:]
  [Project ['kay]]
  [ LowerCaseSchema ]
  [ MetastoreRelation default, src, None]
  ```

  Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1294 from liancheng/safe-explain and squashes the following commits: 4318911 [Cheng Lian] Don't throw TreeNodeException in `execution.ExplainCommand` (cherry picked from commit 544880457de556d1ad52e8cb7e1eca19da95f517) Signed-off-by: Reynold Xin <rxin@apache.org>
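The shape of the fix is small: evaluate the plan description lazily and turn a planning failure into output rows instead of letting the exception escape. A simplified Scala sketch (not the actual `ExplainCommand` implementation):

```scala
object SafeExplainSketch {
  // Evaluate the plan description lazily; if planning fails, report the error
  // as part of the explanation instead of propagating the exception.
  def explain(planOutput: => Seq[String]): Seq[String] =
    try planOutput
    catch {
      case e: Exception => Seq("Error occurred during query planning: ", e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    println(explain(Seq("Physical execution plan:", "HiveTableScan ...")))
    println(explain(throw new RuntimeException("Unresolved attributes: 'kay")))
  }
}
```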
* [SPARK-2342] Evaluation helper's output type doesn't conform to input type | Yijie Shen | 2014-07-03 | 1 | -1/+1
  The function's cast doesn't conform to the intention of the comment "Those expressions are supposed to be in the same data type, and also the return type."
  Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #1283 from yijieshen/master and squashes the following commits: c7aaa4b [Yijie Shen] [SPARK-2342] Evaluation helper's output type doesn't conform to input type (cherry picked from commit a9b52e5623f7fc77fca96b095f9eeaef76e35d54) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2287] [SQL] Make ScalaReflection be able to handle Generic case classes. | Takuya UESHIN | 2014-07-02 | 2 | -2/+25
  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1226 from ueshin/issues/SPARK-2287 and squashes the following commits: 32ef7c3 [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`. 541dc8d [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287 fac5fae [Takuya UESHIN] Remove unnecessary method receiver. d306e60 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2287 7de5706 [Takuya UESHIN] Make ScalaReflection be able to handle Generic case classes. (cherry picked from commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2328] [SQL] Add execution of `SHOW TABLES` before `TestHive.reset()`. | Takuya UESHIN | 2014-07-02 | 1 | -0/+3
  `PruningSuite` is unfortunately executed first among the Hive tests, and `TestHive.reset()` breaks the test environment. To prevent this, we must run a query before calling reset the first time.
  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1268 from ueshin/issues/SPARK-2328 and squashes the following commits: 043ceac [Takuya UESHIN] Add execution of `SHOW TABLES` before `TestHive.reset()`. (cherry picked from commit 1e2c26c83dd2e807cf0031ceca8b338a1a57cac6) Signed-off-by: Michael Armbrust <michael@databricks.com>
* SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG | Ximo Guanter Gonzalbez | 2014-07-02 | 3 | -8/+44
  **Description** This patch enables using the `.select()` function in SchemaRDD with functions such as `Sum`, `Count` and others.
  **Testing** Unit tests added.
  Author: Ximo Guanter Gonzalbez <ximo@tid.es> Closes #1211 from edrevo/add-expression-support-in-select and squashes the following commits: fe4a1e1 [Ximo Guanter Gonzalbez] Extend SQL DSL to functions e1d344a [Ximo Guanter Gonzalbez] SPARK-2186: Spark SQL DSL support for simple aggregations such as SUM and AVG (cherry picked from commit 5c6ec94da1bacd8e65a43acb92b6721493484e7b) Signed-off-by: Michael Armbrust <michael@databricks.com>
* update the comments in SqlParser | CodingCat | 2014-07-01 | 1 | -1/+0
  SqlParser has been case-insensitive since https://github.com/apache/spark/commit/dab5439a083b5f771d5d5b462d0d517fa8e9aaf2 was merged.
  Author: CodingCat <zhunansjtu@gmail.com> Closes #1275 from CodingCat/master and squashes the following commits: 17931cd [CodingCat] update the comments in SqlParser (cherry picked from commit 6596392da0fc0fee89e22adfca239a3477dfcbab) Signed-off-by: Reynold Xin <rxin@apache.org>
* Revert "[maven-release-plugin] prepare release v1.0.1-rc1"Patrick Wendell2014-06-273-3/+3
| | | | This reverts commit 7feeda3d729f9397aa15ee8750c01ef5aa601962.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-06-273-3/+3
| | | | This reverts commit ea1a455a755f83f46fc8bf242410917d93d0c52c.
* [maven-release-plugin] prepare for next development iteration | Ubuntu | 2014-06-26 | 3 | -3/+3
* [maven-release-plugin] prepare release v1.0.1-rc1 | Ubuntu | 2014-06-26 | 3 | -3/+3
* [SPARK-2295] [SQL] Make JavaBeans nullability stricter. | Takuya UESHIN | 2014-06-26 | 1 | -19/+18
  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1235 from ueshin/issues/SPARK-2295 and squashes the following commits: 201c508 [Takuya UESHIN] Make JavaBeans nullability stricter. (cherry picked from commit 32a1ad75313472b1b098f7ec99335686d3fe4fc3) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2254] [SQL] ScalaReflection should mark primitive types as non-nullable. | Takuya UESHIN | 2014-06-25 | 2 | -31/+165
  Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1193 from ueshin/issues/SPARK-2254 and squashes the following commits: cfd6088 [Takuya UESHIN] Modify ScalaReflection.schemaFor method to return nullability of Scala Type. (cherry picked from commit e4899a253728bfa7c78709a37a4837f74b72bd61) Signed-off-by: Reynold Xin <rxin@apache.org>
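The rationale is that JVM value types (`Int`, `Long`, `Boolean`, ...) can never hold `null`, so a reflection-derived schema can mark them non-nullable while reference types stay nullable. A tiny sketch of that check, assuming `scala-reflect` is on the classpath (not the actual `schemaFor` code):

```scala
import scala.reflect.runtime.universe._

object NullabilitySketch {
  // Value types (subtypes of AnyVal) cannot be null; reference types can.
  def nullable[T: TypeTag]: Boolean = !(typeOf[T] <:< typeOf[AnyVal])

  def main(args: Array[String]): Unit = {
    println(nullable[Int])    // false: primitive, never null
    println(nullable[String]) // true: reference type, may be null
  }
}
```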
* [SPARK-2283][SQL] Reset test environment before running PruningSuite | Cheng Lian | 2014-06-25 | 1 | -0/+5
  JIRA issue: [SPARK-2283](https://issues.apache.org/jira/browse/SPARK-2283)
  If `PruningSuite` is run right after `HiveCompatibilitySuite`, the first test case fails because the `srcpart` table is cached in-memory by `HiveCompatibilitySuite`, but column pruning is not implemented for the `InMemoryColumnarTableScan` operator yet.
  Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1221 from liancheng/spark-2283 and squashes the following commits: dc0b663 [Cheng Lian] SPARK-2283: reset test environment before running PruningSuite (cherry picked from commit 7f196b009d26d4aed403b3c694f8b603601718e3) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [BUGFIX][SQL] Should match java.math.BigDecimal when unwrapping Hive output | Cheng Lian | 2014-06-25 | 1 | -4/+4
  The `BigDecimal` branch in `unwrap` matches `scala.math.BigDecimal` rather than `java.math.BigDecimal`.
  Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1199 from liancheng/javaBigDecimal and squashes the following commits: e9bb481 [Cheng Lian] Should match java.math.BigDecimal when unwrapping Hive output (cherry picked from commit 22036aeb1b2cac7f48cd60afea925b42a5318631) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2263][SQL] Support inserting MAP<K, V> to Hive tables | Cheng Lian | 2014-06-25 | 3 | -6/+20
  JIRA issue: [SPARK-2263](https://issues.apache.org/jira/browse/SPARK-2263)
  Map objects were not converted to Hive types before being inserted into Hive tables.
  Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1205 from liancheng/spark-2263 and squashes the following commits: c7a4373 [Cheng Lian] Addressed @concretevitamin's comment 784940b [Cheng Lian] SPARK-2263: support inserting MAP<K, V> to Hive tables (cherry picked from commit 8fade8973e5fc97f781de5344beb66b90bd6e524) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2264][SQL] Fix failing CachedTableSuite | Michael Armbrust | 2014-06-24 | 3 | -24/+25
  Author: Michael Armbrust <michael@databricks.com> Closes #1201 from marmbrus/fixCacheTests and squashes the following commits: 9d87ed1 [Michael Armbrust] Use analyzer (which runs to fixed point) instead of manually removing analysis operators.
  Conflicts:
    sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
* [SQL] Add base row updating methods for JoinedRow | Cheng Hao | 2014-06-24 | 1 | -0/+17
  This will be helpful in join operators.
  Author: Cheng Hao <hao.cheng@intel.com> Closes #1187 from chenghao-intel/joinedRow and squashes the following commits: 87c19e3 [Cheng Hao] Add base row set methods for JoinedRow (cherry picked from commit 133495d82672c3f34d40a6298cc80c31f91faf5c) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2227] Support dfs command in SQL. | Reynold Xin | 2014-06-23 | 1 | -8/+6
  Note that nothing gets printed to the console because we don't properly maintain session state right now. I will have a followup PR that fixes it.
  Author: Reynold Xin <rxin@apache.org> Closes #1167 from rxin/commands and squashes the following commits: 56f04f8 [Reynold Xin] [SPARK-2227] Support dfs command in SQL. (cherry picked from commit 51c8168377a89d20d0b2d7b9a28af58593a0fe0c) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-1669][SQL] Made cacheTable idempotent | Cheng Lian | 2014-06-23 | 2 | -4/+29
  JIRA issue: [SPARK-1669](https://issues.apache.org/jira/browse/SPARK-1669)
  Caching the same table multiple times should end up with only 1 in-memory columnar representation of this table.
  Before:

  ```
  scala> loadTestTable("src")
  ...
  scala> cacheTable("src")
  ...
  scala> cacheTable("src")
  ...
  scala> table("src")
  ...
  == Query Plan ==
  InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (HiveTableScan [key#2,value#3], (MetastoreRelation default, src, None), None))))
  ```

  After:

  ```
  scala> loadTestTable("src")
  ...
  scala> cacheTable("src")
  ...
  scala> cacheTable("src")
  ...
  scala> table("src")
  ...
  == Query Plan ==
  InMemoryColumnarTableScan [key#2,value#3], (InMemoryRelation [key#2,value#3], false, (HiveTableScan [key#2,value#3], (MetastoreRelation default, src, None), None))
  ```

  Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1183 from liancheng/spark-1669 and squashes the following commits: 68f8a20 [Cheng Lian] Removed an unused import 51bae90 [Cheng Lian] Made cacheTable idempotent (cherry picked from commit a4bc442ca2c35444de8a33376b6f27c6c2a9003d) Signed-off-by: Michael Armbrust <michael@databricks.com>
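The before/after plans above come down to an idempotence check: don't wrap a plan in an in-memory relation if it already is one. A toy Scala sketch of that shape (illustrative case classes, not the Catalyst plan nodes):

```scala
object IdempotentCacheSketch {
  sealed trait Plan
  final case class TableScan(table: String) extends Plan
  final case class InMemory(child: Plan)    extends Plan

  // Wrap a plan for in-memory caching only if it is not already wrapped.
  def cache(plan: Plan): Plan = plan match {
    case cached: InMemory => cached       // second cacheTable call is a no-op
    case other            => InMemory(other)
  }

  def main(args: Array[String]): Unit = {
    val once  = cache(TableScan("src"))
    val twice = cache(once)
    assert(once == twice)                 // no nested InMemory(InMemory(...)) wrappers
    println(twice)                        // InMemory(TableScan(src))
  }
}
```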
* [SQL] Break hiveOperators.scala into multiple files. | Reynold Xin | 2014-06-21 | 6 | -529/+610
  The single file was getting very long (500+ loc).
  Author: Reynold Xin <rxin@apache.org> Closes #1166 from rxin/hiveOperators and squashes the following commits: 5b43068 [Reynold Xin] [SQL] Break hiveOperators.scala into multiple files. (cherry picked from commit ec935abce13b60f353236566da149c0c87bb1002) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SQL] Pass SQLContext instead of SparkContext into physical operators. | Reynold Xin | 2014-06-20 | 7 | -44/+51
  This makes it easier to use config options in operators.
  Author: Reynold Xin <rxin@apache.org> Closes #1164 from rxin/sqlcontext and squashes the following commits: 797b2fd [Reynold Xin] Pass SQLContext instead of SparkContext into physical operators. (cherry picked from commit ca5d8b5904dc6dd5b691af506d3a842e508b3673) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SQL] Use hive.SessionState, not the thread local SessionState | Aaron Davidson | 2014-06-20 | 1 | -1/+1
  Note that this is simply mimicking lookupRelation(). I do not have a concrete notion of why this solution is necessarily right-er than SessionState.get, but SessionState.get is returning null, which is bad.
  Author: Aaron Davidson <aaron@databricks.com> Closes #1148 from aarondav/createtable and squashes the following commits: 37c3e7c [Aaron Davidson] [SQL] Use hive.SessionState, not the thread local SessionState (cherry picked from commit 2044784915554a890ca6f8450d8403495b2ee4f3) Signed-off-by: Reynold Xin <rxin@apache.org>
* Move ScriptTransformation into the appropriate place. | Reynold Xin | 2014-06-20 | 1 | -0/+0
  Author: Reynold Xin <rxin@apache.org> Closes #1162 from rxin/script and squashes the following commits: 2c836b9 [Reynold Xin] Move ScriptTransformation into the appropriate place. (cherry picked from commit d4c7572dba1be49e55ceb38713652e5bcf485be8) Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-2225] Turn HAVING without GROUP BY into WHERE. | Reynold Xin | 2014-06-20 | 2 | -23/+11
  @willb
  Author: Reynold Xin <rxin@apache.org> Closes #1161 from rxin/having-filter and squashes the following commits: fa8359a [Reynold Xin] [SPARK-2225] Turn HAVING without GROUP BY into WHERE. (cherry picked from commit 0ac71d1284cd4f011d5763181cba9ecb49337b66) Signed-off-by: Reynold Xin <rxin@apache.org>
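A toy Scala sketch of the rewrite named in the title: with no GROUP BY, a HAVING predicate is just a row filter, so it can be planned as a WHERE (illustrative AST, not the Catalyst implementation):

```scala
object HavingRewriteSketch {
  sealed trait Node
  final case class Relation(name: String)                                       extends Node
  final case class Filter(condition: String, child: Node)                       extends Node
  final case class Having(condition: String, groupBy: Seq[String], child: Node) extends Node

  // HAVING with an empty GROUP BY list behaves exactly like WHERE.
  def rewrite(plan: Node): Node = plan match {
    case Having(cond, groupBy, child) if groupBy.isEmpty => Filter(cond, child)
    case other                                           => other
  }

  def main(args: Array[String]): Unit =
    // SELECT * FROM src HAVING key > 10  ==>  SELECT * FROM src WHERE key > 10
    println(rewrite(Having("key > 10", Nil, Relation("src"))))
}
```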
* SPARK-2180: support HAVING clauses in Hive queries | William Benton | 2014-06-20 | 2 | -6/+53
  This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear to be runnable from within Spark, so I added a simple comparable test to `HiveQuerySuite`.
  Author: William Benton <willb@redhat.com> Closes #1136 from willb/SPARK-2180 and squashes the following commits: 3bbaf26 [William Benton] Added casts to HAVING expressions 83f1340 [William Benton] scalastyle fixes 18387f1 [William Benton] Add test for HAVING without GROUP BY b880bef [William Benton] Added semantic error for HAVING without GROUP BY 942428e [William Benton] Added test coverage for SPARK-2180. 56084cc [William Benton] Add support for HAVING clauses in Hive queries. (cherry picked from commit 171ebb3a824a577d69443ec68a3543b27914cf6d) Signed-off-by: Reynold Xin <rxin@apache.org>