| Commit message | Author | Age | Files | Lines |
This reverts commit adfed7086f10fa8db4eeac7996c84cf98f625e9a.
from a projection (backport to Spark-1.2)
This is a follow-up of #3796, which cannot be merged back to Spark-1.2 cleanly, so it is merged manually here.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #4013 from chenghao-intel/spark_4959_backport and squashes the following commits:
1f6c93d [Cheng Hao] backport to Spark-1.2
This pull request only fixes the parsing error and changes the API to use tableIdentifier. The change for joining data sources across different catalogs is not included here.
Author: Alex Liu <alex_liu68@yahoo.com>
Closes #3941 from alexliu68/SPARK-SQL-4943-3 and squashes the following commits:
343ae27 [Alex Liu] [SPARK-4943][SQL] refactoring according to review
29e5e55 [Alex Liu] [SPARK-4943][SQL] fix failed Hive CTAS tests
6ae77ce [Alex Liu] [SPARK-4943][SQL] fix TestHive matching error
3652997 [Alex Liu] [SPARK-4943][SQL] Allow table name having dot to support db/catalog ...
(cherry picked from commit 4b39fd1e63188821fc84a13f7ccb6e94277f4be7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Conflicts:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
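The tableIdentifier change boils down to splitting a possibly-qualified name on the dot. A minimal Python sketch of that parsing (the helper name is hypothetical, not Spark's API):

```python
def parse_table_identifier(name):
    """Split a possibly-qualified table name like 'db.tbl' into
    (database, table); a bare name yields (None, name)."""
    parts = name.split(".")
    if len(parts) == 1:
        return (None, parts[0])
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise ValueError("too many qualifiers: %r" % name)

assert parse_table_identifier("mytable") == (None, "mytable")
assert parse_table_identifier("mydb.mytable") == ("mydb", "mytable")
```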
Fixes a bug triggered by queries like the following:
```
test("save join to table") {
val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString))
sql("CREATE TABLE test1 (key INT, value STRING)")
testData.insertInto("test1")
sql("CREATE TABLE test2 (key INT, value STRING)")
testData.insertInto("test2")
testData.insertInto("test2")
sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test")
checkAnswer(
table("test"),
sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
}
```
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits:
e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS
e004895 [Cheng Hao] fix bug
(cherry picked from commit 0abbff286220bbcbbf28fbd80b8c5bf59ff37ce2)
Signed-off-by: Michael Armbrust <michael@databricks.com>
This reverts commit 2b72c569a674cccf79ebbe8d067b8dbaaf78007f.
This reverts commit bc05df8a23ba7ad485f6844f28f96551b13ba461.
This reverts commit 1056e9ec13203d0c51564265e94d77a054498fdb.
This reverts commit 00316cc87983b844f6603f351a8f0b84fe1f6035.
Just found this instance while doing some jstack-based profiling of a Spark SQL job. It is very unlikely that this is causing much of a perf issue anywhere, but it is unnecessarily suboptimal.
Author: Aaron Davidson <aaron@databricks.com>
Closes #3593 from aarondav/seq-opt and squashes the following commits:
962cdfc [Aaron Davidson] [SQL] Minor: Avoid calling Seq#size in a loop
(cherry picked from commit c6c7165e7ecf1690027d6bd4e0620012cd0d2310)
Signed-off-by: Reynold Xin <rxin@databricks.com>
We should use `~` instead of `-` for bitwise NOT.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3528 from adrian-wang/symbol and squashes the following commits:
affd4ad [Daoyuan Wang] fix code gen test case
56efb79 [Daoyuan Wang] ensure bitwise NOT over byte and short persist data type
f55fbae [Daoyuan Wang] wrong symbol for bitwise not
(cherry picked from commit 1f5ddf17e831ad9717f0f4b60a727a3381fad4f9)
Signed-off-by: Michael Armbrust <michael@databricks.com>
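The symbol mix-up the commits above fix is easy to reproduce in any language with two's-complement integers; a quick Python illustration of why `-` and `~` differ:

```python
x = 5
assert -x == -5   # arithmetic negation
assert ~x == -6   # bitwise NOT: ~x == -x - 1 in two's complement
# Masking to a byte shows the raw flipped bit pattern:
assert (~0b00000101) & 0xFF == 0b11111010
```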
`SELECT max(1/0) FROM src`
would return a very large number, which is obviously not right.
Hive 0.12 returns `Infinity` for 1/0, while Hive 0.13.1 returns `NULL`.
It is better to keep our behavior consistent with the newer Hive version.
This PR ensures that when the divisor is 0, the result of the expression is NULL, matching hive-0.13.1.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3443 from adrian-wang/div and squashes the following commits:
2e98677 [Daoyuan Wang] fix code gen for divide 0
85c28ba [Daoyuan Wang] temp
36236a5 [Daoyuan Wang] add test cases
6f5716f [Daoyuan Wang] fix comments
cee92bd [Daoyuan Wang] avoid evaluation 2 times
22ecd9a [Daoyuan Wang] fix style
cf28c58 [Daoyuan Wang] divide fix
2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type
(cherry picked from commit f6df609dcc4f4a18c0f1c74b1ae0800cf09fa7ae)
Signed-off-by: Michael Armbrust <michael@databricks.com>
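A small Python sketch of the resulting semantics (the helper is illustrative, not Spark's implementation): division yields NULL (here `None`) when the divisor is 0, and aggregates then skip it:

```python
def hive_divide(a, b):
    """Emulates Hive 0.13.1 division semantics: NULL (None) when the
    divisor is 0, instead of Infinity or an arbitrary large number."""
    if a is None or b is None or b == 0:
        return None
    return a / b

assert hive_divide(1, 0) is None
assert hive_divide(10, 4) == 2.5
# Aggregates like max() skip NULLs, the way SQL does:
rows = [hive_divide(1, 0), hive_divide(6, 3)]
assert max(v for v in rows if v is not None) == 2.0
```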
Spark SQL has built-in sqrt and abs functions, but the DSL doesn't support them.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits:
07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite
8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL
(cherry picked from commit e75e04f980281389b881df76f59ba1adc6338629)
Signed-off-by: Michael Armbrust <michael@databricks.com>
like count(distinct c1,c2..) in Spark SQL
Adds multi-column support to the countDistinct function, e.g. count(distinct c1, c2, ...), in Spark SQL.
Author: ravipesala <ravindra.pesala@huawei.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3511 from ravipesala/countdistinct and squashes the following commits:
cc4dbb1 [ravipesala] style
070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL
(cherry picked from commit 6a9ff19dc06745144d5b311d4f87073c81d53a8f)
Signed-off-by: Michael Armbrust <michael@databricks.com>
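Multi-column distinct counting amounts to taking the distinct set of column tuples rather than of a single column; a Python illustration:

```python
rows = [(1, "a"), (1, "a"), (1, "b"), (2, "b")]
# count(distinct c1, c2): distinct over the tuple of both columns
assert len(set(rows)) == 3
# versus per-column distinct counts, which differ
assert len({c1 for c1, _ in rows}) == 2
assert len({c2 for _, c2 in rows}) == 2
```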
Removes the hardcoded max and min values for numeric types and lets BigDecimal check type compatibility.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #3208 from viirya/more_numericLit and squashes the following commits:
e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
91fe489 [Liang-Chi Hsieh] add Byte and Short.
1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.
(cherry picked from commit b57365a1ec89e31470f424ff37d5ebc7c90a39d8)
Signed-off-by: Michael Armbrust <michael@databricks.com>
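The idea of letting an arbitrary-precision type do the range checking can be sketched in Python with `decimal.Decimal` (the function and type names are illustrative, not Spark's):

```python
from decimal import Decimal

def literal_type(text):
    """Pick the narrowest type for a numeric literal by comparing the
    exact value against type bounds, letting Decimal hold it meanwhile
    instead of hardcoding per-type parsing. Illustrative only."""
    v = Decimal(text)
    if v != v.to_integral_value():
        return "decimal"
    if -2**31 <= v <= 2**31 - 1:
        return "int"
    if -2**63 <= v <= 2**63 - 1:
        return "long"
    return "decimal"

assert literal_type("42") == "int"
assert literal_type("3000000000") == "long"
assert literal_type("99999999999999999999") == "decimal"
assert literal_type("1.5") == "decimal"
```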
This reverts commit 39c7d1c1f9a7785285cf4c20dfbffd96f72d5634.
This reverts commit fc7bff00ac731d2632213a98cd92dc5e84ce7dcd.
This reverts commit cc2c05e4ee81d2f34873a2ebb9a5272867cb65c2.
This reverts commit 380eba5f49eca1dbd4084e6c84e19866fffd4efa.
This reverts commit 5247dd859b95a440baa562b9827bdeb26aa6530e.
This reverts commit 79df6b43ae762263a8120f423ddb4a0811dd4b6f.
This reverts commit db7f4a898af22a02b36428507f8ef2b429d78dc1.
This reverts commit d7b1ecb25676d228deb6fe05efdb4e2ab9c3e30b.
This reverts commit 38c1fbd9694430cefd962c90bc36b0d108c6124b.
This reverts commit d7ac6013483e83caff8ea54c228f37aeca159db8.
When a query uses an ORDER BY clause, the attributes referenced by the projection are resolved first (1), and then the attributes referenced in the ORDER BY clause are resolved (2).
But when resolving the ORDER BY attributes, the resolution result generated in (1) is discarded, so, for example, the following query fails:
SELECT c1 + c2 FROM mytable ORDER BY c1;
The query above fails because, when resolving the attribute reference 'c1', the resolution result of 'c2' is discarded.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3363 from sarutak/SPARK-4487 and squashes the following commits:
fd314f3 [Kousuke Saruta] Fixed attribute resolution logic in Analyzer
6e60c20 [Kousuke Saruta] Fixed conflicts
cb5b7e9 [Kousuke Saruta] Added test case for SPARK-4487
282d529 [Kousuke Saruta] Fixed attributes reference resolution error
b6123e6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into concat-feature
317b7fb [Kousuke Saruta] WIP
(cherry picked from commit dd1c9cb36cde8202cede8014b5641ae8a0197812)
Signed-off-by: Michael Armbrust <michael@databricks.com>
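To show the behavior the fix restores, the same query shape runs correctly against SQLite via Python's stdlib (used here only as a reference SQL engine, not as Spark):

```python
import sqlite3

# The failing query shape from the commit message, against SQLite:
# projecting c1 + c2 while ordering by c1 alone must work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (c1 INTEGER, c2 INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(3, 10), (1, 20)])
rows = conn.execute("SELECT c1 + c2 FROM mytable ORDER BY c1").fetchall()
assert rows == [(21,), (13,)]  # ordered by c1 (1, then 3)
conn.close()
```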
This is just a quick fix for 1.2. SPARK-4523 describes a more complete solution.
Author: Michael Armbrust <michael@databricks.com>
Closes #3392 from marmbrus/parquetMetadata and squashes the following commits:
bcc6626 [Michael Armbrust] Parse schema with missing metadata.
(cherry picked from commit 90a6a46bd11030672597f015dd443d954107123a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Executing sum distinct on an empty table throws `java.lang.UnsupportedOperationException: empty.reduceLeft`.
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #3184 from ueshin/issues/SPARK-4318 and squashes the following commits:
8168c42 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4318
66fdb0a [Takuya UESHIN] Re-refine aggregate functions.
6186eb4 [Takuya UESHIN] Fix Sum of GeneratedAggregate.
d2975f6 [Takuya UESHIN] Refine Sum and Average of GeneratedAggregate.
1bba675 [Takuya UESHIN] Refine Sum, SumDistinct and Average functions.
917e533 [Takuya UESHIN] Use aggregate instead of groupBy().
1a5f874 [Takuya UESHIN] Add tests to be executed as non-partial aggregation.
a5a57d2 [Takuya UESHIN] Fix empty Average.
22799dc [Takuya UESHIN] Fix empty Sum and SumDistinct.
65b7dd2 [Takuya UESHIN] Fix empty sum distinct.
(cherry picked from commit 2c2e7a44db2ebe44121226f3eac924a0668b991a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
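The failure mode and the SQL-correct result can both be sketched in Python: reducing an empty collection without an initial value raises, whereas SUM(DISTINCT ...) over no rows should be NULL (the `sum_distinct` helper is illustrative, not Spark's code):

```python
from functools import reduce

values = []  # an "empty table" column
# reduce without an initial value raises on empty input -- the Scala
# analogue is empty.reduceLeft, the crash this commit fixes:
try:
    reduce(lambda a, b: a + b, set(values))
    raised = False
except TypeError:
    raised = True
assert raised

def sum_distinct(vals):
    """SQL semantics: SUM(DISTINCT col) over no rows is NULL (None)."""
    distinct = set(v for v in vals if v is not None)
    return sum(distinct) if distinct else None

assert sum_distinct([]) is None
assert sum_distinct([1, 1, 2]) == 3
```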
The relational operator '<=>' does not work in Spark SQL, while the same operator works in Spark HiveQL.
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #3387 from ravipesala/<=> and squashes the following commits:
7198e90 [ravipesala] Supporting relational operator '<=>' in Spark SQL
(cherry picked from commit 98e9419784a9ad5096cfd563fa9a433786a90bd4)
Signed-off-by: Michael Armbrust <michael@databricks.com>
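A small Python model of the `<=>` (null-safe equality) semantics, with `None` standing in for SQL NULL:

```python
def null_safe_eq(a, b):
    """Emulates SQL's '<=>': like '=', except NULL <=> NULL is true
    and NULL <=> x is false, instead of the comparison yielding NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

assert null_safe_eq(None, None) is True
assert null_safe_eq(None, 1) is False
assert null_safe_eq(1, 1) is True
```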
While reviewing PR #3083 and #3161, I noticed that Parquet record filter generation code can be simplified significantly according to the clue stated in [SPARK-4453](https://issues.apache.org/jira/browse/SPARK-4453). This PR addresses both SPARK-4453 and SPARK-4213 with this simplification.
While generating `ParquetTableScan` operator, we need to remove all Catalyst predicates that have already been pushed down to Parquet. Originally, we first generate the record filter, and then call `findExpression` to traverse the generated filter to find out all pushed down predicates [[1](https://github.com/apache/spark/blob/64c6b9bad559c21f25cd9fbe37c8813cdab939f2/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L213-L228)]. In this way, we have to introduce the `CatalystFilter` class hierarchy to bind the Catalyst predicates together with their generated Parquet filter, and complicate the code base a lot.
The basic idea of this PR is that, we don't need `findExpression` after filter generation, because we already know a predicate can be pushed down if we can successfully generate its corresponding Parquet filter. SPARK-4213 is fixed by returning `None` for any unsupported predicate type.
Author: Cheng Lian <lian@databricks.com>
Closes #3317 from liancheng/simplify-parquet-filters and squashes the following commits:
d6a9499 [Cheng Lian] Fixes import styling issue
43760e8 [Cheng Lian] Simplifies Parquet filter generation logic
(cherry picked from commit 36b0956a3eadc7343ed0d25c79a6ce0496eaaccd)
Signed-off-by: Michael Armbrust <michael@databricks.com>
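The core idea can be sketched as: attempt to translate each predicate, and let a `None` result mean "not pushed down", so no second traversal (`findExpression`) is needed. All names below are hypothetical, not Spark's actual API:

```python
def to_parquet_filter(pred):
    """Try to translate a (op, column, value) predicate into a
    storage-level filter; None means it cannot be pushed down."""
    op, col, value = pred
    supported = {"=", "<", ">", "<=", ">="}
    if op not in supported:
        return None  # unsupported predicate stays in the Spark plan
    return ("filter", op, col, value)

predicates = [("=", "a", 1), ("like", "b", "x%"), ("<", "c", 5)]
pushed = [p for p in predicates if to_parquet_filter(p) is not None]
remaining = [p for p in predicates if to_parquet_filter(p) is None]
assert pushed == [("=", "a", 1), ("<", "c", 5)]
assert remaining == [("like", "b", "x%")]
```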
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3217 from chenghao-intel/mutablerow and squashes the following commits:
e8a10bd [Cheng Hao] revert the change of Row object
4681aea [Cheng Hao] Add toMutableRow method in object Row
a751838 [Cheng Hao] Construct the MutableRow from an existed row
(cherry picked from commit 69e858cc7748b6babadd0cbe20e65f3982161cbf)
Signed-off-by: Michael Armbrust <michael@databricks.com>
`Cast` from `NaN` or `Infinity` of `Double` or `Float` to `TimestampType` throws `NumberFormatException`.
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #3283 from ueshin/issues/SPARK-4425 and squashes the following commits:
14def0c [Takuya UESHIN] Fix Cast to be able to handle NaN or Infinity to TimestampType.
(cherry picked from commit 566c791931645bfaaaf57ee5a15b9ffad534f81e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
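The fixed behavior can be modeled in Python: NaN and Infinity map to NULL (`None`) instead of raising (the helper is illustrative; a real timestamp cast involves more than this):

```python
import math

def cast_to_timestamp(seconds):
    """NaN or Infinity casts to NULL (None) rather than raising,
    mirroring the fixed Cast behavior; finite values pass through."""
    if seconds is None or math.isnan(seconds) or math.isinf(seconds):
        return None
    return float(seconds)

assert cast_to_timestamp(float("nan")) is None
assert cast_to_timestamp(float("inf")) is None
assert cast_to_timestamp(1.5) == 1.5
```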
DecimalType.
This is a follow-up of [SPARK-4390](https://issues.apache.org/jira/browse/SPARK-4390) (#3256).
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #3278 from ueshin/issues/SPARK-4420 and squashes the following commits:
7fea558 [Takuya UESHIN] Add some tests.
cb2301a [Takuya UESHIN] Fix tests.
133bad5 [Takuya UESHIN] Change nullability of Cast from DoubleType/FloatType to DecimalType.
(cherry picked from commit 3a81a1c9e0963173534d96850f3c0b7a16350838)
Signed-off-by: Michael Armbrust <michael@databricks.com>