and
[SPARK-5448][SQL] Make CacheManager a concrete class and field in SQLContext
Author: Reynold Xin <rxin@databricks.com>
Closes #4242 from rxin/sqlCleanup and squashes the following commits:
e351cb2 [Reynold Xin] Fixed toDataFrame.
6545c42 [Reynold Xin] More changes.
728c017 [Reynold Xin] [SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #4235 from rxin/df-tests1 and squashes the following commits:
f341db6 [Reynold Xin] [SPARK-5097][SQL] Test cases for DataFrame expressions.
| |
This pull request redesigns the existing Spark SQL DSL, which already provides DataFrame-like functionality.
TODOs:
With the exception of Python support, other tasks can be done in separate, follow-up PRs.
- [ ] Audit of the API
- [ ] Documentation
- [ ] More test cases to cover the new API
- [x] Python support
- [ ] Type alias SchemaRDD
Author: Reynold Xin <rxin@databricks.com>
Author: Davies Liu <davies@databricks.com>
Closes #4173 from rxin/df1 and squashes the following commits:
0a1a73b [Reynold Xin] Merge branch 'df1' of github.com:rxin/spark into df1
23b4427 [Reynold Xin] Mima.
828f70d [Reynold Xin] Merge pull request #7 from davies/df
257b9e6 [Davies Liu] add repartition
6bf2b73 [Davies Liu] fix collect with UDT and tests
e971078 [Reynold Xin] Missing quotes.
b9306b4 [Reynold Xin] Remove removeColumn/updateColumn for now.
a728bf2 [Reynold Xin] Example rename.
e8aa3d3 [Reynold Xin] groupby -> groupBy.
9662c9e [Davies Liu] improve DataFrame Python API
4ae51ea [Davies Liu] python API for dataframe
1e5e454 [Reynold Xin] Fixed a bug with symbol conversion.
2ca74db [Reynold Xin] Couple minor fixes.
ea98ea1 [Reynold Xin] Documentation & literal expressions.
2b22684 [Reynold Xin] Got rid of IntelliJ problems.
02bbfbc [Reynold Xin] Tightening imports.
ffbce66 [Reynold Xin] Fixed compilation error.
59b6d8b [Reynold Xin] Style violation.
b85edfb [Reynold Xin] ALS.
8c37f0a [Reynold Xin] Made MLlib and examples compile
6d53134 [Reynold Xin] Hive module.
d35efd5 [Reynold Xin] Fixed compilation error.
ce4a5d2 [Reynold Xin] Fixed test cases in SQL except ParquetIOSuite.
66d5ef1 [Reynold Xin] SQLContext minor patch.
c9bcdc0 [Reynold Xin] Checkpoint: SQL module compiles!
| |
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
This is a blocking issue for CLI users, since it affects existing HQL scripts written for Hive.
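As background, Hive replaces `${...}` variable references in HQL text before it parses the query. Below is a minimal Scala sketch of that substitution. It assumes variables arrive as a map of namespaces (`hiveconf`, `hivevar`, `system`, `env`), and that a bare `${name}` is looked up in `hivevar` first and then in `hiveconf`. `VarSubst` and `substitute` are hypothetical names. This is not the code added by this PR.

```scala
import scala.util.matching.Regex

object VarSubst {
  // Matches ${name} or ${namespace:name}: group 1 is the optional namespace, group 2 the name.
  private val Ref: Regex = """\$\{(?:(\w+):)?([^}:]+)\}""".r

  /** Replaces variable references using `vars` (namespace -> (name -> value)).
    * Assumption: a bare ${name} is looked up in "hivevar", then in "hiveconf".
    * References that cannot be resolved are left unchanged. */
  def substitute(hql: String, vars: Map[String, Map[String, String]]): String =
    Ref.replaceAllIn(hql, (m: Regex.Match) => {
      val name = m.group(2)
      val resolved = Option(m.group(1)) match {
        case Some(ns) => vars.get(ns).flatMap(_.get(name))
        case None =>
          vars.get("hivevar").flatMap(_.get(name))
            .orElse(vars.get("hiveconf").flatMap(_.get(name)))
      }
      // quoteReplacement stops '$' or '\' in the value from being read as group references.
      Regex.quoteReplacement(resolved.getOrElse(m.matched))
    })
}
```

For example, passing `sys.env` under the `"env"` key would let `${env:HOME}` resolve.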
Author: Cheng Hao <hao.cheng@intel.com>
Closes #4003 from chenghao-intel/substitution and squashes the following commits:
bb41fd6 [Cheng Hao] revert the removed the implicit conversion
af7c31a [Cheng Hao] add hql variable substitution support
| |
This PR removes the deprecated `ParquetQuerySuite`, renames `ParquetQuerySuite2` to `ParquetQuerySuite`, and refactors the changes introduced in #4115 into `ParquetFilterSuite`. It is a follow-up of #3644.
Note that the test cases in the old `ParquetQuerySuite` are already well covered by other test suites introduced in #3644.
Author: Cheng Lian <lian@databricks.com>
Closes #4116 from liancheng/remove-deprecated-parquet-tests and squashes the following commits:
f73b8f9 [Cheng Lian] Removes deprecated Parquet test suite
| |
This reverts commit 812d3679f5f97df7b667cbc3365a49866ebc02d5.
| |
* `SqlLexical.allCaseVersions` causes a `StackOverflowError` if a keyword is too long; this patch fixes that by normalizing all keywords in `SqlLexical`.
* It also introduces a unified `SparkSQLParser` to share common code.
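Enumerating case versions breaks down because a keyword with n letters has 2^n upper/lower-case spellings. The sketch below contrasts that growth with lookup by normalization, which lower-cases both the stored keywords and each token. `KeywordSketch` and its members are hypothetical. This is not the `SqlLexical` code.

```scala
import java.util.Locale

object KeywordSketch {
  // Listing every upper/lower-case spelling grows as 2^n in the number of letters,
  // so it does not scale to long keywords.
  def caseVariantCount(keyword: String): BigInt =
    BigInt(2).pow(keyword.count(_.isLetter))

  // Normalizing instead: keep each keyword once in lower case and lower-case the
  // candidate token before looking it up. The cost is linear in the token length.
  final class Keywords(words: Seq[String]) {
    private val normalized: Set[String] = words.map(_.toLowerCase(Locale.ROOT)).toSet
    def isKeyword(token: String): Boolean = normalized.contains(token.toLowerCase(Locale.ROOT))
  }
}
```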
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3926 from chenghao-intel/long_keyword and squashes the following commits:
686660f [Cheng Hao] Support Long Keyword and Refactor the SQLParsers
| |
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #4040 from adrian-wang/coalesce and squashes the following commits:
0ac8e8f [Daoyuan Wang] add coalesce() in sql parser
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #4115 from rxin/row-seq and squashes the following commits:
e33abd8 [Reynold Xin] Fixed compilation error.
cceb650 [Reynold Xin] Python test fixes, and removal of WrapDynamic.
0334a52 [Reynold Xin] mkString.
9cdeb7d [Reynold Xin] Hive tests.
15681c2 [Reynold Xin] Fix more test cases.
ea9023a [Reynold Xin] Fixed a catalyst test.
c5e2cb5 [Reynold Xin] Minor patch up.
b9cab7c [Reynold Xin] [SPARK-5323][SQL] Remove Row's Seq inheritance.
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-5287
This PR only adds `defaultSizeOf` to data types and makes those internal type classes `protected[sql]`. I will clean up the type hierarchy of data types in another PR.
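Here is a rough illustration of a per-type default size, presumably used as a byte estimate when sizing data. The sketch assumes a simplified, hypothetical type hierarchy, not Spark's `DataType` classes. The fixed-width sizes follow the JVM primitive widths, with boolean counted as 1 byte. The 16-byte string estimate is an arbitrary assumption.

```scala
object DefaultSizeSketch {
  // Hypothetical, simplified type hierarchy; not Spark's actual DataType classes.
  sealed trait DataType { def defaultSize: Int }
  case object IntType extends DataType { val defaultSize = 4 }
  case object LongType extends DataType { val defaultSize = 8 }
  case object DoubleType extends DataType { val defaultSize = 8 }
  case object BoolType extends DataType { val defaultSize = 1 }
  // Assumed per-value estimate for variable-length strings.
  case object StrType extends DataType { val defaultSize = 16 }

  final case class Struct(fields: Seq[DataType]) extends DataType {
    // A struct's estimate is the sum of its fields' estimates.
    def defaultSize: Int = fields.map(_.defaultSize).sum
  }
}
```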
Author: Yin Huai <yhuai@databricks.com>
Closes #4081 from yhuai/SPARK-5287 and squashes the following commits:
90cec75 [Yin Huai] Update unit test.
e1c600c [Yin Huai] Make internal classes protected[sql].
7eaba68 [Yin Huai] Add `defaultSize` method to data types.
fd425e0 [Yin Huai] Add all native types to NativeType.defaultSizeOf.
| |
This is a follow-up of #4090. The original deeply nested `reduceOption` code is hard to grasp.
Author: Cheng Lian <lian@databricks.com>
Closes #4091 from liancheng/refactor-boolean-simplification and squashes the following commits:
cd8860b [Cheng Lian] Improves `compareConditions` to handle more subtle cases
1bf3258 [Cheng Lian] Avoids converting predicate sets to lists
e833ca4 [Cheng Lian] Refactors deeply nested FP style code
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #4117 from rxin/catalyst-test-log4j and squashes the following commits:
8ad610b [Reynold Xin] [SQL][minor] Add a log4j file for catalyst test.
| |
JIRA: https://issues.apache.org/jira/browse/SPARK-5286
Author: Yin Huai <yhuai@databricks.com>
Closes #4076 from yhuai/SPARK-5286 and squashes the following commits:
6b69ed1 [Yin Huai] Catch all exception when we try to uncache a query.
| |
field has a null value
JIRA: https://issues.apache.org/jira/browse/SPARK-5284
Author: Yin Huai <yhuai@databricks.com>
Closes #4077 from yhuai/SPARK-5284 and squashes the following commits:
fceacd6 [Yin Huai] Check if a value is null when the field has a complex type.
| |
Author: Jacky Li <jacky.likun@gmail.com>
Closes #4100 from jackylk/patch-9 and squashes the following commits:
b13b9d6 [Jacky Li] Update SQLConf.scala
4d3f83d [Jacky Li] Update SQLConf.scala
fcc8c85 [Jacky Li] [SQL] fix typo in class description
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #4097 from rxin/javarename and squashes the following commits:
c5ce96a [Reynold Xin] [SQL][minor] Put DataTypes.java in java dir.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #4092 from rxin/bigdecimal and squashes the following commits:
27b08c9 [Reynold Xin] Fixed test.
10cb496 [Reynold Xin] [SPARK-5279][SQL] Use java.math.BigDecimal as the exposed Decimal type.
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #4090 from rxin/booleanSimplification and squashes the following commits:
68c8986 [Reynold Xin] [SQL][Minor] Added comments and examples to explain BooleanSimplification.
| |
`BooleanSimplification`
Follow up of #3778
/cc rxin
Author: scwf <wangfei1@huawei.com>
Closes #4086 from scwf/commentforspark-4937 and squashes the following commits:
aaf89f6 [scwf] code style issue
2d3406e [scwf] added comment for spark-4937
| |
Author: Reynold Xin <rxin@databricks.com>
Closes #4085 from rxin/row-doc and squashes the following commits:
f77cb27 [Reynold Xin] [SQL][minor] Improved Row documentation.
| |
After the following patches, the main (Scala) API is now usable for Java users directly.
https://github.com/apache/spark/pull/4056
https://github.com/apache/spark/pull/4054
https://github.com/apache/spark/pull/4049
https://github.com/apache/spark/pull/4030
https://github.com/apache/spark/pull/3965
https://github.com/apache/spark/pull/3958
Author: Reynold Xin <rxin@databricks.com>
Closes #4065 from rxin/sql-java-api and squashes the following commits:
b1fd860 [Reynold Xin] Fix Mima
6d86578 [Reynold Xin] Ok one more attempt in fixing Python...
e8f1455 [Reynold Xin] Fix Python again...
3e53f91 [Reynold Xin] Fixed Python.
83735da [Reynold Xin] Fix BigDecimal test.
e9f1de3 [Reynold Xin] Use scala BigDecimal.
500d2c4 [Reynold Xin] Fix Decimal.
ba3bfa2 [Reynold Xin] Updated javadoc for RowFactory.
c4ae1c5 [Reynold Xin] [SPARK-5193][SQL] Remove Spark SQL Java-specific API.
| |
spark sql
This adds an optimization that simplifies And/Or conditions in Spark SQL.
There are two kinds of optimization:
1. Numeric condition optimization, for example:
a < 3 && a > 5 => false
a < 1 || a > 0 => true
a > 3 && a > 5 => a > 5
(a < 2 || b > 5) && a < 2 => a < 2
2. Rewriting some queries from a Cartesian product into an equi-join, such as this SQL (one of the hive-testbench queries):
```
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#32'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 7 and l_quantity <= 7 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#35'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 15 and l_quantity <= 15 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#24'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 26 and l_quantity <= 26 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
```
The branches of the Or share repeated expressions, so we can factor them out using `(a && b) || (a && c) = a && (b || c)`.
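Here is a minimal sketch of this factoring rule on a toy predicate tree. It uses hypothetical `Atom`/`And`/`Or` classes rather than Catalyst expressions. The rule splits each side of an `Or` into its conjuncts, pulls out the conjuncts common to both sides, and keeps the rest under the `Or`.

```scala
object FactorSketch {
  // Toy predicate tree; hypothetical, not Catalyst's Expression classes.
  sealed trait Pred
  final case class Atom(name: String) extends Pred
  final case class And(left: Pred, right: Pred) extends Pred
  final case class Or(left: Pred, right: Pred) extends Pred

  /** Flattens nested Ands into a list of conjuncts. */
  def conjuncts(p: Pred): Seq[Pred] = p match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  /** Rebuilds a conjunction from a non-empty list of parts. */
  def andAll(parts: Seq[Pred]): Pred = parts.reduceLeft[Pred]((x, y) => And(x, y))

  /** (a && b) || (a && c)  =>  a && (b || c) */
  def factorOr(p: Pred): Pred = p match {
    case Or(l, r) =>
      val lc = conjuncts(l)
      val rc = conjuncts(r)
      val common = lc.filter(x => rc.contains(x)).distinct
      if (common.isEmpty) p
      else {
        val lRest = lc.filterNot(x => common.contains(x))
        val rRest = rc.filterNot(x => common.contains(x))
        // If one side is exactly the common part, common || (common && x) == common (absorption).
        if (lRest.isEmpty || rRest.isEmpty) andAll(common)
        else And(andAll(common), Or(andAll(lRest), andAll(rRest)))
      }
    case other => other
  }
}
```

In the query above, `p_partkey = l_partkey` appears in every branch. Factoring it out makes it a top-level conjunct, which the planner can presumably use as an equi-join key. The sketch rewrites only a single binary `Or`. An optimizer that applies the rule repeatedly, bottom-up, would also handle the three-way `Or`.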
Before the optimization, this SQL hung in my local test; the physical plan was:
![image](https://cloud.githubusercontent.com/assets/7018048/5539175/31cf38e8-8af9-11e4-95e3-336f9b3da4a4.png)
After the optimization, this SQL runs successfully in 20+ seconds, and its physical plan is:
![image](https://cloud.githubusercontent.com/assets/7018048/5539176/39a558e0-8af9-11e4-912b-93de94b20075.png)
This PR focuses on the second optimization and some simple cases of the first. I will address more complex numeric condition optimization in a follow-up PR.
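The simple numeric cases can be modeled as comparisons of one column against constants. The sketch below is a hypothetical illustration, not the rule in this PR. It treats values as `Double` and assumes the column is non-null.

```scala
object RangeSketch {
  // Hypothetical comparisons of a single column `a` against a constant.
  sealed trait Cmp
  final case class Lt(c: Double) extends Cmp // a < c
  final case class Gt(c: Double) extends Cmp // a > c

  sealed trait Result
  case object AlwaysTrue extends Result
  case object AlwaysFalse extends Result
  final case class Keep(cmp: Cmp) extends Result
  case object NoChange extends Result

  /** Simplifies `x && y` on the same (assumed non-null) column. */
  def and(x: Cmp, y: Cmp): Result = (x, y) match {
    case (Gt(p), Gt(q)) => Keep(Gt(math.max(p, q)))   // a > 3 && a > 5 => a > 5
    case (Lt(p), Lt(q)) => Keep(Lt(math.min(p, q)))
    case (Lt(hi), Gt(lo)) if lo >= hi => AlwaysFalse  // a < 3 && a > 5 => false
    case (Gt(lo), Lt(hi)) if lo >= hi => AlwaysFalse
    case _ => NoChange
  }

  /** Simplifies `x || y` on the same (assumed non-null) column. */
  def or(x: Cmp, y: Cmp): Result = (x, y) match {
    case (Lt(hi), Gt(lo)) if lo < hi => AlwaysTrue    // a < 1 || a > 0 => true
    case (Gt(lo), Lt(hi)) if lo < hi => AlwaysTrue
    case (Gt(p), Gt(q)) => Keep(Gt(math.min(p, q)))
    case (Lt(p), Lt(q)) => Keep(Lt(math.max(p, q)))
    case _ => NoChange
  }
}
```

The non-null assumption matters. Under SQL three-valued logic, if `a` is NULL then `a < 1 || a > 0` evaluates to NULL rather than true.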
Author: scwf <wangfei1@huawei.com>
Author: wangfei <wangfei1@huawei.com>
Closes #3778 from scwf/filter1 and squashes the following commits:
58bcbc2 [scwf] minor format fix
9570211 [scwf] conflicts fix
527e6ce [scwf] minor comment improvements
5c6f134 [scwf] remove numeric optimizations and move to BooleanSimplification
546a82b [wangfei] style fix
825fa69 [wangfei] adding more tests
a001e8c [wangfei] revert pom changes
32a595b [scwf] improvement and test fix
e99a26c [wangfei] refactory And/Or optimization to make it more readable and clean
| |
As part of SPARK-5193:
1. Removed UDFRegistration as a mixin in SQLContext and made it a field ("udf").
2. For Java UDFs, renamed dataType to returnType.
3. For Scala UDFs, added type tags.
4. Added all Java UDF registration methods to Scala's UDFRegistration.
5. Documentation
Author: Reynold Xin <rxin@databricks.com>
Closes #4056 from rxin/udf-registration and squashes the following commits:
ae9c556 [Reynold Xin] Updated example.
675a3c9 [Reynold Xin] Style fix
47c24ff [Reynold Xin] Python fix.
5f00c45 [Reynold Xin] Restore data type position in java udf and added typetags.
032f006 [Reynold Xin] [SPARK-5193][SQL] Reconcile Java and Scala UDFRegistration.
| |
1. Removed the deprecated LocalHiveContext
2. Made private[sql] fields protected[sql] so they don't show up in javadoc.
3. Added javadoc to refreshTable.
4. Added Experimental tag to analyze command.
Author: Reynold Xin <rxin@databricks.com>
Closes #4054 from rxin/hivecontext-api and squashes the following commits:
25cc00a [Reynold Xin] Add implicit conversion back.
cbca886 [Reynold Xin] [SPARK-5193][SQL] Tighten up HiveContext API
| |
1. Removed 2 implicits (logicalPlanToSparkQuery and baseRelationToSchemaRDD)
2. Moved extraStrategies into ExperimentalMethods.
3. Made private methods protected[sql] so they don't show up in javadocs.
4. Removed createParquetFile.
5. Added Java version of applySchema to SQLContext.
Author: Reynold Xin <rxin@databricks.com>
Closes #4049 from rxin/sqlContext-refactor and squashes the following commits:
a326a1a [Reynold Xin] Remove createParquetFile and add applySchema for Java to SQLContext.
ecd6685 [Reynold Xin] Added baseRelationToSchemaRDD back.
4a38c9b [Reynold Xin] [SPARK-5193][SQL] Tighten up SQLContext API
| |
Declare SQLConf to be serializable to fix "Task not serializable" exceptions in SparkSQL
Author: Alex Baretta <alexbaretta@gmail.com>
Closes #4031 from alexbaretta/SPARK-5235-SQLConf and squashes the following commits:
c2103f5 [Alex Baretta] [SPARK-5235] Make SQLConf Serializable
| |
`TaskContext.attemptId` is misleadingly named, since it currently returns a taskId, which uniquely identifies a particular task attempt within a particular SparkContext, instead of an attempt number, which conveys how many times a task has been attempted.
This patch deprecates `TaskContext.attemptId` and adds `TaskContext.taskId` and `TaskContext.attemptNumber` fields. Prior to this change, it was impossible to determine whether a task was being re-attempted (or was a speculative copy), which made it difficult to write unit tests for tasks that fail on early attempts or speculative tasks that complete faster than original tasks.
Earlier versions of the TaskContext docs suggest that `attemptId` behaves like `attemptNumber`, so there's an argument to be made in favor of changing this method's implementation. Since we've decided against making that change in maintenance branches, I think it's simpler to add better-named methods and retain the old behavior for `attemptId`; if `attemptId` behaved differently in different branches, then this would cause confusing build-breaks when backporting regression tests that rely on the new `attemptId` behavior.
Most of this patch is fairly straightforward, but there is a bit of trickiness related to Mesos tasks: since there's no field in MesosTaskInfo to encode the attemptId, I packed it into the `data` field alongside the task binary.
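Here is a minimal sketch of packing an attempt number in front of the serialized task bytes and unpacking it on the receiving side. `TaskDataSketch`, `encode` and `decode` are hypothetical names. The wrapper this PR introduces for Mesos may use a different layout.

```scala
import java.nio.ByteBuffer

object TaskDataSketch {
  /** Prefixes the serialized task with its attempt number as a 4-byte Int. */
  def encode(attemptNumber: Int, taskBytes: Array[Byte]): Array[Byte] = {
    val buf = ByteBuffer.allocate(4 + taskBytes.length)
    buf.putInt(attemptNumber) // 4 bytes; putLong would write 8 and break getInt on the other side
    buf.put(taskBytes)
    buf.array()
  }

  /** Inverse of encode: reads the attempt number, then the remaining task bytes. */
  def decode(data: Array[Byte]): (Int, Array[Byte]) = {
    val buf = ByteBuffer.wrap(data)
    val attemptNumber = buf.getInt()
    val taskBytes = new Array[Byte](buf.remaining())
    buf.get(taskBytes)
    (attemptNumber, taskBytes)
  }
}
```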
Author: Josh Rosen <joshrosen@databricks.com>
Closes #3849 from JoshRosen/SPARK-4014 and squashes the following commits:
89d03e0 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014
5cfff05 [Josh Rosen] Introduce wrapper for serializing Mesos task launch data.
38574d4 [Josh Rosen] attemptId -> taskAttemptId in PairRDDFunctions
a180b88 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014
1d43aa6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014
eee6a45 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014
0b10526 [Josh Rosen] Use putInt instead of putLong (silly mistake)
8c387ce [Josh Rosen] Use local with maxRetries instead of local-cluster.
cbe4d76 [Josh Rosen] Preserve attemptId behavior and deprecate it:
b2dffa3 [Josh Rosen] Address some of Reynold's minor comments
9d8d4d1 [Josh Rosen] Doc typo
1e7a933 [Josh Rosen] [SPARK-4014] Change TaskContext.attemptId to return attempt number instead of task ID.
fd515a5 [Josh Rosen] Add failing test for SPARK-4014
| |
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #4000 from adrian-wang/comment and squashes the following commits:
9c24fc4 [Daoyuan Wang] some comments
| |
jira: https://issues.apache.org/jira/browse/SPARK-5211
Author: Yin Huai <yhuai@databricks.com>
Closes #4026 from yhuai/SPARK-5211 and squashes the following commits:
15ee32b [Yin Huai] Remove extra line.
c6c1651 [Yin Huai] Get back HiveMetastoreTypes.toDataType.
| |
rxin follow up of #3732
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #4041 from adrian-wang/decimal and squashes the following commits:
aa3d738 [Daoyuan Wang] fix auto refactor
7777a58 [Daoyuan Wang] move sql.types.decimal.Decimal to sql.types.Decimal
| |
Mostly just moving stuff around. This should still be source compatible since we type aliased Row previously in org.apache.spark.sql.Row.
Added the following APIs to Row:
```scala
def getMap[K, V](i: Int): scala.collection.Map[K, V]
def getJavaMap[K, V](i: Int): java.util.Map[K, V]
def getSeq[T](i: Int): Seq[T]
def getList[T](i: Int): java.util.List[T]
def getStruct(i: Int): StructType
```
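Here is a minimal sketch of Java-friendly accessors layered on top of Scala-typed ones. It assumes a hypothetical row backed by a `Seq[Any]`, not Spark's `Row` implementation. The Java variants copy values into `java.util` collections.

```scala
object RowSketch {
  /** Hypothetical minimal row backed by a Seq[Any]; not Spark's Row. */
  final class SimpleRow(values: Seq[Any]) {
    def getSeq[T](i: Int): Seq[T] = values(i).asInstanceOf[Seq[T]]
    def getMap[K, V](i: Int): scala.collection.Map[K, V] =
      values(i).asInstanceOf[scala.collection.Map[K, V]]

    // Java-friendly variants: copy into java.util collections so Java callers
    // never see Scala collection types.
    def getList[T](i: Int): java.util.List[T] = {
      val out = new java.util.ArrayList[T]()
      getSeq[T](i).foreach(x => out.add(x))
      out
    }

    def getJavaMap[K, V](i: Int): java.util.Map[K, V] = {
      val out = new java.util.HashMap[K, V]()
      getMap[K, V](i).foreach { case (k, v) => out.put(k, v) }
      out
    }
  }
}
```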
Author: Reynold Xin <rxin@databricks.com>
Closes #4030 from rxin/sql-row and squashes the following commits:
6c85c29 [Reynold Xin] Fixed style violation by adding a new line to Row.scala.
82b064a [Reynold Xin] [SPARK-5167][SQL] Move Row into sql package and make it usable for Java.
| |
Having two versions of the data type APIs (one for Java, one for Scala) requires downstream libraries to also have two versions of the APIs if the library wants to support both Java and Scala. I took a look at the Scala version of the data type APIs - it can actually work out pretty well for Java out of the box.
As part of the PR, I created a sql.types package and moved all type definitions there. I then removed the Java specific data type API along with a lot of the conversion code.
This subsumes https://github.com/apache/spark/pull/3925
Author: Reynold Xin <rxin@databricks.com>
Closes #3958 from rxin/SPARK-5123-datatype-2 and squashes the following commits:
66505cc [Reynold Xin] [SPARK-5123] Expose only one version of the data type APIs (i.e. remove the Java-specific API).
| |
This change should be binary and source backward compatible since we didn't change any user facing APIs.
Author: Reynold Xin <rxin@databricks.com>
Closes #3965 from rxin/SPARK-5168-sqlconf and squashes the following commits:
42eec09 [Reynold Xin] Fix default conf value.
0ef86cc [Reynold Xin] Fix constructor ordering.
4d7f910 [Reynold Xin] Properly override config.
ccc8e6a [Reynold Xin] [SPARK-5168] Make SQLConf a field rather than mixin in SQLContext
| |
With changes in this PR, users can persist metadata of tables created based on the data source API in metastore through DDLs.
Author: Yin Huai <yhuai@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3960 from yhuai/persistantTablesWithSchema2 and squashes the following commits:
069c235 [Yin Huai] Make exception messages user friendly.
c07cbc6 [Yin Huai] Get the location of test file in a correct way.
4456e98 [Yin Huai] Test data.
5315dfc [Yin Huai] rxin's comments.
7fc4b56 [Yin Huai] Add DDLStrategy and HiveDDLStrategy to plan DDLs based on the data source API.
aeaf4b3 [Yin Huai] Add comments.
06f9b0c [Yin Huai] Revert unnecessary changes.
feb88aa [Yin Huai] Merge remote-tracking branch 'apache/master' into persistantTablesWithSchema2
172db80 [Yin Huai] Fix unit test.
49bf1ac [Yin Huai] Unit tests.
8f8f1a1 [Yin Huai] [SPARK-4574][SQL] Adding support for defining schema in foreign DDL commands. #3431
f47fda1 [Yin Huai] Unit tests.
2b59723 [Michael Armbrust] Set external when creating tables
c00bb1b [Michael Armbrust] Don't use reflection to read options
1ea6e7b [Michael Armbrust] Don't fail when trying to uncache a table that doesn't exist
6edc710 [Michael Armbrust] Add tests.
d7da491 [Michael Armbrust] First draft of persistent tables.
| |
Followup to #3870. Props to rahulaggarwalguavus for identifying the issue.
Author: Michael Armbrust <michael@databricks.com>
Closes #3990 from marmbrus/SPARK-5049 and squashes the following commits:
dd03e4e [Michael Armbrust] Fill in the partition values of parquet scans instead of using JoinedRow
| |
Disables the Spark web UI in HiveThriftServer2Suite in order to prevent Jenkins test failures due to port contention.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #3998 from JoshRosen/SPARK-5200 and squashes the following commits:
a384416 [Josh Rosen] [SPARK-5200] Disable web UI in Hive Thriftserver tests.
| |
Allow FROM to be followed by multiple opening parentheses, as in:
```
select key from ((select * from testData limit 1) union all (select * from testData limit 1)) x limit 1
```
Author: scwf <wangfei1@huawei.com>
Closes #3853 from scwf/from and squashes the following commits:
14f110a [scwf] enable from follow multiple brackets
| |
Author: wangfei <wangfei1@huawei.com>
Closes #3718 from scwf/sparksqlui and squashes the following commits:
e0d6b5d [wangfei] format fix
383b505 [wangfei] fix conflicts
4d2038a [wangfei] using setJobDescription
df79837 [wangfei] fix compile error
92ce834 [wangfei] show sql statement in spark ui when run sql use spark-sql
| |
Support the `!` boolean operator as a synonym for NOT in SQL, for example:
`select * from for_test where !(col1 > col2)`
Author: YanTangZhai <hakeemzhai@tencent.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3555 from YanTangZhai/SPARK-4692 and squashes the following commits:
1a9f605 [YanTangZhai] Update HiveQuerySuite.scala
7c03c68 [YanTangZhai] Merge pull request #23 from apache/master
992046e [YanTangZhai] Update HiveQuerySuite.scala
ea618f4 [YanTangZhai] Update HiveQuerySuite.scala
192411d [YanTangZhai] Merge pull request #17 from YanTangZhai/master
e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master
1e1ebb4 [YanTangZhai] Update HiveQuerySuite.scala
efc4210 [YanTangZhai] Update HiveQuerySuite.scala
bd2c444 [YanTangZhai] Update HiveQuerySuite.scala
1893956 [YanTangZhai] Merge pull request #14 from marmbrus/pr/3555
59e4de9 [Michael Armbrust] make hive test
718afeb [YanTangZhai] Merge pull request #12 from apache/master
950b21e [YanTangZhai] Update HiveQuerySuite.scala
74175b4 [YanTangZhai] Update HiveQuerySuite.scala
92242c7 [YanTangZhai] Update HiveQl.scala
6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
e249846 [YanTangZhai] Merge pull request #10 from apache/master
d26d982 [YanTangZhai] Merge pull request #9 from apache/master
76d4027 [YanTangZhai] Merge pull request #8 from apache/master
03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
8a00106 [YanTangZhai] Merge pull request #6 from apache/master
cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #3987 from marmbrus/hiveUdfCaching and squashes the following commits:
8bca2fa [Michael Armbrust] [SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause
| |
https://issues.apache.org/jira/browse/SPARK-4963
SchemaRDD.sample() returns wrong results because GapSamplingIterator operates on mutable rows.
HiveTableScan produces an RDD of SpecificMutableRow, and SchemaRDD.sample() iterates over it with GapSamplingIterator, whose next() is:
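```scala
override def next(): T = {
  val r = data.next() // r may be a reused mutable row
  advance             // skipping ahead can overwrite the object r points to
  r
}
```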
GapSamplingIterator.next() returns the current underlying element, assigning it to r.
However, if the underlying iterator yields a reused mutable row, as HiveTableScan's does, r and the underlying iterator point to the same object.
The advance operation then skips some underlying elements, which also changes r unexpectedly, so we return a value that differs from the initial r.
The most direct fix is to make HiveTableScan return a copy of each mutable row, as in my initial commit. HiveTableScan then cannot take full advantage of a reusable MutableRow, but the sample operation returns correct results.
Furthermore, we should investigate making GapSamplingIterator.next() perform the copy itself. That would require every kind of element an RDD can store to implement something like a clone operation, which would be a huge change.
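The aliasing problem can be reproduced without Spark. In this sketch, all names are hypothetical. A source reuses one mutable buffer for every element, like a reused mutable row. A simplified sampler returns the current element and then skips ahead. Copying each element before sampling restores the correct output, similar in spirit to the copy this PR adds before the Sample operator.

```scala
object AliasingSketch {
  /** A source that reuses one mutable buffer for every element, like a reused mutable row. */
  def reusingIterator(n: Int): Iterator[Array[Int]] = new Iterator[Array[Int]] {
    private val buffer = new Array[Int](1)
    private var i = 0
    def hasNext: Boolean = i < n
    def next(): Array[Int] = { buffer(0) = i; i += 1; buffer }
  }

  /** Simplified stand-in for a gap sampler: take the current element, then skip `gap` elements. */
  final class SkippingIterator[T](data: Iterator[T], gap: Int) extends Iterator[T] {
    def hasNext: Boolean = data.hasNext
    def next(): T = {
      val r = data.next()
      var skipped = 0
      while (skipped < gap && data.hasNext) { data.next(); skipped += 1 } // "advance"
      r
    }
  }

  /** Without copying, r aliases the reused buffer, so skipping ahead overwrites it. */
  def sampleWithoutCopy(n: Int, gap: Int): List[Int] =
    new SkippingIterator(reusingIterator(n), gap).map(row => row(0)).toList

  /** Copying each element before sampling keeps every returned value intact. */
  def sampleWithCopy(n: Int, gap: Int): List[Int] =
    new SkippingIterator(reusingIterator(n).map(row => row.clone()), gap).map(row => row(0)).toList
}
```

With a gap of 1 over the values 0 to 5, the uncopied version yields the last skipped values (1, 3, 5) instead of the sampled ones (0, 2, 4).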
Author: Yanbo Liang <yanbohappy@gmail.com>
Closes #3827 from yanbohappy/spark-4963 and squashes the following commits:
0912ca0 [Yanbo Liang] code format keep
65c4e7c [Yanbo Liang] import file and clear annotation
55c7c56 [Yanbo Liang] better output of test case
cea7e2e [Yanbo Liang] SchemaRDD add copy operation before Sample operator
e840829 [Yanbo Liang] HiveTableScan return mutable row with copy
| |
Follow up for #3712.
This PR finally removes `CommandStrategy` and makes all commands extend `RunnableCommand`, so they can all be planned with `case r: RunnableCommand => ExecutedCommand(r) :: Nil`.
The one exception is Hive's `DescribeCommand`, a special case that needs to distinguish Hive tables from temporary tables, so `HiveCommandStrategy` is kept for it.
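Here is a minimal sketch of the pattern, using hypothetical, simplified classes rather than Spark SQL's. Every command implements `run()`, and a single planner case (here the hypothetical `planCommands`) wraps any of them in one physical node.

```scala
object CommandSketch {
  // Hypothetical simplified shapes; not Spark SQL's actual classes.
  trait LogicalPlan
  trait RunnableCommand extends LogicalPlan { def run(): Seq[String] }

  final case class SetCommand(key: String, value: String) extends RunnableCommand {
    def run(): Seq[String] = Seq(s"$key=$value")
  }
  final case class CacheTableCommand(table: String) extends RunnableCommand {
    def run(): Seq[String] = Seq(s"cached $table")
  }
  final case class Project(columns: Seq[String]) extends LogicalPlan

  trait SparkPlan { def execute(): Seq[String] }

  /** One physical node that executes any runnable command. */
  final case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan {
    def execute(): Seq[String] = cmd.run()
  }

  /** A single generic case replaces one planner case per command type. */
  def planCommands(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case r: RunnableCommand => ExecutedCommand(r) :: Nil
    case _                  => Nil
  }
}
```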
Author: scwf <wangfei1@huawei.com>
Closes #3948 from scwf/followup-SPARK-4861 and squashes the following commits:
6b48e64 [scwf] minor style fix
2c62e9d [scwf] fix for hive module
5a7a819 [scwf] Refactory command in spark sql
| |
This adds support for defining a schema in foreign DDL commands. Currently, foreign DDL supports commands like:
```
CREATE TEMPORARY TABLE avroTable
USING org.apache.spark.sql.avro
OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
```
With this PR, users can define the schema instead of having it inferred from the file, so DDL commands like the following are supported:
```
CREATE TEMPORARY TABLE avroTable(a int, b string)
USING org.apache.spark.sql.avro
OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
```
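Supporting a user-defined schema means the DDL parser must turn a column list such as `(a int, b string)` into fields. The sketch below is deliberately small. It assumes a hypothetical `Field` type and only a handful of primitive type names. The PR's own tests also cover complex data types, which this sketch does not handle.

```scala
object DdlSketch {
  // Hypothetical, much simplified schema model; not Spark's DDLParser.
  final case class Field(name: String, dataType: String)

  private val supportedTypes = Set("int", "bigint", "double", "boolean", "string")

  /** Parses "a int, b string" into fields; returns Left on an unknown or malformed column. */
  def parseColumns(spec: String): Either[String, Seq[Field]] = {
    val parts = spec.split(",").map(_.trim).filter(_.nonEmpty).toSeq
    val fields: Seq[Either[String, Field]] = parts.map { col =>
      col.split("\\s+") match {
        case Array(name, tpe) if supportedTypes(tpe.toLowerCase) => Right(Field(name, tpe.toLowerCase))
        case _ => Left(s"cannot parse column definition: '$col'")
      }
    }
    fields.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None      => Right(fields.collect { case Right(f) => f })
    }
  }
}
```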
Author: scwf <wangfei1@huawei.com>
Author: Yin Huai <yhuai@databricks.com>
Author: Fei Wang <wangfei1@huawei.com>
Author: wangfei <wangfei1@huawei.com>
Closes #3431 from scwf/ddl and squashes the following commits:
7e79ce5 [Fei Wang] Merge pull request #22 from yhuai/pr3431yin
38f634e [Yin Huai] Remove Option from createRelation.
65e9c73 [Yin Huai] Revert all changes since applying a given schema has not been testd.
a852b10 [scwf] remove cleanIdentifier
f336a16 [Fei Wang] Merge pull request #21 from yhuai/pr3431yin
baf79b5 [Yin Huai] Test special characters quoted by backticks.
50a03b0 [Yin Huai] Use JsonRDD.nullTypeToStringType to convert NullType to StringType.
1eeb769 [Fei Wang] Merge pull request #20 from yhuai/pr3431yin
f5c22b0 [Yin Huai] Refactor code and update test cases.
f1cffe4 [Yin Huai] Revert "minor refactory"
b621c8f [scwf] minor refactory
d02547f [scwf] fix HiveCompatibilitySuite test failure
8dfbf7a [scwf] more tests for complex data type
ddab984 [Fei Wang] Merge pull request #19 from yhuai/pr3431yin
91ad91b [Yin Huai] Parse data types in DDLParser.
cf982d2 [scwf] fixed test failure
445b57b [scwf] address comments
02a662c [scwf] style issue
44eb70c [scwf] fix decimal parser issue
83b6fc3 [scwf] minor fix
9bf12f8 [wangfei] adding test case
7787ec7 [wangfei] added SchemaRelationProvider
0ba70df [wangfei] draft version
| |
This PR only fixes the parsing error and changes the API to use tableIdentifier. Changes related to joining data sources from different catalogs are not included in this PR.
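Here is a sketch of what a dot-separated table identifier might look like, using hypothetical helpers. The name is split on dots. The last part is the table, and the part before it (if any) is the database. Quoted names that contain dots are not handled here.

```scala
object TableIdentSketch {
  /** Splits a possibly qualified name such as "db.table" into its parts. */
  def tableIdentifier(name: String): Seq[String] = name.split('.').toSeq.map(_.trim)

  /** Hypothetical resolution: the last part is the table, the part before it (if any) the database. */
  def resolve(ident: Seq[String], currentDb: String): (String, String) =
    if (ident.length == 1) (currentDb, ident.head)
    else (ident(ident.length - 2), ident.last)
}
```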
Author: Alex Liu <alex_liu68@yahoo.com>
Closes #3941 from alexliu68/SPARK-SQL-4943-3 and squashes the following commits:
343ae27 [Alex Liu] [SPARK-4943][SQL] refactoring according to review
29e5e55 [Alex Liu] [SPARK-4943][SQL] fix failed Hive CTAS tests
6ae77ce [Alex Liu] [SPARK-4943][SQL] fix TestHive matching error
3652997 [Alex Liu] [SPARK-4943][SQL] Allow table name having dot to support db/catalog ...
| |
Author: Alex Liu <alex_liu68@yahoo.com>
Closes #3766 from alexliu68/SPARK-SQL-4925 and squashes the following commits:
3137b51 [Alex Liu] [SPARK-4925][SQL] Remove sql/hive-thriftserver module from pom.xml
15f2e38 [Alex Liu] [SPARK-4925][SQL] Publish Spark SQL hive-thriftserver maven artifact
| |
CaseInsensitiveMap throws java.io.NotSerializableException.
Author: luogankun <luogankun@gmail.com>
Closes #3944 from luogankun/SPARK-5141 and squashes the following commits:
b6d63d5 [luogankun] [SPARK-5141]CaseInsensitiveMap throws java.io.NotSerializableException
| |
This change does a few things to make the hadoop-provided profile more useful:
- Create new profiles for other libraries / services that might be provided by the infrastructure
- Simplify and fix the poms so that the profiles are only activated while building assemblies.
- Fix tests so that they're able to run when the profiles are activated
- Add a new env variable to be used by distributions that use these profiles to provide the runtime classpath for Spark jobs and daemons.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
82eb688 [Marcelo Vanzin] Add a comment.
eb228c0 [Marcelo Vanzin] Fix borked merge.
4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
371ebee [Marcelo Vanzin] Review feedback.
52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
322f882 [Marcelo Vanzin] Fix merge fail.
f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9640503 [Marcelo Vanzin] Cleanup child process log message.
115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
d1399ed [Marcelo Vanzin] Restore jetty dependency.
82a54b9 [Marcelo Vanzin] Remove unused profile.
5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
| |
This PR:
- Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
- Tells `surefire` to test only Java tests
- Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.
For me, this appears to run the Scala and Java tests once each, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala and Java tests are being run.
Author: Sean Owen <sowen@cloudera.com>
Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
| |
name" notation in SQL DSL.
Author: Reynold Xin <rxin@databricks.com>
Closes #3862 from rxin/stringcontext-attr and squashes the following commits:
9b10f57 [Reynold Xin] Rename StrongToAttributeConversionHelper
72121af [Reynold Xin] [SPARK-5040][SQL] Support expressing unresolved attributes using $"attribute name" notation in SQL DSL.
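The `$"name"` notation relies on Scala string interpolation. The compiler rewrites `$"..."` into a call to a `$` method on `StringContext`, and an implicit class can supply that method. Here is a minimal sketch with a hypothetical `UnresolvedAttribute` case class. Spark's own classes differ.

```scala
object DslSketch {
  // Hypothetical stand-in for an unresolved attribute reference.
  final case class UnresolvedAttribute(name: String)

  /** $"col" is rewritten by the compiler to StringContext("col").$(), which this class supplies. */
  implicit class StringToAttribute(val sc: StringContext) {
    def $(args: Any*): UnresolvedAttribute = UnresolvedAttribute(sc.s(args: _*))
  }
}
```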
| |
As we learned in https://github.com/apache/spark/pull/3580, not explicitly typing implicit functions can lead to compiler bugs and potentially unexpected runtime behavior.
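Here is a small illustration of the practice, with hypothetical names. The implicit conversion states its result type explicitly instead of leaving it to inference. Beyond the compiler issues mentioned above, Scala 2 also ignores an implicit without an explicit result type at use sites that appear earlier in the same file.

```scala
import scala.language.implicitConversions

object ImplicitSketch {
  final case class Meters(value: Double)

  object Conversions {
    // The result type is written out rather than inferred from the body.
    implicit def doubleToMeters(d: Double): Meters = Meters(d)
  }

  def describe(m: Meters): String = s"${m.value} m"
}
```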
Author: Reynold Xin <rxin@databricks.com>
Closes #3859 from rxin/sql-implicits and squashes the following commits:
30c2c24 [Reynold Xin] [SPARK-5038] Add explicit return type for implicit functions in Spark SQL.