| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
The queries like SELECT a.key FROM (SELECT key FROM src) \`a\` does not work as backticks in subquery aliases are not handled properly. This PR fixes that.
Author : ravipesala ravindra.pesalahuawei.com
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #2737 from ravipesala/SPARK-3834 and squashes the following commits:
0e0ab98 [ravipesala] Fixing issue in backtick handling for subquery aliases
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using `MEMORY_AND_DISK` as default storage level for in-memory table caching. Due to the in-memory columnar representation, recomputing an in-memory cached table partitions can be very expensive.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2686 from liancheng/spark-3824 and squashes the following commits:
35d2ed0 [Cheng Lian] Removes extra space
1ab7967 [Cheng Lian] Reduces test data size to fit DiskStore.getBytes()
ba565f0 [Cheng Lian] Maks CachedBatch serializable
07f0204 [Cheng Lian] Sets in-memory table default storage level to MEMORY_AND_DISK
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR is a follow up of #2590, and tries to introduce a top level SQL parser entry point for all SQL dialects supported by Spark SQL.
A top level parser `SparkSQLParser` is introduced to handle the syntaxes that all SQL dialects should recognize (e.g. `CACHE TABLE`, `UNCACHE TABLE` and `SET`, etc.). For all the syntaxes this parser doesn't recognize directly, it fallbacks to a specified function that tries to parse arbitrary input to a `LogicalPlan`. This function is typically another parser combinator like `SqlParser`. DDL syntaxes introduced in #2475 can be moved to here.
The `ExtendedHiveQlParser` now only handle Hive specific extensions.
Also took the chance to refactor/reformat `SqlParser` for better readability.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2698 from liancheng/gen-sql-parser and squashes the following commits:
ceada76 [Cheng Lian] Minor styling fixes
9738934 [Cheng Lian] Minor refactoring, removes optional trailing ";" in the parser
bb2ab12 [Cheng Lian] SET property value can be empty string
ce8860b [Cheng Lian] Passes test suites
e86968e [Cheng Lian] Removes debugging code
8bcace5 [Cheng Lian] Replaces digit.+ to rep1(digit) (Scala style checking doesn't like it)
d15d54f [Cheng Lian] Unifies SQL and HiveQL parsers
|
|
|
|
|
|
|
|
|
|
| |
This prevents it from changing during serialization, leading to corrupted results.
Author: Michael Armbrust <michael@databricks.com>
Closes #2656 from marmbrus/generateBug and squashes the following commits:
efa32eb [Michael Armbrust] Store the output of a generator in a val. This prevents it from changing during serialization.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"case when" conditional function is already supported in Spark SQL but there is no support in SqlParser. So added parser support to it.
Author : ravipesala ravindra.pesalahuawei.com
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #2678 from ravipesala/SPARK-3813 and squashes the following commits:
70c75a7 [ravipesala] Fixed styles
713ea84 [ravipesala] Updated as per admin comments
709684f [ravipesala] Changed parser to support case when function.
|
|
|
|
|
|
|
|
|
|
| |
The alias parameter is being ignored, which makes it more difficult to specify a qualifier for Generator expressions.
Author: Nathan Howell <nhowell@godaddy.com>
Closes #2721 from NathanHowell/SPARK-3858 and squashes the following commits:
8aa0f43 [Nathan Howell] [SPARK-3858][SQL] Pass the generator alias into logical plan node
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
chenghao-intel assigned this to me, check PR #2284 for previous discussion
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #2529 from adrian-wang/rowapi and squashes the following commits:
c6594b2 [Daoyuan Wang] using boxed
7b7e6e3 [Daoyuan Wang] update pattern match
7a39456 [Daoyuan Wang] rename file and refresh getAs[T]
4c18c29 [Daoyuan Wang] remove setAs[T] and null judge
1614493 [Daoyuan Wang] add missing row api
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR aims to provide a way to skip/query corrupt JSON records. To do so, we introduce an internal column to hold corrupt records (the default name is `_corrupt_record`. This name can be changed by setting the value of `spark.sql.columnNameOfCorruptRecord`). When there is a parsing error, we will put the corrupt record in its unparsed format to the internal column. Users can skip/query this column through SQL.
* To query those corrupt records
```
-- For Hive parser
SELECT `_corrupt_record`
FROM jsonTable
WHERE `_corrupt_record` IS NOT NULL
-- For our SQL parser
SELECT _corrupt_record
FROM jsonTable
WHERE _corrupt_record IS NOT NULL
```
* To skip corrupt records and query regular records
```
-- For Hive parser
SELECT field1, field2
FROM jsonTable
WHERE `_corrupt_record` IS NULL
-- For our SQL parser
SELECT field1, field2
FROM jsonTable
WHERE _corrupt_record IS NULL
```
Generally, it is not recommended to change the name of the internal column. If the name has to be changed to avoid possible name conflicts, you can use `sqlContext.setConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD, <new column name>)` or `sqlContext.sql(SET spark.sql.columnNameOfCorruptRecord=<new column name>)`.
Author: Yin Huai <huai@cse.ohio-state.edu>
Closes #2680 from yhuai/corruptJsonRecord and squashes the following commits:
4c9828e [Yin Huai] Merge remote-tracking branch 'upstream/master' into corruptJsonRecord
309616a [Yin Huai] Change the default name of corrupt record to "_corrupt_record".
b4a3632 [Yin Huai] Merge remote-tracking branch 'upstream/master' into corruptJsonRecord
9375ae9 [Yin Huai] Set the column name of corrupt json record back to the default one after the unit test.
ee584c0 [Yin Huai] Provide a way to query corrupt json records as unparsed strings.
|
|
|
|
|
|
|
|
|
|
| |
In JSONRDD.scala, add 'case TimestampType' in the enforceCorrectType function and a toTimestamp function.
Author: Mike Timper <mike@aurorafeint.com>
Closes #2720 from mtimper/master and squashes the following commits:
9386ab8 [Mike Timper] Fix and tests for SPARK-3853
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To fix two issues in CliSuite
1 CliSuite throw IndexOutOfBoundsException:
Exception in thread "Thread-6" java.lang.IndexOutOfBoundsException: 6
at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43)
at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
at org.apache.spark.sql.hive.thriftserver.CliSuite.org$apache$spark$sql$hive$thriftserver$CliSuite$$captureOutput$1(CliSuite.scala:67)
at org.apache.spark.sql.hive.thriftserver.CliSuite$$anonfun$4.apply(CliSuite.scala:78)
at org.apache.spark.sql.hive.thriftserver.CliSuite$$anonfun$4.apply(CliSuite.scala:78)
at scala.sys.process.ProcessLogger$$anon$1.out(ProcessLogger.scala:96)
at scala.sys.process.BasicIO$$anonfun$processOutFully$1.apply(BasicIO.scala:135)
at scala.sys.process.BasicIO$$anonfun$processOutFully$1.apply(BasicIO.scala:135)
at scala.sys.process.BasicIO$.readFully$1(BasicIO.scala:175)
at scala.sys.process.BasicIO$.processLinesFully(BasicIO.scala:179)
at scala.sys.process.BasicIO$$anonfun$processFully$1.apply(BasicIO.scala:164)
at scala.sys.process.BasicIO$$anonfun$processFully$1.apply(BasicIO.scala:162)
at scala.sys.process.ProcessBuilderImpl$Simple$$anonfun$3.apply$mcV$sp(ProcessBuilderImpl.scala:73)
at scala.sys.process.ProcessImpl$Spawn$$anon$1.run(ProcessImpl.scala:22)
Actually, it is the Mutil-Threads lead to this problem.
2 Using ```line.startsWith``` instead ```line.contains``` to assert expected answer. This is a tiny bug in CliSuite, for test case "Simple commands", there is a expected answers "5", if we use ```contains``` that means output like "14/10/06 11:```5```4:36 INFO CliDriver: Time taken: 1.078 seconds" or "14/10/06 11:54:36 INFO StatsReportListener: 0% ```5```% 10% 25% 50% 75% 90% 95% 100%" will make the assert true.
Author: scwf <wangfei1@huawei.com>
Closes #2666 from scwf/clisuite and squashes the following commits:
11430db [scwf] fix-clisuite
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The In case class is replaced by a InSet class in case all the filters are literals, which uses a hashset instead of Sequence, thereby giving significant performance improvement (earlier the seq was using a worst case linear match (exists method) since expressions were assumed in the filter list) . Maximum improvement should be visible in case small percentage of large data matches the filter list.
Author: Yash Datta <Yash.Datta@guavus.com>
Closes #2561 from saucam/branch-1.1 and squashes the following commits:
4bf2d19 [Yash Datta] SPARK-3711: 1. Fix code style and import order 2. Fix optimization condition 3. Add tests for null in filter list 4. Add test case that optimization is not triggered in case of attributes in filter list
afedbcd [Yash Datta] SPARK-3711: 1. Add test cases for InSet class in ExpressionEvaluationSuite 2. Add class OptimizedInSuite on the lines of ConstantFoldingSuite, for the optimized In clause
0fc902f [Yash Datta] SPARK-3711: UnaryMinus will be handled by constantFolding
bd84c67 [Yash Datta] SPARK-3711: Incorporate review comments. Move optimization of In clause to Optimizer.scala by adding a rule. Add appropriate comments
430f5d1 [Yash Datta] SPARK-3711: Optimize the filter list in case of negative values as well
bee98aa [Yash Datta] SPARK-3711: Optimize where in clause filter queries
|
|
|
|
|
|
|
|
| |
Author: Vida Ha <vida@databricks.com>
Closes #2621 from vidaha/vida/SPARK-3752 and squashes the following commits:
d7fdbbc [Vida Ha] Add tests for different UDF's
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Reynold Xin <rxin@apache.org>
Closes #2719 from rxin/sql-join-break and squashes the following commits:
0c0082b [Reynold Xin] Fix line length.
cbc664c [Reynold Xin] Rename join -> joins package.
a070d44 [Reynold Xin] Fix line length in HashJoin
a39be8c [Reynold Xin] [SPARK-3857] Create a join package for various join operators.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
inserting Hive values
Builds all wrappers at first according to object inspector types to avoid per row costs.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2592 from liancheng/hive-value-wrapper and squashes the following commits:
9696559 [Cheng Lian] Passes all tests
4998666 [Cheng Lian] Prevents per row dynamic dispatching and pattern matching when inserting Hive values
|
|
|
|
|
|
|
|
|
|
| |
Includes partition keys into account when applying `PreInsertionCasts` rule.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2672 from liancheng/fix-pre-insert-casts and squashes the following commits:
def1a1a [Cheng Lian] Makes PreInsertionCasts handle partitions properly
|
|
|
|
|
|
|
|
|
|
|
| |
Calling `BinaryArithmetic.dataType` will throws exception until it's resolved, but in type coercion rule `Division`, seems doesn't follow this.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #2559 from chenghao-intel/type_coercion and squashes the following commits:
199a85d [Cheng Hao] Simplify the divide rule
dc55218 [Cheng Hao] fix bug of type coercion in div
|
|
|
|
|
|
|
|
|
|
|
| |
marmbrus
Update README.md to be consistent with Spark 1.1
Author: Liquan Pei <liquanpei@gmail.com>
Closes #2706 from Ishiihara/SparkSQL-readme and squashes the following commits:
33b9d4b [Liquan Pei] keep README.md up to date
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR uses JSON instead of `toString` to serialize `DataType`s. The latter is not only hard to parse but also flaky in many cases.
Since we already write schema information to Parquet metadata in the old style, we have to reserve the old `DataType` parser and ensure downward compatibility. The old parser is now renamed to `CaseClassStringParser` and moved into `object DataType`.
JoshRosen davies Please help review PySpark related changes, thanks!
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2563 from liancheng/datatype-to-json and squashes the following commits:
fc92eb3 [Cheng Lian] Reverts debugging code, simplifies primitive type JSON representation
438c75f [Cheng Lian] Refactors PySpark DataType JSON SerDe per comments
6b6387b [Cheng Lian] Removes debugging code
6a3ee3a [Cheng Lian] Addresses per review comments
dc158b5 [Cheng Lian] Addresses PEP8 issues
99ab4ee [Cheng Lian] Adds compatibility est case for Parquet type conversion
a983a6c [Cheng Lian] Adds PySpark support
f608c6e [Cheng Lian] De/serializes DataType objects from/to JSON
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we write the filter which is always FALSE like
SELECT * from person WHERE FALSE;
200 tasks will run. I think, 1 task is enough.
And current optimizer cannot optimize the case NOT is duplicated like
SELECT * from person WHERE NOT ( NOT (age > 30));
The filter rule above should be simplified
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #2692 from sarutak/SPARK-3831 and squashes the following commits:
25f3e20 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3831
23c750c [Kousuke Saruta] Improved unsupported predicate test case
a11b9f3 [Kousuke Saruta] Modified NOT predicate test case in PartitionBatchPruningSuite
8ea872b [Kousuke Saruta] Fixed the number of tasks when the data of LocalRelation is empty.
|
|
|
|
|
|
|
|
| |
Author: Renat Yusupov <re.yusupov@2gis.ru>
Closes #2641 from r3natko/feature/catalyst_option and squashes the following commits:
55d0c06 [Renat Yusupov] [SQL] SPARK-3776: Wrong conversion to Catalyst for Option[Product]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
lazy caching
Although lazy caching for in-memory table seems consistent with the `RDD.cache()` API, it's relatively confusing for users who mainly work with SQL and not familiar with Spark internals. The `CACHE TABLE t; SELECT COUNT(*) FROM t;` pattern is also commonly seen just to ensure predictable performance.
This PR makes both the `CACHE TABLE t [AS SELECT ...]` statement and the `SQLContext.cacheTable()` API eager by default, and adds a new `CACHE LAZY TABLE t [AS SELECT ...]` syntax to provide lazy in-memory table caching.
Also, took the chance to make some refactoring: `CacheCommand` and `CacheTableAsSelectCommand` are now merged and renamed to `CacheTableCommand` since the former is strictly a special case of the latter. A new `UncacheTableCommand` is added for the `UNCACHE TABLE t` statement.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2513 from liancheng/eager-caching and squashes the following commits:
fe92287 [Cheng Lian] Makes table caching eager by default and adds syntax for lazy caching
|
|
|
|
|
|
|
|
|
|
| |
Do not use TestSQLContext in JavaHiveQLSuite, that may lead to two SparkContexts in one jvm and enable JavaHiveQLSuite
Author: scwf <wangfei1@huawei.com>
Closes #2652 from scwf/fix-JavaHiveQLSuite and squashes the following commits:
be35c91 [scwf] enable JavaHiveQLSuite
|
|
|
|
|
|
|
|
|
|
| |
It should just use `maxResults` there.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #2654 from viirya/trivial_fix and squashes the following commits:
1362289 [Liang-Chi Hsieh] Trivial fix to make codes more readable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow up of #2226 and #2616 to fix Jenkins master SBT build failures for lower Hadoop versions (1.0.x and 2.0.x).
The root cause is the semantics difference of `FileSystem.globStatus()` between different versions of Hadoop, as illustrated by the following test code:
```scala
object GlobExperiments extends App {
val conf = new Configuration()
val fs = FileSystem.getLocal(conf)
fs.globStatus(new Path("/tmp/wh/*/*/*")).foreach { status =>
println(status.getPath)
}
}
```
Target directory structure:
```
/tmp/wh
├── dir0
│ ├── dir1
│ │ └── level2
│ └── level1
└── level0
```
Hadoop 2.4.1 result:
```
file:/tmp/wh/dir0/dir1/level2
```
Hadoop 1.0.4 resuet:
```
file:/tmp/wh/dir0/dir1/level2
file:/tmp/wh/dir0/level1
file:/tmp/wh/level0
```
In #2226 and #2616, we call `FileOutputCommitter.commitJob()` at the end of the job, and the `_SUCCESS` mark file is written. When working with lower Hadoop versions, due to the `globStatus()` semantics issue, `_SUCCESS` is included as a separate partition data file by `Hive.loadDynamicPartitions()`, and fails partition spec checking. The fix introduced in this PR is kind of a hack: when inserting data with dynamic partitioning, we intentionally avoid writing the `_SUCCESS` marker to workaround this issue.
Hive doesn't suffer this issue because `FileSinkOperator` doesn't call `FileOutputCommitter.commitJob()`, instead, it calls `Utilities.mvFileToFinalPath()` to cleanup the output directory and then loads it into Hive warehouse by with `loadDynamicPartitions()`/`loadPartition()`/`loadTable()`. This approach is better because it handles failed job and speculative tasks properly. We should add this step to `InsertIntoHiveTable` in another PR.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2663 from liancheng/dp-hadoop-1-fix and squashes the following commits:
0177dae [Cheng Lian] Fixes dynamic partitioning support for lower Hadoop versions
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
table caching
_Also addresses: SPARK-1671, SPARK-1379 and SPARK-3641_
This PR introduces a new trait, `CacheManger`, which replaces the previous temporary table based caching system. Instead of creating a temporary table that shadows an existing table with and equivalent cached representation, the cached manager maintains a separate list of logical plans and their cached data. After optimization, this list is searched for any matching plan fragments. When a matching plan fragment is found it is replaced with the cached data.
There are several advantages to this approach:
- Calling .cache() on a SchemaRDD now works as you would expect, and uses the more efficient columnar representation.
- Its now possible to provide a list of temporary tables, without having to decide if a given table is actually just a cached persistent table. (To be done in a follow-up PR)
- In some cases it is possible that cached data will be used, even if a cached table was not explicitly requested. This is because we now look at the logical structure instead of the table name.
- We now correctly invalidate when data is inserted into a hive table.
Author: Michael Armbrust <michael@databricks.com>
Closes #2501 from marmbrus/caching and squashes the following commits:
63fbc2c [Michael Armbrust] Merge remote-tracking branch 'origin/master' into caching.
0ea889e [Michael Armbrust] Address comments.
1e23287 [Michael Armbrust] Add support for cache invalidation for hive inserts.
65ed04a [Michael Armbrust] fix tests.
bdf9a3f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into caching
b4b77f2 [Michael Armbrust] Address comments
6923c9d [Michael Armbrust] More comments / tests
80f26ac [Michael Armbrust] First draft of improved semantics for Spark SQL caching.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PR #2226 was reverted because it broke Jenkins builds for unknown reason. This debugging PR aims to fix the Jenkins build.
This PR also fixes two bugs:
1. Compression configurations in `InsertIntoHiveTable` are disabled by mistake
The `FileSinkDesc` object passed to the writer container doesn't have compression related configurations. These configurations are not taken care of until `saveAsHiveFile` is called. This PR moves compression code forward, right after instantiation of the `FileSinkDesc` object.
1. `PreInsertionCasts` doesn't take table partitions into account
In `castChildOutput`, `table.attributes` only contains non-partition columns, thus for partitioned table `childOutputDataTypes` never equals to `tableOutputDataTypes`. This results funny analyzed plan like this:
```
== Analyzed Logical Plan ==
InsertIntoTable Map(partcol1 -> None, partcol2 -> None), false
MetastoreRelation default, dynamic_part_table, None
Project [c_0#1164,c_1#1165,c_2#1166]
Project [c_0#1164,c_1#1165,c_2#1166]
Project [c_0#1164,c_1#1165,c_2#1166]
... (repeats 99 times) ...
Project [c_0#1164,c_1#1165,c_2#1166]
Project [c_0#1164,c_1#1165,c_2#1166]
Project [1 AS c_0#1164,1 AS c_1#1165,1 AS c_2#1166]
Filter (key#1170 = 150)
MetastoreRelation default, src, None
```
Awful though this logical plan looks, it's harmless because all projects will be eliminated by optimizer. Guess that's why this issue hasn't been caught before.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Author: baishuo(白硕) <vc_java@hotmail.com>
Author: baishuo <vc_java@hotmail.com>
Closes #2616 from liancheng/dp-fix and squashes the following commits:
21935b6 [Cheng Lian] Adds back deleted trailing space
f471c4b [Cheng Lian] PreInsertionCasts should take table partitions into account
a132c80 [Cheng Lian] Fixes output compression
9c6eb2d [Cheng Lian] Adds tests to verify dynamic partitioning folder layout
0eed349 [Cheng Lian] Addresses @yhuai's comments
26632c3 [Cheng Lian] Adds more tests
9227181 [Cheng Lian] Minor refactoring
c47470e [Cheng Lian] Refactors InsertIntoHiveTable to a Command
6fb16d7 [Cheng Lian] Fixes typo in test name, regenerated golden answer files
d53daa5 [Cheng Lian] Refactors dynamic partitioning support
b821611 [baishuo] pass check style
997c990 [baishuo] use HiveConf.DEFAULTPARTITIONNAME to replace hive.exec.default.partition.name
761ecf2 [baishuo] modify according micheal's advice
207c6ac [baishuo] modify for some bad indentation
caea6fb [baishuo] modify code to pass scala style checks
b660e74 [baishuo] delete a empty else branch
cd822f0 [baishuo] do a little modify
8e7268c [baishuo] update file after test
3f91665 [baishuo(白硕)] Update Cast.scala
8ad173c [baishuo(白硕)] Update InsertIntoHiveTable.scala
051ba91 [baishuo(白硕)] Update Cast.scala
d452eb3 [baishuo(白硕)] Update HiveQuerySuite.scala
37c603b [baishuo(白硕)] Update InsertIntoHiveTable.scala
98cfb1f [baishuo(白硕)] Update HiveCompatibilitySuite.scala
6af73f4 [baishuo(白硕)] Update InsertIntoHiveTable.scala
adf02f1 [baishuo(白硕)] Update InsertIntoHiveTable.scala
1867e23 [baishuo(白硕)] Update SparkHadoopWriter.scala
6bb5880 [baishuo(白硕)] Update HiveQl.scala
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implemented UDAF Hive aggregates by adding wrapper to Spark Hive.
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #2620 from ravipesala/SPARK-2693 and squashes the following commits:
a8df326 [ravipesala] Removed resolver from constructor arguments
caf25c6 [ravipesala] Fixed style issues
5786200 [ravipesala] Supported for UDAF Hive Aggregates like PERCENTILE
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
separate parser combinator
Created separate parser for hql. It preparses the commands like cache,uncache,add jar etc.. and then parses with HiveQl
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #2590 from ravipesala/SPARK-3654 and squashes the following commits:
bbca7dd [ravipesala] Fixed code as per admin comments.
ae9290a [ravipesala] Fixed style issues as per Admin comments
898ed81 [ravipesala] Removed spaces
fb24edf [ravipesala] Updated the code as per admin comments
8947d37 [ravipesala] Removed duplicate code
ba26cd1 [ravipesala] Created seperate parser for hql.It pre parses the commands like cache,uncache,add jar etc.. and then parses with HiveQl
|
|
|
|
|
|
|
|
|
|
| |
With the old ordering it was possible for commands in the HiveDriver to NPE due to the lack of configuration in the threadlocal session state.
Author: Michael Armbrust <michael@databricks.com>
Closes #2635 from marmbrus/initOrder and squashes the following commits:
9749850 [Michael Armbrust] Initilize session state before creating CommandProcessor
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following code gives error.
```
sqlContext.registerFunction("len", (s: String) => s.length)
sqlContext.sql("select len(foo) as a, count(1) from t1 group by len(foo)").collect()
```
Because SQl parser creates the aliases to the functions in grouping expressions with generated alias names. So if user gives the alias names to the functions inside projection then it does not match the generated alias name of grouping expression.
This kind of queries are working in Hive.
So the fix I have given that if user provides alias to the function in projection then don't generate alias in grouping expression,use the same alias.
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #2511 from ravipesala/SPARK-3371 and squashes the following commits:
9fb973f [ravipesala] Removed aliases to grouping expressions.
f8ace79 [ravipesala] Fixed the testcase issue
bad2fd0 [ravipesala] SPARK-3371 : Fixed Renaming a function expression with group by gives error
|
|
|
|
|
|
|
|
|
|
| |
case ```ShortType```, we should add short value to hive row. Int value may lead to some problems.
Author: scwf <wangfei1@huawei.com>
Closes #2551 from scwf/fix-addColumnValue and squashes the following commits:
08bcc59 [scwf] ColumnValue.shortValue for short type
|
|
|
|
|
|
|
|
|
|
| |
This change avoids a NPE during context initialization when settings are present.
Author: Michael Armbrust <michael@databricks.com>
Closes #2583 from marmbrus/configNPE and squashes the following commits:
da2ec57 [Michael Armbrust] Do all hive session state initilialization in lazy val
|
|
|
|
|
|
|
|
|
|
| |
Considering `Command.executeCollect()` simply delegates to `Command.sideEffectResult`, we no longer need to leave the latter `protected[sql]`.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2431 from liancheng/narrow-scope and squashes the following commits:
1bfc16a [Cheng Lian] Made Command.sideEffectResult protected
|
|
|
|
|
|
|
|
|
|
|
|
| |
BinaryType is derived from NativeType and added Ordering support.
Author: Venkata Ramana G <ramana.gollamudihuawei.com>
Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
Closes #2617 from gvramana/binarytype_sort and squashes the following commits:
1cf26f3 [Venkata Ramana Gollamudi] Supported Sorting of BinaryType
|
|
|
|
|
|
|
|
|
|
| |
add case for VoidObjectInspector in ```inspectorToDataType```
Author: scwf <wangfei1@huawei.com>
Closes #2552 from scwf/inspectorToDataType and squashes the following commits:
453d892 [scwf] add case for VoidObjectInspector
|
|
|
|
|
|
|
|
|
|
|
|
| |
The below query gives error
sql("SELECT k FROM (SELECT \`key\` AS \`k\` FROM src) a")
It gives error because the aliases are not cleaned so it could not be resolved in further processing.
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #2594 from ravipesala/SPARK-3708 and squashes the following commits:
d55db54 [ravipesala] Fixed SPARK-3708 (Backticks aren't handled correctly is aliases)
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #2598 from marmbrus/hiveClientLock and squashes the following commits:
ca89fe8 [Michael Armbrust] Lock hive client when creating tables
|
|
|
|
|
|
|
|
|
|
|
|
| |
MD5 of query strings in `createQueryTest` calls are used to generate golden files, leaving trailing spaces there can be really dangerous. Got bitten by this while working on #2616: my "smart" IDE automatically removed a trailing space and makes Jenkins fail.
(Really should add "no trailing space" to our coding style guidelines!)
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2619 from liancheng/kill-trailing-space and squashes the following commits:
034f119 [Cheng Lian] Kill dangerous trailing space in query string
|
|
|
|
|
|
|
|
|
|
| |
Thread names are useful for correlating failures.
Author: Reynold Xin <rxin@apache.org>
Closes #2600 from rxin/log4j and squashes the following commits:
83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs
|
|
|
|
| |
This reverts commit 0bbe7faeffa17577ae8a33dfcd8c4c783db5c909.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a new PR base on new master. changes are the same as https://github.com/apache/spark/pull/1919
Author: baishuo(白硕) <vc_java@hotmail.com>
Author: baishuo <vc_java@hotmail.com>
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2226 from baishuo/patch-3007 and squashes the following commits:
e69ce88 [Cheng Lian] Adds tests to verify dynamic partitioning folder layout
b20a3dc [Cheng Lian] Addresses @yhuai's comments
096bbbc [baishuo(白硕)] Merge pull request #1 from liancheng/refactor-dp
1093c20 [Cheng Lian] Adds more tests
5004542 [Cheng Lian] Minor refactoring
fae9eff [Cheng Lian] Refactors InsertIntoHiveTable to a Command
528e84c [Cheng Lian] Fixes typo in test name, regenerated golden answer files
c464b26 [Cheng Lian] Refactors dynamic partitioning support
5033928 [baishuo] pass check style
2201c75 [baishuo] use HiveConf.DEFAULTPARTITIONNAME to replace hive.exec.default.partition.name
b47c9bf [baishuo] modify according micheal's advice
c3ab36d [baishuo] modify for some bad indentation
7ce2d9f [baishuo] modify code to pass scala style checks
37c1c43 [baishuo] delete a empty else branch
66e33fc [baishuo] do a little modify
88d0110 [baishuo] update file after test
a3961d9 [baishuo(白硕)] Update Cast.scala
f7467d0 [baishuo(白硕)] Update InsertIntoHiveTable.scala
c1a59dd [baishuo(白硕)] Update Cast.scala
0e18496 [baishuo(白硕)] Update HiveQuerySuite.scala
60f70aa [baishuo(白硕)] Update InsertIntoHiveTable.scala
0a50db9 [baishuo(白硕)] Update HiveCompatibilitySuite.scala
491c7d0 [baishuo(白硕)] Update InsertIntoHiveTable.scala
a2374a8 [baishuo(白硕)] Update InsertIntoHiveTable.scala
701a814 [baishuo(白硕)] Update SparkHadoopWriter.scala
dc24c41 [baishuo(白硕)] Update HiveQl.scala
|
|
|
|
|
|
|
|
| |
Author: Reynold Xin <rxin@apache.org>
Closes #2560 from rxin/TaskContext and squashes the following commits:
9eff95a [Reynold Xin] [SPARK-3543] remaining cleanup work.
|
|
|
|
|
|
|
|
|
|
| |
Typing of UDFs should be lazy as it is often not valid to call `dataType` on an expression until after all of its children are `resolved`.
Author: Michael Armbrust <michael@databricks.com>
Closes #2525 from marmbrus/concatBug and squashes the following commits:
5b8efe7 [Michael Armbrust] fix bug with eager typing of udfs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a bug in JDK6: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4428022
this is because jdk get different result to operate ```double```,
```System.out.println(1/500d)``` in different jdk get different result
jdk 1.6.0(_31) ---- 0.0020
jdk 1.7.0(_05) ---- 0.002
this leads to HiveQuerySuite failed when generate golden answer in jdk 1.7 and run tests in jdk 1.6, result did not match
Author: w00228970 <wangfei1@huawei.com>
Closes #2517 from scwf/HiveQuerySuite and squashes the following commits:
0cb5e8d [w00228970] delete golden answer of division-0 and timestamp cast #1
1df3964 [w00228970] Jdk version leads to different query output for Double, this make HiveQuerySuite failed
|
|
|
|
|
|
|
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #2515 from marmbrus/jdbcExistingContext and squashes the following commits:
7866fad [Michael Armbrust] Allows starting a JDBC server on an existing context.
|
|
|
|
|
|
|
|
|
|
|
|
| |
User may be confused for the HQL logging & configurations, we'd better provide a default templates.
Both files are copied from Hive.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #2263 from chenghao-intel/hive_template and squashes the following commits:
53bffa9 [Cheng Hao] Remove the hive-log4j.properties initialization
|
|
|
|
|
|
|
|
| |
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #2396 from adrian-wang/selectnull and squashes the following commits:
2458229 [Daoyuan Wang] rebase solution
|
|
|
|
|
|
|
|
|
|
|
|
| |
created.
This will allow us to take advantage of things like the spark.defaults file.
Author: Michael Armbrust <michael@databricks.com>
Closes #2493 from marmbrus/copySparkConf and squashes the following commits:
0bd1377 [Michael Armbrust] Copy SQL configuration from SparkConf when a SQLContext is created.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Supported modulus operation using % operator on fractional datatypes FloatType, DoubleType and DecimalType
Example:
SELECT 1388632775.0 % 60 from tablename LIMIT 1
Author : Venkata Ramana Gollamudi ramana.gollamudihuawei.com
Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
Closes #2457 from gvramana/double_modulus_support and squashes the following commits:
79172a8 [Venkata Ramana Gollamudi] Add hive cache to testcase
c09bd5b [Venkata Ramana Gollamudi] Added a HiveQuerySuite testcase
193fa81 [Venkata Ramana Gollamudi] corrected testcase
3624471 [Venkata Ramana Gollamudi] modified testcase
e112c09 [Venkata Ramana Gollamudi] corrected the testcase
513d0e0 [Venkata Ramana Gollamudi] modified to add modulus support to fractional types float,double,decimal
296d253 [Venkata Ramana Gollamudi] modified to add modulus support to fractional types float,double,decimal
|
|
|
|
|
|
|
|
|
|
| |
a follow up of https://github.com/apache/spark/pull/2377 and https://github.com/apache/spark/pull/2352, see detail there.
Author: wangfei <wangfei1@huawei.com>
Closes #2505 from scwf/patch-6 and squashes the following commits:
4874ec8 [wangfei] removes the evil MINOR HACK
|