| Commit message | Author | Age | Files | Lines |
|
| |
This reverts commit b77f87673d1f9f03d4c83cf583158227c551359b.
|
| |
This reverts commit 0a16abadc59082b7d3a24d7f3625236658632813.
|
| |
MetastoreRelation's sameResult method only compares database name and table name)"
This reverts commit 54864403c4f132d9c1380c015122a849dd44dff8.
|
| |
MetastoreRelation's sameResult method only compares database name and table name)
Override MetastoreRelation's sameResult method so that it compares only the database name and the table name.
Previously, after
cache table t1;
select count(*) from t1;
the query read data from memory, but the following query did not and read from HDFS instead:
select count(*) from t1 t;
Cached data is keyed by logical plan and matched with sameResult, so the logical plan of a table referenced through an alias was not considered the same as the plan of that table without the alias. This change makes sameResult compare only the database name and the table name.
Author: seayi <405078363@qq.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3898 from seayi/branch-1.2 and squashes the following commits:
8f0c7d2 [seayi] Update CachedTableSuite.scala
a277120 [seayi] Update HiveMetastoreCatalog.scala
8d910aa [seayi] Update HiveMetastoreCatalog.scala
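The alias problem described in this message can be illustrated with a toy plan cache. This is a minimal sketch in plain Scala, not Spark's code: `TableRef` and `PlanCache` are hypothetical stand-ins for a metastore relation and the cache manager, and the only idea taken from the message is that `sameResult` ignores the alias.

```scala
// Toy model only; TableRef and PlanCache are hypothetical names, not Spark classes.
object SameResultSketch {
  // A table reference, optionally under an alias such as "t" in "t1 t".
  final case class TableRef(database: String, table: String, alias: Option[String]) {
    // Alias-insensitive comparison: only the database and table name matter.
    def sameResult(other: TableRef): Boolean =
      database == other.database && table == other.table
  }

  // Cached data keyed by plan; lookup scans for a plan producing the same result.
  final class PlanCache {
    private var entries = List.empty[(TableRef, String)]

    def cache(plan: TableRef, data: String): Unit =
      entries = (plan, data) :: entries

    def lookup(plan: TableRef): Option[String] =
      entries.collectFirst { case (cached, data) if cached.sameResult(plan) => data }
  }

  def main(args: Array[String]): Unit = {
    val cache = new PlanCache
    cache.cache(TableRef("default", "t1", None), "in-memory columns")
    // "select count(*) from t1 t" refers to the same table under the alias t.
    println(cache.lookup(TableRef("default", "t1", Some("t"))))
  }
}
```

With a comparison that also looked at the alias, the second lookup would miss the cache, which is the HDFS read described above.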
|
| |
This reverts commit 3e2d7d310b76c293b9ac787f204e6880f508f6ec.
|
| |
This reverts commit f53a4319ba5f0843c077e64ae5a41e2fac835a5b.
|
| |
This reverts commit e87eb2b42f137c22194cfbca2abf06fecdf943da.
|
| |
This reverts commit adfed7086f10fa8db4eeac7996c84cf98f625e9a.
|
| |
from a projection (Backport to Spark-1.2)
This is a follow-up to #3796, which could not be merged back into Spark-1.2, so it is backported manually.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #4013 from chenghao-intel/spark_4959_backport and squashes the following commits:
1f6c93d [Cheng Hao] backport to Spark-1.2
|
| |
Followup to #3870. Props to rahulaggarwalguavus for identifying the issue.
Author: Michael Armbrust <michael@databricks.com>
Closes #3990 from marmbrus/SPARK-5049 and squashes the following commits:
dd03e4e [Michael Armbrust] Fill in the partition values of parquet scans instead of using JoinedRow
(cherry picked from commit 5d9fa550820543ee1b0ce82997917745973a5d65)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
Disables the Spark web UI in HiveThriftServer2Suite in order to prevent Jenkins test failures due to port contention.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #3998 from JoshRosen/SPARK-5200 and squashes the following commits:
a384416 [Josh Rosen] [SPARK-5200] Disable web UI in Hive Thriftserver tests.
(cherry picked from commit 82fd38dcdcc9f7df18930c0e08cc8ec34eaee828)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Conflicts:
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #3987 from marmbrus/hiveUdfCaching and squashes the following commits:
8bca2fa [Michael Armbrust] [SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause
(cherry picked from commit 3684fd21e1ffdc0adaad8ff6b31394b637e866ce)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
This pull request only fixes the parsing error and changes the API to use tableIdentifier. Changes related to joining data sources from different catalogs are not part of this pull request.
Author: Alex Liu <alex_liu68@yahoo.com>
Closes #3941 from alexliu68/SPARK-SQL-4943-3 and squashes the following commits:
343ae27 [Alex Liu] [SPARK-4943][SQL] refactoring according to review
29e5e55 [Alex Liu] [SPARK-4943][SQL] fix failed Hive CTAS tests
6ae77ce [Alex Liu] [SPARK-4943][SQL] fix TestHive matching error
3652997 [Alex Liu] [SPARK-4943][SQL] Allow table name having dot to support db/catalog ...
(cherry picked from commit 4b39fd1e63188821fc84a13f7ccb6e94277f4be7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Conflicts:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
|
| |
Author: Alex Liu <alex_liu68@yahoo.com>
Closes #3766 from alexliu68/SPARK-SQL-4925 and squashes the following commits:
3137b51 [Alex Liu] [SPARK-4925][SQL] Remove sql/hive-thriftserver module from pom.xml
15f2e38 [Alex Liu] [SPARK-4925][SQL] Publish Spark SQL hive-thriftserver maven artifact
(cherry picked from commit 1e56eba5d906bef793dfd6f199db735a6116a764)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
Convert the type of RowWriteSupport.attributes to Array.
Performance analysis of writing very wide tables shows that most of the time is spent in the apply method on the attributes var. The type of attributes was previously LinearSeqOptimized, whose apply is O(N), which made each row write O(N squared) in the number of columns.
Measurements on a 575-column table showed that this change improves write times by 6x.
Author: Michael Davies <Michael.BellDavies@gmail.com>
Closes #3843 from MickDavies/SPARK-4386 and squashes the following commits:
892519d [Michael Davies] [SPARK-4386] Improve performance when writing Parquet files
(cherry picked from commit 7425bec320227bf8818dc2844c12d5373d166364)
Signed-off-by: Michael Armbrust <michael@databricks.com>
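The cost difference described in this message comes from indexed access: on a linked list, `attributes(i)` walks i nodes, while on an Array it is a constant-time lookup. The following is a minimal sketch in plain Scala, assuming a hypothetical `writeRow` helper and rows with as many values as attributes; it is not the RowWriteSupport code.

```scala
// Illustrative only; writeRow is a hypothetical helper, not part of Spark.
object WideRowWriteSketch {
  // Formats one row as "name=value" pairs using index-based access.
  // Assumes row has at least as many values as there are attributes.
  def writeRow(attributes: Seq[String], row: Seq[Any]): String = {
    // One O(N) conversion up front, so every later lookup is O(1)
    // even when the caller passes a List.
    val attrs: Array[String] = attributes.toArray
    val values: Array[Any] = row.toArray
    val sb = new StringBuilder
    var i = 0
    while (i < attrs.length) {
      if (i > 0) sb.append(',')
      sb.append(attrs(i)).append('=').append(values(i))
      i += 1
    }
    sb.toString
  }
}
```

Indexing the original `List` inside the loop instead would make the loop quadratic in the number of columns, which matches the behaviour the message reports for wide tables.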
|
| |
This is just a quick fix that locks when calling `runHive`. If we can find a way to avoid the error without a global lock that would be better.
Author: Michael Armbrust <michael@databricks.com>
Closes #3834 from marmbrus/hiveConcurrency and squashes the following commits:
bf25300 [Michael Armbrust] prevent multiple concurrent hive native commands
(cherry picked from commit 480bd1d2edd1de06af607b0cf3ff3c0b16089add)
Signed-off-by: Michael Armbrust <michael@databricks.com>
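A global lock of the kind this message describes can be sketched as follows. This is plain Scala under stated assumptions: `runCommand` is a hypothetical stand-in for `runHive`, and the counters exist only so the serialization can be observed.

```scala
// Illustrative only; runCommand stands in for a native command runner.
object NativeCommandLockSketch {
  private val lock = new Object
  private var active = 0     // commands currently inside the critical section
  private var maxActive = 0  // highest concurrency ever observed

  // Every caller takes the same lock, so at most one command runs at a time.
  def runCommand(command: String): String = lock.synchronized {
    active += 1
    maxActive = math.max(maxActive, active)
    try s"ran: $command" finally active -= 1
  }

  def observedMaxConcurrency: Int = lock.synchronized(maxActive)
}
```

The trade-off is the one the message points out: the lock is simple and safe, but it serializes all native commands, including ones that would not have conflicted.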
|
| |
Author: jerryshao <saisai.shao@intel.com>
Closes #3698 from jerryshao/SPARK-4847 and squashes the following commits:
4741130 [jerryshao] Make later added extraStrategies effect when calling strategies
(cherry picked from commit dc8280dcca7b54793a3db644f74fd33460960d4a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
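The commit title suggests that strategies registered late were not picked up. A minimal sketch of that kind of problem, assuming it was caused by computing the strategy list once instead of on every call (an assumption, since the message does not say): `Planner` and the strategy names below are hypothetical.

```scala
// Illustrative only; Planner and the strategy names are hypothetical.
object ExtraStrategiesSketch {
  class Planner {
    var extraStrategies: Seq[String] = Nil
    private val builtIn: Seq[String] = Seq("BasicOperators", "HashJoin")

    // Evaluated once at construction: later changes to extraStrategies are lost.
    val strategiesSnapshot: Seq[String] = extraStrategies ++ builtIn

    // Re-evaluated on every call: later additions take effect.
    def strategies: Seq[String] = extraStrategies ++ builtIn
  }
}
```

Registering a strategy after the planner is constructed changes the result of `strategies` but not of `strategiesSnapshot`.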
|
| |
from Hive's LazyBinaryInteger
This enables assertions for the Maven and SBT build, but overrides the Hive module to not enable assertions.
Author: Sean Owen <sowen@cloudera.com>
Closes #3692 from srowen/SPARK-4814 and squashes the following commits:
caca704 [Sean Owen] Disable assertions just for Hive
f71e783 [Sean Owen] Enable assertions for SBT and Maven build
(cherry picked from commit 81112e4b573292e76c7feeed995751bd7a5fe489)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
|
| |
Fixes a bug triggered by queries like the following:
```
test("save join to table") {
  val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString))
  sql("CREATE TABLE test1 (key INT, value STRING)")
  testData.insertInto("test1")
  sql("CREATE TABLE test2 (key INT, value STRING)")
  testData.insertInto("test2")
  testData.insertInto("test2")
  sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test")
  checkAnswer(
    table("test"),
    sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
}
```
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits:
e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS
e004895 [Cheng Hao] fix bug
(cherry picked from commit 0abbff286220bbcbbf28fbd80b8c5bf59ff37ce2)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
This reverts commit 2b72c569a674cccf79ebbe8d067b8dbaaf78007f.
|
| |
This reverts commit bc05df8a23ba7ad485f6844f28f96551b13ba461.
|
| |
a wrapper
Unlike in Hive 0.12.0, in Hive 0.13.1 UDF/UDAF/UDTF objects (aka Hive functions) should be initialized only once on the driver side and then serialized to executors. However, not all function objects are serializable (e.g. GenericUDF doesn't implement Serializable). Hive 0.13.1 solves this issue with a Kryo or XML serializer, and several utility ser/de methods are provided in class o.a.h.h.q.e.Utilities for this purpose. In this PR we chose Kryo for efficiency. The Kryo serializer used here is the one created in Hive; the Spark Kryo serializer wasn't used because no SparkConf instance is available.
Author: Cheng Hao <hao.cheng@intel.com>
Author: Cheng Lian <lian@databricks.com>
Closes #3640 from chenghao-intel/udf_serde and squashes the following commits:
8e13756 [Cheng Hao] Update the comment
74466a3 [Cheng Hao] refactor as feedbacks
396c0e1 [Cheng Hao] avoid Simple UDF to be serialized
e9c3212 [Cheng Hao] update the comment
19cbd46 [Cheng Hao] support udf instance ser/de after initialization
(cherry picked from commit 383c5555c9f26c080bc9e3a463aab21dd5b3797f)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
This is a code refactoring and follow-up for #2570.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3336 from chenghao-intel/createtbl and squashes the following commits:
3563142 [Cheng Hao] remove the unused variable
e215187 [Cheng Hao] eliminate the compiling warning
4f97f14 [Cheng Hao] fix bug in unittest
5d58812 [Cheng Hao] revert the API changes
b85b620 [Cheng Hao] fix the regression of temp tabl not found in CTAS
(cherry picked from commit 51b1fe1426ffecac6c4644523633ea1562ff9a4e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
Enables Kryo and disables reference tracking by default in Spark SQL Thrift server. Configurations explicitly defined by users in `spark-defaults.conf` are respected (the Thrift server is started by `spark-submit`, which handles configuration properties properly).
Author: Cheng Lian <lian@databricks.com>
Closes #3621 from liancheng/kryo-by-default and squashes the following commits:
70c2775 [Cheng Lian] Enables Kryo by default in Spark SQL Thrift server
(cherry picked from commit 6f61e1f961826a6c9e98a66d10b271b7e3c7dd55)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
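The precedence rule in this message, server defaults that yield to explicit user settings, can be sketched with plain maps. `spark.serializer` and `spark.kryo.referenceTracking` are the real property names; `ThriftServerConfSketch` and `effectiveConf` are hypothetical and do not reflect how the Thrift server actually applies its defaults.

```scala
// Illustrative only; models the precedence rule, not the server's startup code.
object ThriftServerConfSketch {
  // Defaults described in the commit message above.
  val serverDefaults: Map[String, String] = Map(
    "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.referenceTracking" -> "false")

  // The right-hand side of ++ wins on key collisions, so explicit user
  // settings (for example from spark-defaults.conf) override the defaults.
  def effectiveConf(userConf: Map[String, String]): Map[String, String] =
    serverDefaults ++ userConf
}
```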
|
| |
Author: Michael Armbrust <michael@databricks.com>
Closes #3613 from marmbrus/parquetPartitionPruning and squashes the following commits:
4f138f8 [Michael Armbrust] Use catalyst for partition pruning in newParquet.
(cherry picked from commit f5801e813f3c2573ebaf1af839341489ddd3ec78)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
|
| |
This reverts commit 1056e9ec13203d0c51564265e94d77a054498fdb.
|
| |
This reverts commit 00316cc87983b844f6603f351a8f0b84fe1f6035.
|
| |
Just found this instance while doing some jstack-based profiling of a Spark SQL job. It is very unlikely that this is causing much of a perf issue anywhere, but it is unnecessarily suboptimal.
Author: Aaron Davidson <aaron@databricks.com>
Closes #3593 from aarondav/seq-opt and squashes the following commits:
962cdfc [Aaron Davidson] [SQL] Minor: Avoid calling Seq#size in a loop
(cherry picked from commit c6c7165e7ecf1690027d6bd4e0620012cd0d2310)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
| |
This is a very small fix that catches one specific exception and returns an empty table. #3441 will address this in a more principled way.
Author: Michael Armbrust <michael@databricks.com>
Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits:
2781d9f [Michael Armbrust] Handle empty lists for newParquet
04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive
(cherry picked from commit 513ef82e85661552e596d0b483b645ac24e86d4d)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
Use `executeCollect` to collect the result, because executeCollect is a custom implementation of collect in Spark SQL that is better than the RDD's collect.
Author: wangfei <wangfei1@huawei.com>
Closes #3547 from scwf/executeCollect and squashes the following commits:
a5ab68e [wangfei] Revert "adding debug info"
a60d680 [wangfei] fix test failure
0db7ce8 [wangfei] adding debug info
184c594 [wangfei] using executeCollect instead collect
(cherry picked from commit 3ae0cda83c5106136e90d59c20e61db345a5085f)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
We should use `~` instead of `-` for bitwise NOT.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3528 from adrian-wang/symbol and squashes the following commits:
affd4ad [Daoyuan Wang] fix code gen test case
56efb79 [Daoyuan Wang] ensure bitwise NOT over byte and short persist data type
f55fbae [Daoyuan Wang] wrong symbol for bitwise not
(cherry picked from commit 1f5ddf17e831ad9717f0f4b60a727a3381fad4f9)
Signed-off-by: Michael Armbrust <michael@databricks.com>
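The difference between the two symbols, and the type-preservation point from the squashed commits, can be checked in plain Scala. This is a sketch of the arithmetic only, with hypothetical helper names; it is not the Catalyst expression code.

```scala
// Illustrative only; helper names are hypothetical.
object BitwiseNotSketch {
  // Bitwise NOT flips every bit, which in two's complement equals -x - 1.
  def bitwiseNotInt(x: Int): Int = ~x

  // On the JVM, ~ applied to a Byte or Short yields an Int, so the result
  // is narrowed back to keep the input's data type.
  def bitwiseNotByte(x: Byte): Byte = (~x).toByte
  def bitwiseNotShort(x: Short): Short = (~x).toShort
}
```

For example, `bitwiseNotInt(5)` is -6, whereas unary minus would give -5.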
|
| |
SELECT max(1/0) FROM src
would return a very large number, which is clearly wrong.
For 1/0, Hive 0.12 returns `Infinity`, while Hive 0.13.1 returns `NULL`.
I think it is better to align our behavior with the newer Hive version.
This PR ensures that when the divisor is 0, the expression evaluates to NULL, the same as in Hive 0.13.1.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3443 from adrian-wang/div and squashes the following commits:
2e98677 [Daoyuan Wang] fix code gen for divide 0
85c28ba [Daoyuan Wang] temp
36236a5 [Daoyuan Wang] add test cases
6f5716f [Daoyuan Wang] fix comments
cee92bd [Daoyuan Wang] avoid evaluation 2 times
22ecd9a [Daoyuan Wang] fix style
cf28c58 [Daoyuan Wang] divide fix
2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type
(cherry picked from commit f6df609dcc4f4a18c0f1c74b1ae0800cf09fa7ae)
Signed-off-by: Michael Armbrust <michael@databricks.com>
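The NULL-on-zero rule can be sketched in plain Scala with `Option` standing in for SQL NULL. The `divide` helper is hypothetical and only models the semantics described in this message, not the Catalyst `Divide` expression or its generated code.

```scala
// Illustrative only; None models SQL NULL.
object DivideByZeroSketch {
  // Returns NULL when either operand is NULL or when the divisor is 0.
  // The divisor is inspected first, so a zero divisor short-circuits
  // before the dividend is looked at.
  def divide(left: Option[Double], right: Option[Double]): Option[Double] =
    for {
      r <- right
      if r != 0.0
      l <- left
    } yield l / r
}
```

Plain JVM double arithmetic behaves like the Hive 0.12 case: `1.0 / 0.0` evaluates to positive infinity rather than failing, which is why an explicit check on the divisor is needed.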
|
| |
has null
val jsc = new org.apache.spark.api.java.JavaSparkContext(sc)
val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc)
val nrdd = jhc.hql("select null from spark_test.for_test")
println(nrdd.schema)
Then the error is thrown as follows:
scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$)
at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43)
Author: YanTangZhai <hakeemzhai@tencent.com>
Author: yantangzhai <tyz0303@163.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3538 from YanTangZhai/MatchNullType and squashes the following commits:
e052dff [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
4b4bb34 [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null
896c7b7 [yantangzhai] fix NullType MatchError in JavaSchemaRDD when sql has null
6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
e249846 [YanTangZhai] Merge pull request #10 from apache/master
d26d982 [YanTangZhai] Merge pull request #9 from apache/master
76d4027 [YanTangZhai] Merge pull request #8 from apache/master
03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
8a00106 [YanTangZhai] Merge pull request #6 from apache/master
cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master
(cherry picked from commit 10664276007beca3843638e558f504cad44b1fb3)
Signed-off-by: Michael Armbrust <michael@databricks.com>
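The `scala.MatchError` in the trace above is what Scala throws when a value reaches a pattern match that has no case for it. The following is a minimal sketch with hypothetical toy types (`CatalystType`, `NullT`, and the two conversion functions are not the real DataTypeConversions code):

```scala
// Illustrative only; a toy type hierarchy, not Spark's data types.
object NullTypeMatchSketch {
  sealed trait CatalystType
  case object IntT extends CatalystType
  case object StringT extends CatalystType
  case object NullT extends CatalystType

  // Non-exhaustive on purpose: NullT matches no case, so calling this with
  // NullT throws scala.MatchError at runtime.
  def toJavaNameBuggy(t: CatalystType): String = (t: @unchecked) match {
    case IntT    => "IntegerType"
    case StringT => "StringType"
  }

  // Adding the missing case removes the runtime failure.
  def toJavaNameFixed(t: CatalystType): String = t match {
    case IntT    => "IntegerType"
    case StringT => "StringType"
    case NullT   => "NullType"
  }
}
```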
|
| |
Author: baishuo <vc_java@hotmail.com>
Closes #3526 from baishuo/master-trycatch and squashes the following commits:
d446e14 [baishuo] correct the code style
b36bf96 [baishuo] correct the code style
ae0e447 [baishuo] add finally to avoid resource leak
(cherry picked from commit 69b6fed206565ecb0173d3757bcb5110422887c3)
Signed-off-by: Michael Armbrust <michael@databricks.com>
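The pattern named in the squashed commit, releasing a resource in a `finally` block, can be sketched as follows. The `firstLine` helper is hypothetical; the commit does not say which resource was leaking.

```scala
// Illustrative only; firstLine is a hypothetical helper.
object CloseInFinallySketch {
  // The reader is closed whether readLine returns normally or throws,
  // so the underlying resource is never leaked.
  def firstLine(reader: java.io.BufferedReader): String =
    try reader.readLine() finally reader.close()
}
```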
|
| |
Spark SQL has built-in sqrt and abs, but the DSL doesn't support those functions.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits:
07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite
8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL
(cherry picked from commit e75e04f980281389b881df76f59ba1adc6338629)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
Support view definitions like
CREATE VIEW view3(valoo)
TBLPROPERTIES ("fear" = "factor")
AS SELECT upper(value) FROM src WHERE key=86;
[valoo as the alias of upper(value)]. This is the missing part of SPARK-4239, needed for full view support.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3396 from adrian-wang/viewcolumn and squashes the following commits:
4d001d0 [Daoyuan Wang] support view with column alias
(cherry picked from commit 4df60a8cbc58f2877787245c2a83b2de85579c82)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
Author: wangfei <wangfei1@huawei.com>
Closes #3533 from scwf/sql-doc1 and squashes the following commits:
962910b [wangfei] doc and comment fix
(cherry picked from commit 7b79957879db4dfcc7c3601cb40ac4fd576259a5)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #3516 from ravipesala/ddl_doc and squashes the following commits:
d101fdf [ravipesala] Style issues fixed
d2238cd [ravipesala] Corrected documentation
(cherry picked from commit bc353819cc86c3b0ad75caf81b47744bfc2aeeb3)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
| |
like count(distinct c1,c2..) in Spark SQL
Adds multi-column support to the countDistinct function, as in count(distinct c1,c2..) in Spark SQL.
Author: ravipesala <ravindra.pesala@huawei.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3511 from ravipesala/countdistinct and squashes the following commits:
cc4dbb1 [ravipesala] style
070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL
(cherry picked from commit 6a9ff19dc06745144d5b311d4f87073c81d53a8f)
Signed-off-by: Michael Armbrust <michael@databricks.com>
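What a multi-column distinct count computes can be sketched on in-memory rows. This sketch assumes the Hive-style convention that a row with a NULL in any counted column is skipped (an assumption, since the message does not state it); `countDistinct` here is a hypothetical plain-Scala function, not the Spark SQL implementation.

```scala
// Illustrative only; None models SQL NULL, each inner Seq is one row's (c1, c2, ...).
object CountDistinctSketch {
  // Counts distinct column combinations, skipping rows that contain a NULL
  // in any of the counted columns (assumed convention, see lead-in).
  def countDistinct(rows: Seq[Seq[Option[Any]]]): Long =
    rows.filter(_.forall(_.isDefined)).distinct.size.toLong
}
```

For rows (1, a), (1, a), (1, b) and (2, NULL), this sketch returns 2: the duplicate collapses and the row with a NULL is ignored.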
|
| |
Remove the hardcoded max and min values for types and let BigDecimal check type compatibility.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #3208 from viirya/more_numericLit and squashes the following commits:
e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
91fe489 [Liang-Chi Hsieh] add Byte and Short.
1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.
(cherry picked from commit b57365a1ec89e31470f424ff37d5ebc7c90a39d8)
Signed-off-by: Michael Armbrust <michael@databricks.com>
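Letting BigDecimal decide which type a numeric literal fits can be sketched with the range checks that `scala.math.BigDecimal` already provides. The `Lit` hierarchy and `parse` below are hypothetical, and the choice of Int, Long, and decimal as the only targets is an assumption loosely based on the squashed commit that removes byte and short literals; this is not the parser's actual code.

```scala
// Illustrative only; Lit and parse are hypothetical names.
object NumericLiteralSketch {
  sealed trait Lit
  final case class IntLit(value: Int) extends Lit
  final case class LongLit(value: Long) extends Lit
  final case class DecimalLit(value: BigDecimal) extends Lit

  // No hardcoded min/max constants: BigDecimal reports whether the value
  // is a whole number that fits in an Int or a Long.
  def parse(text: String): Lit = {
    val d = BigDecimal(text)
    if (d.isValidInt) IntLit(d.toInt)
    else if (d.isValidLong) LongLit(d.toLong)
    else DecimalLit(d)
  }
}
```

A literal one past `Int.MaxValue` falls through to `LongLit`, and one past `Long.MaxValue` or with a fractional part falls through to `DecimalLit`.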
|