| Commit message | Author | Age | Files | Lines |
This reverts commit adfed7086f10fa8db4eeac7996c84cf98f625e9a.
from a projection (backport to Spark-1.2)
This is a follow-up of #3796, which cannot be merged back to Spark-1.2 cleanly, so it is merged manually here.
Author: Cheng Hao <hao.cheng@intel.com>
Closes #4013 from chenghao-intel/spark_4959_backport and squashes the following commits:
1f6c93d [Cheng Hao] backport to Spark-1.2
This pull request only fixes the parsing error and changes the API to use tableIdentifier. The change for joining data sources across different catalogs is not included here.
Author: Alex Liu <alex_liu68@yahoo.com>
Closes #3941 from alexliu68/SPARK-SQL-4943-3 and squashes the following commits:
343ae27 [Alex Liu] [SPARK-4943][SQL] refactoring according to review
29e5e55 [Alex Liu] [SPARK-4943][SQL] fix failed Hive CTAS tests
6ae77ce [Alex Liu] [SPARK-4943][SQL] fix TestHive matching error
3652997 [Alex Liu] [SPARK-4943][SQL] Allow table name having dot to support db/catalog ...
(cherry picked from commit 4b39fd1e63188821fc84a13f7ccb6e94277f4be7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Conflicts:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
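The tableIdentifier change boils down to splitting a possibly-qualified name on the dot. A minimal Python sketch of that parsing (the helper name is hypothetical, not Spark's API):

```python
def parse_table_identifier(name):
    """Split a possibly-qualified table name like 'db.tbl' into
    (database, table); a bare name yields (None, name)."""
    parts = name.split(".")
    if len(parts) == 1:
        return (None, parts[0])
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise ValueError("too many qualifiers: %r" % name)

assert parse_table_identifier("mytable") == (None, "mytable")
assert parse_table_identifier("mydb.mytable") == ("mydb", "mytable")
```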
Fixes a bug triggered by queries like the following:
```
test("save join to table") {
val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString))
sql("CREATE TABLE test1 (key INT, value STRING)")
testData.insertInto("test1")
sql("CREATE TABLE test2 (key INT, value STRING)")
testData.insertInto("test2")
testData.insertInto("test2")
sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test")
checkAnswer(
table("test"),
sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
}
```
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits:
e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS
e004895 [Cheng Hao] fix bug
(cherry picked from commit 0abbff286220bbcbbf28fbd80b8c5bf59ff37ce2)
Signed-off-by: Michael Armbrust <michael@databricks.com>
This reverts commit 2b72c569a674cccf79ebbe8d067b8dbaaf78007f.
This reverts commit bc05df8a23ba7ad485f6844f28f96551b13ba461.
This reverts commit 1056e9ec13203d0c51564265e94d77a054498fdb.
This reverts commit 00316cc87983b844f6603f351a8f0b84fe1f6035.
Just found this instance while doing some jstack-based profiling of a Spark SQL job. It is very unlikely that this is causing much of a perf issue anywhere, but it is unnecessarily suboptimal.
Author: Aaron Davidson <aaron@databricks.com>
Closes #3593 from aarondav/seq-opt and squashes the following commits:
962cdfc [Aaron Davidson] [SQL] Minor: Avoid calling Seq#size in a loop
(cherry picked from commit c6c7165e7ecf1690027d6bd4e0620012cd0d2310)
Signed-off-by: Reynold Xin <rxin@databricks.com>
We should use `~` instead of `-` for bitwise NOT.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3528 from adrian-wang/symbol and squashes the following commits:
affd4ad [Daoyuan Wang] fix code gen test case
56efb79 [Daoyuan Wang] ensure bitwise NOT over byte and short persist data type
f55fbae [Daoyuan Wang] wrong symbol for bitwise not
(cherry picked from commit 1f5ddf17e831ad9717f0f4b60a727a3381fad4f9)
Signed-off-by: Michael Armbrust <michael@databricks.com>
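The symbol mix-up the commits above fix is easy to reproduce in any language with two's-complement integers; a quick Python illustration of why `-` and `~` differ:

```python
x = 5
assert -x == -5   # arithmetic negation
assert ~x == -6   # bitwise NOT: ~x == -x - 1 in two's complement
# Masking to a byte shows the raw flipped bit pattern:
assert (~0b00000101) & 0xFF == 0b11111010
```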
`SELECT max(1/0) FROM src`
would return a very large number, which is obviously not right.
Hive 0.12 returns `Infinity` for 1/0, while Hive 0.13.1 returns `NULL`.
It is better to keep our behavior consistent with the newer Hive version.
This PR ensures that when the divisor is 0, the result of the expression is NULL, matching hive-0.13.1.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3443 from adrian-wang/div and squashes the following commits:
2e98677 [Daoyuan Wang] fix code gen for divide 0
85c28ba [Daoyuan Wang] temp
36236a5 [Daoyuan Wang] add test cases
6f5716f [Daoyuan Wang] fix comments
cee92bd [Daoyuan Wang] avoid evaluation 2 times
22ecd9a [Daoyuan Wang] fix style
cf28c58 [Daoyuan Wang] divide fix
2dfe50f [Daoyuan Wang] return null when divider is 0 of Double type
(cherry picked from commit f6df609dcc4f4a18c0f1c74b1ae0800cf09fa7ae)
Signed-off-by: Michael Armbrust <michael@databricks.com>
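A small Python sketch of the resulting semantics (the helper is illustrative, not Spark's implementation): division yields NULL (here `None`) when the divisor is 0, and aggregates then skip it:

```python
def hive_divide(a, b):
    """Emulates Hive 0.13.1 division semantics: NULL (None) when the
    divisor is 0, instead of Infinity or an arbitrary large number."""
    if a is None or b is None or b == 0:
        return None
    return a / b

assert hive_divide(1, 0) is None
assert hive_divide(10, 4) == 2.5
# Aggregates like max() skip NULLs, the way SQL does:
rows = [hive_divide(1, 0), hive_divide(6, 3)]
assert max(v for v in rows if v is not None) == 2.0
```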
Spark SQL has built-in sqrt and abs functions, but the DSL doesn't support them.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3401 from sarutak/dsl-missing-operator and squashes the following commits:
07700cf [Kousuke Saruta] Modified Literal(null, NullType) to Literal(null) in DslQuerySuite
8f366f8 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
1b88e2e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator
0396f89 [Kousuke Saruta] Added sqrt and abs to Spark SQL DSL
(cherry picked from commit e75e04f980281389b881df76f59ba1adc6338629)
Signed-off-by: Michael Armbrust <michael@databricks.com>
like count(distinct c1,c2..) in Spark SQL
Adds multi-column support to the countDistinct function, e.g. count(distinct c1, c2, ...), in Spark SQL.
Author: ravipesala <ravindra.pesala@huawei.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #3511 from ravipesala/countdistinct and squashes the following commits:
cc4dbb1 [ravipesala] style
070e12a [ravipesala] Supporting multi column support in count(distinct c1,c2..) in Spark SQL
(cherry picked from commit 6a9ff19dc06745144d5b311d4f87073c81d53a8f)
Signed-off-by: Michael Armbrust <michael@databricks.com>
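Multi-column distinct counting amounts to taking the distinct set of column tuples rather than of a single column; a Python illustration:

```python
rows = [(1, "a"), (1, "a"), (1, "b"), (2, "b")]
# count(distinct c1, c2): distinct over the tuple of both columns
assert len(set(rows)) == 3
# versus per-column distinct counts, which differ
assert len({c1 for c1, _ in rows}) == 2
assert len({c2 for _, c2 in rows}) == 2
```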
Removes the hardcoded max and min values for numeric types and lets BigDecimal check type compatibility.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #3208 from viirya/more_numericLit and squashes the following commits:
e9834b4 [Liang-Chi Hsieh] Remove byte and short types for number literal.
1bd1825 [Liang-Chi Hsieh] Fix Indentation and make the modification clearer.
cf1a997 [Liang-Chi Hsieh] Modified for comment to add a rule of analysis that adds a cast.
91fe489 [Liang-Chi Hsieh] add Byte and Short.
1bdc69d [Liang-Chi Hsieh] Let BigDecimal do checking type compatibility.
(cherry picked from commit b57365a1ec89e31470f424ff37d5ebc7c90a39d8)
Signed-off-by: Michael Armbrust <michael@databricks.com>
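The idea of letting an arbitrary-precision type do the range checking can be sketched in Python with `decimal.Decimal` (the function and type names are illustrative, not Spark's):

```python
from decimal import Decimal

def literal_type(text):
    """Pick the narrowest type for a numeric literal by comparing the
    exact value against type bounds, letting Decimal hold it meanwhile
    instead of hardcoding per-type parsing. Illustrative only."""
    v = Decimal(text)
    if v != v.to_integral_value():
        return "decimal"
    if -2**31 <= v <= 2**31 - 1:
        return "int"
    if -2**63 <= v <= 2**63 - 1:
        return "long"
    return "decimal"

assert literal_type("42") == "int"
assert literal_type("3000000000") == "long"
assert literal_type("99999999999999999999") == "decimal"
assert literal_type("1.5") == "decimal"
```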
This reverts commit 39c7d1c1f9a7785285cf4c20dfbffd96f72d5634.
This reverts commit fc7bff00ac731d2632213a98cd92dc5e84ce7dcd.
This reverts commit cc2c05e4ee81d2f34873a2ebb9a5272867cb65c2.
This reverts commit 380eba5f49eca1dbd4084e6c84e19866fffd4efa.
This reverts commit 5247dd859b95a440baa562b9827bdeb26aa6530e.
This reverts commit 79df6b43ae762263a8120f423ddb4a0811dd4b6f.
This reverts commit db7f4a898af22a02b36428507f8ef2b429d78dc1.
This reverts commit d7b1ecb25676d228deb6fe05efdb4e2ab9c3e30b.
This reverts commit 38c1fbd9694430cefd962c90bc36b0d108c6124b.
This reverts commit d7ac6013483e83caff8ea54c228f37aeca159db8.
When a query uses an ORDER BY clause, the attributes referenced by the projection are resolved first (1), and then the attributes referenced in the ORDER BY clause are resolved (2).
But when resolving the ORDER BY attributes, the resolution result generated in (1) is discarded, so, for example, the following query fails:
SELECT c1 + c2 FROM mytable ORDER BY c1;
The query above fails because, when resolving the attribute reference 'c1', the resolution result of 'c2' is discarded.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3363 from sarutak/SPARK-4487 and squashes the following commits:
fd314f3 [Kousuke Saruta] Fixed attribute resolution logic in Analyzer
6e60c20 [Kousuke Saruta] Fixed conflicts
cb5b7e9 [Kousuke Saruta] Added test case for SPARK-4487
282d529 [Kousuke Saruta] Fixed attributes reference resolution error
b6123e6 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into concat-feature
317b7fb [Kousuke Saruta] WIP
(cherry picked from commit dd1c9cb36cde8202cede8014b5641ae8a0197812)
Signed-off-by: Michael Armbrust <michael@databricks.com>
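To show the behavior the fix restores, the same query shape runs correctly against SQLite via Python's stdlib (used here only as a reference SQL engine, not as Spark):

```python
import sqlite3

# The failing query shape from the commit message, against SQLite:
# projecting c1 + c2 while ordering by c1 alone must work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (c1 INTEGER, c2 INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(3, 10), (1, 20)])
rows = conn.execute("SELECT c1 + c2 FROM mytable ORDER BY c1").fetchall()
assert rows == [(21,), (13,)]  # ordered by c1 (1, then 3)
conn.close()
```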
This is just a quick fix for 1.2. SPARK-4523 describes a more complete solution.
Author: Michael Armbrust <michael@databricks.com>
Closes #3392 from marmbrus/parquetMetadata and squashes the following commits:
bcc6626 [Michael Armbrust] Parse schema with missing metadata.
(cherry picked from commit 90a6a46bd11030672597f015dd443d954107123a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Executing sum distinct on an empty table throws `java.lang.UnsupportedOperationException: empty.reduceLeft`.
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #3184 from ueshin/issues/SPARK-4318 and squashes the following commits:
8168c42 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4318
66fdb0a [Takuya UESHIN] Re-refine aggregate functions.
6186eb4 [Takuya UESHIN] Fix Sum of GeneratedAggregate.
d2975f6 [Takuya UESHIN] Refine Sum and Average of GeneratedAggregate.
1bba675 [Takuya UESHIN] Refine Sum, SumDistinct and Average functions.
917e533 [Takuya UESHIN] Use aggregate instead of groupBy().
1a5f874 [Takuya UESHIN] Add tests to be executed as non-partial aggregation.
a5a57d2 [Takuya UESHIN] Fix empty Average.
22799dc [Takuya UESHIN] Fix empty Sum and SumDistinct.
65b7dd2 [Takuya UESHIN] Fix empty sum distinct.
(cherry picked from commit 2c2e7a44db2ebe44121226f3eac924a0668b991a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
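The failure mode and the SQL-correct result can both be sketched in Python: reducing an empty collection without an initial value raises, whereas SUM(DISTINCT ...) over no rows should be NULL (the `sum_distinct` helper is illustrative, not Spark's code):

```python
from functools import reduce

values = []  # an "empty table" column
# reduce without an initial value raises on empty input -- the Scala
# analogue is empty.reduceLeft, the crash this commit fixes:
try:
    reduce(lambda a, b: a + b, set(values))
    raised = False
except TypeError:
    raised = True
assert raised

def sum_distinct(vals):
    """SQL semantics: SUM(DISTINCT col) over no rows is NULL (None)."""
    distinct = set(v for v in vals if v is not None)
    return sum(distinct) if distinct else None

assert sum_distinct([]) is None
assert sum_distinct([1, 1, 2]) == 3
```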
The relational operator '<=>' does not work in Spark SQL, while the same operator works in Spark HiveQL.
Author: ravipesala <ravindra.pesala@huawei.com>
Closes #3387 from ravipesala/<=> and squashes the following commits:
7198e90 [ravipesala] Supporting relational operator '<=>' in Spark SQL
(cherry picked from commit 98e9419784a9ad5096cfd563fa9a433786a90bd4)
Signed-off-by: Michael Armbrust <michael@databricks.com>
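A small Python model of the `<=>` (null-safe equality) semantics, with `None` standing in for SQL NULL:

```python
def null_safe_eq(a, b):
    """Emulates SQL's '<=>': like '=', except NULL <=> NULL is true
    and NULL <=> x is false, instead of the comparison yielding NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

assert null_safe_eq(None, None) is True
assert null_safe_eq(None, 1) is False
assert null_safe_eq(1, 1) is True
```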
While reviewing PR #3083 and #3161, I noticed that Parquet record filter generation code can be simplified significantly according to the clue stated in [SPARK-4453](https://issues.apache.org/jira/browse/SPARK-4453). This PR addresses both SPARK-4453 and SPARK-4213 with this simplification.
While generating `ParquetTableScan` operator, we need to remove all Catalyst predicates that have already been pushed down to Parquet. Originally, we first generate the record filter, and then call `findExpression` to traverse the generated filter to find out all pushed down predicates [[1](https://github.com/apache/spark/blob/64c6b9bad559c21f25cd9fbe37c8813cdab939f2/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L213-L228)]. In this way, we have to introduce the `CatalystFilter` class hierarchy to bind the Catalyst predicates together with their generated Parquet filter, and complicate the code base a lot.
The basic idea of this PR is that, we don't need `findExpression` after filter generation, because we already know a predicate can be pushed down if we can successfully generate its corresponding Parquet filter. SPARK-4213 is fixed by returning `None` for any unsupported predicate type.
Author: Cheng Lian <lian@databricks.com>
Closes #3317 from liancheng/simplify-parquet-filters and squashes the following commits:
d6a9499 [Cheng Lian] Fixes import styling issue
43760e8 [Cheng Lian] Simplifies Parquet filter generation logic
(cherry picked from commit 36b0956a3eadc7343ed0d25c79a6ce0496eaaccd)
Signed-off-by: Michael Armbrust <michael@databricks.com>
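The core idea can be sketched as: attempt to translate each predicate, and let a `None` result mean "not pushed down", so no second traversal (`findExpression`) is needed. All names below are hypothetical, not Spark's actual API:

```python
def to_parquet_filter(pred):
    """Try to translate a (op, column, value) predicate into a
    storage-level filter; None means it cannot be pushed down."""
    op, col, value = pred
    supported = {"=", "<", ">", "<=", ">="}
    if op not in supported:
        return None  # unsupported predicate stays in the Spark plan
    return ("filter", op, col, value)

predicates = [("=", "a", 1), ("like", "b", "x%"), ("<", "c", 5)]
pushed = [p for p in predicates if to_parquet_filter(p) is not None]
remaining = [p for p in predicates if to_parquet_filter(p) is None]
assert pushed == [("=", "a", 1), ("<", "c", 5)]
assert remaining == [("like", "b", "x%")]
```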
Author: Cheng Hao <hao.cheng@intel.com>
Closes #3217 from chenghao-intel/mutablerow and squashes the following commits:
e8a10bd [Cheng Hao] revert the change of Row object
4681aea [Cheng Hao] Add toMutableRow method in object Row
a751838 [Cheng Hao] Construct the MutableRow from an existed row
(cherry picked from commit 69e858cc7748b6babadd0cbe20e65f3982161cbf)
Signed-off-by: Michael Armbrust <michael@databricks.com>
`Cast` from `NaN` or `Infinity` of `Double` or `Float` to `TimestampType` throws `NumberFormatException`.
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #3283 from ueshin/issues/SPARK-4425 and squashes the following commits:
14def0c [Takuya UESHIN] Fix Cast to be able to handle NaN or Infinity to TimestampType.
(cherry picked from commit 566c791931645bfaaaf57ee5a15b9ffad534f81e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
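The fixed behavior can be modeled in Python: NaN and Infinity map to NULL (`None`) instead of raising (the helper is illustrative; a real timestamp cast involves more than this):

```python
import math

def cast_to_timestamp(seconds):
    """NaN or Infinity casts to NULL (None) rather than raising,
    mirroring the fixed Cast behavior; finite values pass through."""
    if seconds is None or math.isnan(seconds) or math.isinf(seconds):
        return None
    return float(seconds)

assert cast_to_timestamp(float("nan")) is None
assert cast_to_timestamp(float("inf")) is None
assert cast_to_timestamp(1.5) == 1.5
```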
DecimalType.
This is a follow-up of [SPARK-4390](https://issues.apache.org/jira/browse/SPARK-4390) (#3256).
Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #3278 from ueshin/issues/SPARK-4420 and squashes the following commits:
7fea558 [Takuya UESHIN] Add some tests.
cb2301a [Takuya UESHIN] Fix tests.
133bad5 [Takuya UESHIN] Change nullability of Cast from DoubleType/FloatType to DecimalType.
(cherry picked from commit 3a81a1c9e0963173534d96850f3c0b7a16350838)
Signed-off-by: Michael Armbrust <michael@databricks.com>