path: root/sql
Commit log (most recent first)
* [SPARK-12558][SQL] AnalysisException when multiple functions applied in GROUP BY clause
  Dilip Biswal, 2016-01-12 (2 files changed, -0/+30)
  cloud-fan Can you please take a look? In this case, we are failing during check analysis while validating the aggregation expression. I have added a semanticEquals for HiveGenericUDF to fix this. Please let me know if this is the right way to address this issue.
  Author: Dilip Biswal <dbiswal@us.ibm.com>. Closes #10520 from dilipbiswal/spark-12558.
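  A sketch of the query shape involved (editor's illustration, not from the commit; `testUDF`, `people`, and `name` are hypothetical, and in the JIRA the function is a Hive generic UDF rather than a built-in):

```scala
// Hypothetical repro shape: the same function call appears in the select list
// and in GROUP BY; analysis must recognize the two calls as semantically equal.
sqlContext.sql(
  "SELECT testUDF(name), count(*) FROM people GROUP BY testUDF(name)")
```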
* [SPARK-12788][SQL] Simplify BooleanEquality by using casts.
  Reynold Xin, 2016-01-12 (2 files changed, -26/+32)
  Author: Reynold Xin <rxin@databricks.com>. Closes #10730 from rxin/SPARK-12788.
* [SPARK-12785][SQL] Add ColumnarBatch, an in memory columnar format for execution.
  Nong Li, 2016-01-12 (6 files changed, -0/+1463)
  There are many potential benefits of having an efficient in-memory columnar format as an alternative to UnsafeRow. This patch introduces ColumnarBatch/ColumnVector, which starts this effort. The remaining implementation can be done in follow-up patches.
  As stated in the JIRA, there are useful external components that operate on memory in a simple columnar format. ColumnarBatch would serve that purpose and could serve as a zero-serialization/zero-copy exchange for this use case.
  This patch supports running the underlying data either on heap or off heap. On heap runs a bit faster, but we would need off heap for zero-copy exchanges. Currently, this mode is hidden behind one interface (ColumnVector).
  This differs from Parquet or the existing columnar cache because this is *not* intended to be used as a storage format. The focus is entirely on CPU efficiency, as we expect to have only one of these batches in memory per task. The layout of the values is just dense arrays of the value type.
  Author: Nong Li <nong@databricks.com>. Author: Nong <nongli@gmail.com>. Closes #10628 from nongli/spark-12635.
* [SPARK-12724] SQL generation support for persisted data source tables
  Cheng Lian, 2016-01-12 (17 files changed, -51/+55)
  This PR implements SQL generation support for persisted data source tables. A new field `metastoreTableIdentifier: Option[TableIdentifier]` is added to `LogicalRelation`. When a `LogicalRelation` representing a persisted data source relation is created, this field holds the database name and table name of the relation.
  Author: Cheng Lian <lian@databricks.com>. Closes #10712 from liancheng/spark-12724-datasources-sql-gen.
* Revert "[SPARK-12692][BUILD][SQL] Scala style: Fix the style violation ↵Reynold Xin2016-01-1253-149/+140
| | | | | | (Space before "," or ":")" This reverts commit 8cfa218f4f1b05f4d076ec15dd0a033ad3e4500d.
* [SPARK-12768][SQL] Remove CaseKeyWhen expression
  Reynold Xin, 2016-01-12 (3 files changed, -171/+38)
  This patch removes the CaseKeyWhen expression and replaces it with a factory method that generates the equivalent CaseWhen. This reduces the amount of code we'd need to maintain in the future for both code generation and the optimizer. Note that we introduced CaseKeyWhen to avoid duplicate evaluations of the key. This is no longer a problem because we now have common subexpression elimination.
  Author: Reynold Xin <rxin@databricks.com>. Closes #10722 from rxin/SPARK-12768.
* [SPARK-9843][SQL] Make catalyst optimizer pass pluggable at runtime
  Robert Kruszewski, 2016-01-12 (4 files changed, -2/+46)
  Let me know whether you'd like to see it in another place.
  Author: Robert Kruszewski <robertk@palantir.com>. Closes #10210 from robert3005/feature/pluggable-optimizer.
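  A minimal sketch of registering a custom optimizer rule (editor's illustration, not from the commit; it assumes the hook is exposed as `sqlContext.experimental.extraOptimizations` and that a Spark shell of this era is in scope):

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A do-nothing rule: it returns the plan unchanged, but shows the extension point.
object NoopRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

sqlContext.experimental.extraOptimizations = Seq(NoopRule)
```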
* [SPARK-12762][SQL] Add unit test for SimplifyConditionals optimization rule
  Reynold Xin, 2016-01-12 (5 files changed, -7/+69)
  This pull request does a few small things:
  1. Separated if simplification from BooleanSimplification and created a new rule SimplifyConditionals. In the future we can also simplify other conditional expressions here.
  2. Added unit test for SimplifyConditionals.
  3. Renamed SimplifyCaseConversionExpressionsSuite to SimplifyStringCaseConversionSuite.
  Author: Reynold Xin <rxin@databricks.com>. Closes #10716 from rxin/SPARK-12762.
* [SPARK-12692][BUILD][SQL] Scala style: Fix the style violation (Space before "," or ":")
  Kousuke Saruta, 2016-01-12 (53 files changed, -140/+149)
  Fix the style violation (space before , and :). This PR is a followup for #10643.
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>. Closes #10718 from sarutak/SPARK-12692-followup-sql.
* [SPARK-11823] Ignores HiveThriftBinaryServerSuite's test jdbc cancel
  Yin Huai, 2016-01-11 (1 file changed, -1/+3)
  https://issues.apache.org/jira/browse/SPARK-11823 This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test.
  Author: Yin Huai <yhuai@databricks.com>. Closes #10715 from yhuai/SPARK-11823-ignore.
* [SPARK-12498][SQL][MINOR] BooleanSimplification simplification
  Cheng Lian, 2016-01-11 (2 files changed, -102/+92)
  Scala syntax allows binary case classes to be used as infix operators in pattern matching. This PR makes use of this syntax sugar to make `BooleanSimplification` more readable.
  Author: Cheng Lian <lian@databricks.com>. Closes #10445 from liancheng/boolean-simplification-simplification.
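  A small self-contained example of the Scala feature being leveraged (editor's illustration, not from the commit; the `And` case class here is a stand-in, not Catalyst's):

```scala
// A binary case class can be matched with infix syntax, so rewrite rules can
// read almost like the boolean algebra they encode.
case class And(left: Boolean, right: Boolean)

def eval(e: And): Boolean = e match {
  case l And r => l && r   // equivalent to: case And(l, r) => l && r
}

println(eval(And(true, false)))   // prints: false
```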
* [SPARK-12742][SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists exception
  wangfei, 2016-01-11 (1 file changed, -0/+3)
```
[info] Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 milliseconds)
[info] org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
[info] at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
[info] at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
[info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
[info] at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
[info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
[info] at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
[info] at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
[info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
[info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
[info] at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info] at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info] at java.lang.Thread.run(Thread.java:745)
```
  /cc liancheng
  Author: wangfei <wangfei_hello@126.com>. Closes #10682 from scwf/fix-test.
* [SPARK-12576][SQL] Enable expression parsing in CatalystQl
  Herman van Hovell, 2016-01-11 (9 files changed, -56/+217)
  The PR allows us to use the new SQL parser to parse SQL expressions such as ```1 + sin(x*x)```. We enable this functionality in this PR, but we will not start using this actively yet. This will be done as soon as we have reached grammar parity with the existing parser stack.
  cc rxin
  Author: Herman van Hovell <hvanhovell@questtec.nl>. Closes #10649 from hvanhovell/SPARK-12576.
* [SPARK-12744][SQL] Change parsing JSON integers to timestamps to treat integers as number of seconds
  Anatoliy Plastinin, 2016-01-11 (3 files changed, -3/+20)
  JIRA: https://issues.apache.org/jira/browse/SPARK-12744
  This PR makes parsing JSON integers to timestamps consistent with casting behavior.
  Author: Anatoliy Plastinin <anatoliy.plastinin@gmail.com>. Closes #10687 from antlypls/fix-json-timestamp-parsing.
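  A minimal sketch of the behavior described (editor's illustration, not from the commit; assumes a Spark shell of this era with `sc`/`sqlContext` in scope):

```scala
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

// A JSON integer read into a TimestampType column is interpreted as seconds
// since the epoch, matching CAST(int AS TIMESTAMP) semantics.
val schema = StructType(Seq(StructField("ts", TimestampType)))
val df = sqlContext.read.schema(schema)
  .json(sc.parallelize(Seq("""{"ts": 1452556800}""")))
df.show()   // expected to show 2016-01-12 00:00:00 in UTC, not a value near 1970
```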
* [SPARK-12539][FOLLOW-UP] always sort in partitioning writer
  Wenchen Fan, 2016-01-11 (2 files changed, -147/+48)
  Address comments in #10498, especially https://github.com/apache/spark/pull/10498#discussion_r49021259
  Author: Wenchen Fan <wenchen@databricks.com>. This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com>. Closes #10638 from cloud-fan/bucket-write.
* [SPARK-3873][BUILD] Enable import ordering error checking.
  Marcelo Vanzin, 2016-01-10 (19 files changed, -30/+31)
  Turn import ordering violations into build errors, plus a few adjustments to account for how the checker behaves. I'm a little on the fence about whether the existing code is right, but it's easier to appease the checker than to discuss what's the more correct order here. Plus a few fixes to imports that crept in since my recent cleanups.
  Author: Marcelo Vanzin <vanzin@cloudera.com>. Closes #10612 from vanzin/SPARK-3873-enable.
* [SPARK-12340] Fix overflow in various take functions.
  Reynold Xin, 2016-01-09 (3 files changed, -16/+9)
  This is a follow-up for the original patch #10562.
  Author: Reynold Xin <rxin@databricks.com>. Closes #10670 from rxin/SPARK-12340.
* [SPARK-12577] [SQL] Better support of parentheses in partition by and order by clause of window function's over clause
  Liang-Chi Hsieh, 2016-01-08 (2 files changed, -11/+32)
  JIRA: https://issues.apache.org/jira/browse/SPARK-12577
  Author: Liang-Chi Hsieh <viirya@gmail.com>. Closes #10620 from viirya/fix-parentheses.
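  A sketch of the query shape this is about (editor's illustration, not from the commit; `src`, `key`, and `value` are hypothetical):

```scala
// Parenthesized expression lists inside the OVER clause should now parse.
sqlContext.sql(
  """SELECT key, SUM(value) OVER (PARTITION BY (key) ORDER BY (value)) AS s
    |FROM src""".stripMargin)
```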
* [SPARK-12593][SQL] Converts resolved logical plan back to SQL
  Cheng Lian, 2016-01-08 (47 files changed, -146/+1087)
  This PR tries to enable Spark SQL to convert resolved logical plans back to SQL query strings. For now, the major use case is to canonicalize Spark SQL native view support. The major entry point is `SQLBuilder.toSQL`, which returns an `Option[String]` if the logical plan is recognized.
  The current version is still in WIP status and is quite limited. Known limitations include:
  1. The logical plan must be analyzed but not optimized. The optimizer erases `Subquery` operators, which contain necessary scope information for SQL generation. Future versions should be able to recover erased scope information by inserting subqueries when necessary.
  2. The logical plan must be created using a HiveQL query string. Query plans generated by composing arbitrary DataFrame API combinations are not supported yet. Operators within these query plans need to be rearranged into a canonical form that is more suitable for direct SQL generation. For example, the following query plan
```
Filter (a#1 < 10)
+- MetastoreRelation default, src, None
```
     needs to be canonicalized into the following form before SQL generation:
```
Project [a#1, b#2, c#3]
+- Filter (a#1 < 10)
   +- MetastoreRelation default, src, None
```
     Otherwise, the SQL generation process will have to handle a large number of special cases.
  3. Only a fraction of expressions and basic logical plan operators are supported in this PR. Currently, 95.7% (1720 out of 1798) query plans in `HiveCompatibilitySuite` can be successfully converted to SQL query strings. Known unsupported components are:
     - Expressions
       - Part of math expressions
       - Part of string expressions (buggy?)
       - Null expressions
       - Calendar interval literal
       - Part of date time expressions
       - Complex type creators
       - Special `NOT` expressions, e.g. `NOT LIKE` and `NOT IN`
     - Logical plan operators/patterns
       - Cube, rollup, and grouping set
       - Script transformation
       - Generator
       - Distinct aggregation patterns that fit the `DistinctAggregationRewriter` analysis rule
       - Window functions
  Support for window functions, generators, cubes, etc. will be added in follow-up PRs.
  This PR leverages `HiveCompatibilitySuite` for testing SQL generation in a "round-trip" manner:
  - For all select queries, we try to convert them back to SQL.
  - If the query plan is convertible, we parse the generated SQL into a new logical plan.
  - We run the new logical plan instead of the original one.
  If the query plan is inconvertible, the test case simply falls back to the original logic.
  TODO
  - [x] Fix failed test cases
  - [x] Support for more basic expressions and logical plan operators (e.g. distinct aggregation etc.)
  - [x] Comments and documentation
  Author: Cheng Lian <lian@databricks.com>. Closes #10541 from liancheng/sql-generation.
* [SPARK-12687] [SQL] Support from clause surrounded by `()`.
  Liang-Chi Hsieh, 2016-01-08 (3 files changed, -2/+25)
  JIRA: https://issues.apache.org/jira/browse/SPARK-12687
  Some queries such as `(select 1 as a) union (select 2 as a)` can't work. This patch fixes it.
  Author: Liang-Chi Hsieh <viirya@gmail.com>. Closes #10660 from viirya/fix-union.
* [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 edition
  Sean Owen, 2016-01-08 (6 files changed, -20/+20)
  Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs.
  Author: Sean Owen <sowen@cloudera.com>. Closes #10570 from srowen/SPARK-12618.
* Fix indentation for the previous patch.
  Reynold Xin, 2016-01-07 (1 file changed, -10/+8)
* [SPARK-12317][SQL] Support units (m,k,g) in SQLConf
  Kevin Yu, 2016-01-07 (2 files changed, -1/+60)
  This PR continues from the previously closed PR #10314. In this PR, SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE accepts memory string conventions as input. For example, the user can now specify 10g for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE in SQLConf.
  marmbrus srowen: Can you help review these code changes? Thanks.
  Author: Kevin Yu <qyu@us.ibm.com>. Closes #10629 from kevinyu98/spark-12317.
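  A minimal usage sketch (editor's illustration, not from the commit; the config key shown is the one SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE is assumed to map to):

```scala
// Size-valued SQL settings can now be written with a unit suffix (k, m, g)
// instead of a raw byte count.
sqlContext.setConf("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "10g")
```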
* [SPARK-12580][SQL] Remove string concatenations from usage and extended in @ExpressionDescription
  Kazuaki Ishizaki, 2016-01-07 (2 files changed, -25/+25)
  Use multi-line string literals for ExpressionDescription with ``// scalastyle:off line.size.limit`` and ``// scalastyle:on line.size.limit``.
  The policy, as described at https://github.com/apache/spark/pull/10488, is: let's use multi-line string literals, and if we have to have a line with more than 100 characters, let's use ``// scalastyle:off line.size.limit`` and ``// scalastyle:on line.size.limit`` to just bypass the line length requirement.
  Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>. Closes #10524 from kiszk/SPARK-12580.
* [MINOR] Fix for BUILD FAILURE for Scala 2.11
  Jacek Laskowski, 2016-01-07 (1 file changed, -18/+1)
  It was introduced in 917d3fc069fb9ea1c1487119c9c12b373f4f9b77.
  /cc cloud-fan rxin
  Author: Jacek Laskowski <jacek@japila.pl>. Closes #10636 from jaceklaskowski/fix-for-build-failure-2.11.
* [SPARK-12662][SQL] Fix DataFrame.randomSplit to avoid creating overlapping splits
  Sameer Agarwal, 2016-01-07 (2 files changed, -1/+28)
  https://issues.apache.org/jira/browse/SPARK-12662
  cc yhuai
  Author: Sameer Agarwal <sameer@databricks.com>. Closes #10626 from sameeragarwal/randomsplit.
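  A small sketch of the contract the fix restores (editor's illustration, not from the commit; assumes a Spark shell of this era with `sqlContext` in scope):

```scala
// After the fix, randomSplit produces splits that partition the input:
// every input row lands in exactly one split.
val df = sqlContext.range(0, 1000)
val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed = 42L)
assert(train.count() + test.count() == df.count())
assert(train.intersect(test).count() == 0)
```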
* [SPARK-12542][SQL] support except/intersect in HiveQl
  Davies Liu, 2016-01-06 (5 files changed, -5/+65)
  Parse SQL queries with except/intersect in the FROM clause for HiveQL.
  Author: Davies Liu <davies@databricks.com>. Closes #10622 from davies/intersect.
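  Query shapes of the kind this enables (editor's illustration, not from the commit; `t1`, `t2`, and `id` are hypothetical):

```scala
// Set operations written in HiveQL should now parse.
sqlContext.sql("SELECT id FROM t1 INTERSECT SELECT id FROM t2")
sqlContext.sql("SELECT id FROM t1 EXCEPT SELECT id FROM t2")
```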
* [SPARK-12295] [SQL] external spilling for window functions
  Davies Liu, 2016-01-06 (1 file changed, -86/+228)
  This PR manages the memory used by window functions (buffered rows) and also enables external spilling. After this PR, we can run window functions on a partition with hundreds of millions of rows with only 1 GB of memory.
  Author: Davies Liu <davies@databricks.com>. Closes #10605 from davies/unsafe_window.
* [SPARK-12640][SQL] Add simple benchmarking utility class and add Parquet scan benchmarks.
  Nong Li, 2016-01-06 (1 file changed, -0/+158)
  We've run benchmarks ad hoc to measure the scanner performance. We will continue to invest in this, and it makes sense to get these benchmarks into code. This adds a simple benchmarking utility to do this.
  Author: Nong Li <nong@databricks.com>. Author: Nong <nongli@gmail.com>. Closes #10589 from nongli/spark-12640.
* [SPARK-12539][SQL] support writing bucketed table
  Wenchen Fan, 2016-01-06 (18 files changed, -117/+626)
  This PR adds bucket write support to Spark SQL. Users can specify bucketing columns, numBuckets and sorting columns, with or without partition columns. For example:
```
df.write.partitionBy("year").bucketBy(8, "country").sortBy("amount").saveAsTable("sales")
```
  When bucketing is used, we will calculate a bucket id for each record and group the records by bucket id. For each group, we will create a file with the bucket id in its name and write data into it. For each bucket file, if sorting columns are specified, the data will be sorted before writing. Note that there may be multiple files for one bucket, as the data is distributed.
  Currently we store the bucket metadata in the Hive metastore in a non-Hive-compatible way. We use a different bucketing hash function compared to Hive, so we can't be compatible anyway.
  Limitations:
  - Can't write bucketed data without the Hive metastore.
  - Can't insert bucketed data into existing Hive tables.
  Author: Wenchen Fan <wenchen@databricks.com>. Closes #10498 from cloud-fan/bucket-write.
* [SPARK-12681] [SQL] split IdentifiersParser.g into two files
  Davies Liu, 2016-01-06 (3 files changed, -516/+566)
  To avoid having a huge Java source (over 64K lines of code) that can't be compiled.
  cc hvanhovell
  Author: Davies Liu <davies@databricks.com>. Closes #10624 from davies/split_ident.
* [SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst
  Herman van Hovell, 2016-01-06 (23 files changed, -2746/+2041)
  This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to start using this parser for all of our SQL parsing. The following key changes have been made:
  The ANTLR parser and supporting classes have been moved to the Catalyst project. They are now part of the ```org.apache.spark.sql.catalyst.parser``` package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the ```ASTNode``` class and to improve the error handling.
  The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project:
  - ```CatalystQl```: implements Query and Expression parsing functionality.
  - ```SparkQl```: a subclass of CatalystQl that provides SQL/Core-only functionality such as Explain and Describe.
  - ```HiveQl```: a subclass of ```SparkQl``` that adds Hive-only functionality to the parser, such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive.
  cc rxin
  Author: Herman van Hovell <hvanhovell@questtec.nl>. Closes #10583 from hvanhovell/SPARK-12575.
* [SPARK-11878][SQL] Eliminate distribute by in case group by is present with exactly the same grouping expressions
  Yash Datta, 2016-01-06 (2 files changed, -0/+45)
  For queries like `select <> from table group by a distribute by a`, we can eliminate the distribute by, since group by will do a hash partitioning anyway. Also applicable when the user uses the DataFrame API.
  Author: Yash Datta <Yash.Datta@guavus.com>. Closes #9858 from saucam/eliminatedistribute.
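  A sketch of how one might observe the effect (editor's illustration, not from the commit; `src` and `key` are hypothetical Hive-style names):

```scala
// With the rule in place, the exchange introduced by DISTRIBUTE BY should no
// longer appear in the plan when GROUP BY uses exactly the same expressions.
sqlContext.sql(
  "SELECT key, COUNT(*) FROM src GROUP BY key DISTRIBUTE BY key").explain()
```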
* [SPARK-12340][SQL] fix Int overflow in the SparkPlan.executeTake, RDD.take and AsyncRDDActions.takeAsync
  QiangCai, 2016-01-06 (2 files changed, -4/+16)
  I have closed pull request https://github.com/apache/spark/pull/10487 and created this pull request to resolve the problem.
  Spark JIRA: https://issues.apache.org/jira/browse/SPARK-12340
  Author: QiangCai <david.caiq@gmail.com>. Closes #10562 from QiangCai/bugfix.
* [SPARK-12578][SQL] Distinct should not be silently ignored when used in an aggregate function with OVER clause
  Liang-Chi Hsieh, 2016-01-06 (2 files changed, -1/+22)
  JIRA: https://issues.apache.org/jira/browse/SPARK-12578
  Slight update to the Hive parser. We should keep the DISTINCT keyword when it is used in an aggregate function with an OVER clause, so that CheckAnalysis will detect it and throw an exception later.
  Author: Liang-Chi Hsieh <viirya@gmail.com>. Closes #10557 from viirya/keep-distinct-hivesql.
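  The kind of query affected (editor's illustration, not from the commit; `src`, `key`, and `value` are hypothetical):

```scala
// DISTINCT inside an aggregate used with OVER is no longer dropped silently;
// analysis is now expected to reject it with an explicit error.
sqlContext.sql(
  "SELECT key, COUNT(DISTINCT value) OVER (PARTITION BY key) FROM src")
// expected: AnalysisException instead of silently computing COUNT(value)
```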
* [SPARK-3873][TESTS] Import ordering fixes.
  Marcelo Vanzin, 2016-01-05 (116 files changed, -223/+203)
  Author: Marcelo Vanzin <vanzin@cloudera.com>. Closes #10582 from vanzin/SPARK-3873-tests.
* [SPARK-12504][SQL] Masking credentials in the sql plan explain output for JDBC data sources.
  sureshthalamati, 2016-01-05 (2 files changed, -0/+27)
  This fix masks JDBC credentials in the explain output. URL patterns for specifying credentials seem to vary between databases, so a new method was added to the dialect to mask the credentials according to the database-specific URL pattern. While adding tests, I noticed the explain output includes the array variable for partitions ([Lorg.apache.spark.Partition;3ff74546,). Modified the code to include the first and last partition information instead.
  Author: sureshthalamati <suresh.thalamati@gmail.com>. Closes #10452 from sureshthalamati/mask_jdbc_credentials_spark-12504.
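  A sketch of the scenario (editor's illustration, not from the commit; the URL, table, and host are hypothetical, and running it requires a reachable database):

```scala
// Explaining a JDBC-backed DataFrame should no longer echo the credentials
// embedded in the connection URL.
val props = new java.util.Properties()
val orders = sqlContext.read.jdbc(
  "jdbc:postgresql://dbhost:5432/shop?user=admin&password=secret",
  "orders",
  props)
orders.explain(true)
```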
* [SPARK-3873][SQL] Import ordering fixes.
  Marcelo Vanzin, 2016-01-05 (164 files changed, -318/+301)
  Author: Marcelo Vanzin <vanzin@cloudera.com>. Closes #10573 from vanzin/SPARK-3873-sql.
* [SPARK-12636] [SQL] Update UnsafeRowParquetRecordReader to support reading files directly.
  Nong, 2016-01-05 (3 files changed, -29/+178)
  As noted in the code, this change is to make this component easier to test in isolation.
  Author: Nong <nongli@gmail.com>. Closes #10581 from nongli/spark-12636.
* [SPARK-12439][SQL] Fix toCatalystArray and MapObjects
  Liang-Chi Hsieh, 2016-01-05 (4 files changed, -6/+14)
  JIRA: https://issues.apache.org/jira/browse/SPARK-12439
  In toCatalystArray, we should look at the data type returned by dataTypeFor instead of silentSchemaFor to determine whether the element is a native type. An obvious problem is when the element is of type Option[Int]: silentSchemaFor will return Int, and we will then wrongly recognize the element as a native type.
  There is another problem when using Option as an array element. When we encode data like Seq(Some(1), Some(2), None) with an encoder, we will use MapObjects to construct an array for it later. But in MapObjects, we don't check whether the return value of lambdaFunction is null or not. That causes a bug where the decoded data for Seq(Some(1), Some(2), None) would be Seq(1, 2, -1) instead of Seq(1, 2, null).
  Author: Liang-Chi Hsieh <viirya@gmail.com>. Closes #10391 from viirya/fix-catalystarray.
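  A round-trip sketch of the kind of data affected (editor's illustration, not from the commit; assumes implicit encoder derivation works for the wrapper case class, which may require a compiled class rather than a shell-defined one):

```scala
import sqlContext.implicits._

case class Rec(xs: Seq[Option[Int]])

// With the fix, the None element survives the encode/decode round trip
// instead of coming back as -1.
val ds = Seq(Rec(Seq(Some(1), Some(2), None))).toDS()
ds.collect()   // expected: Array(Rec(List(Some(1), Some(2), None)))
```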
* [SPARK-12615] Remove some deprecated APIs in RDD/SparkContext
  Reynold Xin, 2016-01-04 (1 file changed, -1/+1)
  I looked at each case individually, and it looks like they can all be removed. The only one I had to think twice about was toArray (I even thought about un-deprecating it, until I realized it was a problem in Java to have toArray returning java.util.List).
  Author: Reynold Xin <rxin@databricks.com>. Closes #10569 from rxin/SPARK-12615.
* [SPARK-12480][FOLLOW-UP] use a single column vararg for hash
  Wenchen Fan, 2016-01-05 (3 files changed, -3/+4)
  Address comments in #10435. This makes the API easier to use if users programmatically generate the call to hash, and they will get an analysis exception if the argument list of hash is empty.
  Author: Wenchen Fan <wenchen@databricks.com>. Closes #10588 from cloud-fan/hash.
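  A usage sketch of the vararg form (editor's illustration, not from the commit; assumes a Spark shell of this era with `sqlContext` in scope):

```scala
import org.apache.spark.sql.functions.{col, hash}

val df = sqlContext.range(0, 10).selectExpr("id AS a", "id * 2 AS b")
df.select(hash(col("a"), col("b")).as("h")).show()

df.select(hash())   // expected: AnalysisException, since hash needs at least one argument
```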
* [SPARK-12438][SQL] Add SQLUserDefinedType support for encoder
  Liang-Chi Hsieh, 2016-01-05 (3 files changed, -0/+38)
  JIRA: https://issues.apache.org/jira/browse/SPARK-12438
  ScalaReflection lacks support for SQLUserDefinedType. We should add it.
  Author: Liang-Chi Hsieh <viirya@gmail.com>. Closes #10390 from viirya/encoder-udt.
* [SPARK-12568][SQL] Add BINARY to Encoders
  Michael Armbrust, 2016-01-04 (3 files changed, -3/+18)
  Author: Michael Armbrust <michael@databricks.com>. Closes #10516 from marmbrus/datasetCleanup.
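  A minimal usage sketch (editor's illustration, not from the commit; assumes `sqlContext.createDataset` from the 1.6-era Dataset API):

```scala
import org.apache.spark.sql.Encoders

// An explicit encoder for Array[Byte] values.
val ds = sqlContext.createDataset(Seq(Array[Byte](1, 2, 3)))(Encoders.BINARY)
ds.collect()
```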
* [SPARK-12600][SQL] follow up: add range check for DecimalType
  Reynold Xin, 2016-01-04 (1 file changed, -0/+10)
  This addresses davies' code review feedback in https://github.com/apache/spark/pull/10559.
  Author: Reynold Xin <rxin@databricks.com>. Closes #10586 from rxin/remove-deprecated-sql-followup.
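  A sketch of what a range check on DecimalType implies (editor's illustration, not from the commit; the maximum supported precision of 38 is assumed):

```scala
import org.apache.spark.sql.types.DecimalType

DecimalType(38, 18)   // within the supported range
DecimalType(50, 10)   // expected to be rejected by the range check
```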
* [SPARK-12480][SQL] add Hash expression that can calculate hash value for a group of expressions
  Wenchen Fan, 2016-01-04 (10 files changed, -6/+171)
  Just write the arguments into an unsafe row and use murmur3 to calculate the hash code.
  Author: Wenchen Fan <wenchen@databricks.com>. Closes #10435 from cloud-fan/hash-expr.
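  A SQL-side sketch (editor's illustration, not from the commit; it assumes the expression is registered as `hash` in the function registry, and `t`, `a`, and `b` are hypothetical):

```scala
// Compute a murmur3 hash over a group of column expressions from SQL.
sqlContext.sql("SELECT hash(a, b) AS h FROM t")
```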
* [SPARK-12600][SQL] Remove deprecated methods in Spark SQL
  Reynold Xin, 2016-01-04 (18 files changed, -1062/+123)
  Author: Reynold Xin <rxin@databricks.com>. Closes #10559 from rxin/remove-deprecated-sql.
* [SPARK-12509][SQL] Fixed error messages for DataFrame correlation and covariance
  Narine Kokhlikyan, 2016-01-04 (1 file changed, -6/+7)
  Currently, when we call corr or cov on a DataFrame with invalid input, we see these error messages for both corr and cov:
  - "Currently cov supports calculating the covariance between two columns"
  - "Covariance calculation for columns with dataType "[DataType Name]" not supported."
  I've fixed this issue by passing the function name as an argument. We could also do the input checks separately for each function; I avoided doing that because of code duplication. Thanks!
  Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>. Closes #10458 from NarineK/sparksqlstatsmessages.
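  A usage sketch that would exercise the corrected messages (editor's illustration, not from the commit; column names are hypothetical):

```scala
val df = sqlContext.range(0, 100)
  .selectExpr("id AS price", "id * 2 AS quantity", "'x' AS label")

df.stat.corr("price", "quantity")   // 1.0 for this perfectly correlated data
df.stat.cov("price", "quantity")
df.stat.corr("price", "label")      // expected: an error message that names corr, not cov
```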
* [SPARK-12589][SQL] Fix UnsafeRowParquetRecordReader to properly set the row length.
  Nong Li, 2016-01-04 (3 files changed, -0/+37)
  The reader was previously not setting the row length, meaning it was wrong if there were variable-length columns. This problem usually does not manifest, since the value in the column is correct and projecting the row fixes the issue.
  Author: Nong Li <nong@databricks.com>. Closes #10576 from nongli/spark-12589.
* [SPARK-12541] [SQL] support cube/rollup as function
  Davies Liu, 2016-01-04 (8 files changed, -48/+87)
  This PR enables cube/rollup as functions, so they can be used like this:
```
select a, b, sum(c) from t group by rollup(a, b)
```
  Author: Davies Liu <davies@databricks.com>. Closes #10522 from davies/rollup.