path: root/sql/core
Commit message  (Author, Date, Files changed, Lines changed)
* Revert "[maven-release-plugin] prepare release v1.1.0-rc3"Patrick Wendell2014-09-021-2/+3
| | | | This reverts commit b2d0493b223c5f98a593bb6d7372706cc02bebad.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-09-021-1/+1
| | | | This reverts commit 865e6f63f63f5e881a02d1a4e3b4c5d0e86fcd8e.
* [maven-release-plugin] prepare for next development iterationPatrick Wendell2014-08-301-1/+1
|
* [maven-release-plugin] prepare release v1.1.0-rc3Patrick Wendell2014-08-301-3/+2
|
* Revert "[maven-release-plugin] prepare release v1.1.0-rc3"Patrick Wendell2014-08-301-2/+3
| | | | This reverts commit 2b2e02265f80e4c5172c1e498aa9ba2c6b91c6c9.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-08-301-1/+1
| | | | This reverts commit 8b5f0dbd8d32a25a4e7ba3ebe1a4c3c6310aeb85.
* [maven-release-plugin] prepare for next development iterationPatrick Wendell2014-08-301-1/+1
|
* [maven-release-plugin] prepare release v1.1.0-rc3Patrick Wendell2014-08-301-3/+2
|
* [SPARK-3320][SQL] Made batched in-memory column buffer building work for SchemaRDDs with empty partitions  (Cheng Lian, 2014-08-29, 3 files, -34/+39)
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      Closes #2213 from liancheng/spark-3320 and squashes the following commits:
      45a0139 [Cheng Lian] Fixed typo in InMemoryColumnarQuerySuite
      f67067d [Cheng Lian] Fixed SPARK-3320
      (cherry picked from commit 32b18dd52cf8920903819f23e406271ecd8ac6bb)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* Revert "[maven-release-plugin] prepare release v1.1.0-rc2"Patrick Wendell2014-08-291-2/+3
| | | | This reverts commit 711aebb329ca28046396af1e34395a0df92b5327.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-08-291-1/+1
| | | | This reverts commit a4a7a241441489a0d31365e18476ae2e1c34464d.
* [maven-release-plugin] prepare for next development iterationPatrick Wendell2014-08-291-1/+1
|
* [maven-release-plugin] prepare release v1.1.0-rc2Patrick Wendell2014-08-291-3/+2
|
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1"Patrick Wendell2014-08-281-2/+3
| | | | This reverts commit f07183249b74dd857069028bf7d570b35f265585.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit f8f7a0c9dce764ece8acdc41d35bbf448dba7e92.
* [maven-release-plugin] prepare for next development iterationPatrick Wendell2014-08-281-1/+1
|
* [maven-release-plugin] prepare release v1.1.0-rc1Patrick Wendell2014-08-281-3/+2
|
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1"Patrick Wendell2014-08-281-2/+3
| | | | This reverts commit 58b0be6a29eab817d350729710345e9f39e4c506.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit 78e3c036eee7113b2ed144eec5061e070b479e56.
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1"Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit 79e86ef3e1a3ee03a7e3b166a5c7dee11c6d60d7.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit a118ea5c59d653f5a3feda21455ba60bc722b3b1.
* Revert "Revert "[maven-release-plugin] prepare for next development iteration""Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit 71ec0140f7e121bdba3d19e8219e91a5e9d1e320.
* Revert "Revert "[maven-release-plugin] prepare release v1.1.0-rc1""Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit 56070f12f455bae645cba887a74c72b12f1085f8.
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1"Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit da4b94c86c9dd0d624b3040aa4b9449be9f60fc3.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit 96926c5a42c5970ed74c50db5bd9c68cacf92207.
* [maven-release-plugin] prepare for next development iterationPatrick Wendell2014-08-281-1/+1
|
* [maven-release-plugin] prepare release v1.1.0-rc1Patrick Wendell2014-08-281-1/+1
|
* Revert "[maven-release-plugin] prepare release v1.1.0-rc1"Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit 79e86ef3e1a3ee03a7e3b166a5c7dee11c6d60d7.
* Revert "[maven-release-plugin] prepare for next development iteration"Patrick Wendell2014-08-281-1/+1
| | | | This reverts commit a118ea5c59d653f5a3feda21455ba60bc722b3b1.
* [SPARK-3230][SQL] Fix udfs that return structs  (Michael Armbrust, 2014-08-28, 2 files, -9/+14)
      We need to convert the case classes into Rows.
      Author: Michael Armbrust <michael@databricks.com>
      Closes #2133 from marmbrus/structUdfs and squashes the following commits:
      189722f [Michael Armbrust] Merge remote-tracking branch 'origin/master' into structUdfs
      8e29b1c [Michael Armbrust] Use existing function
      d8d0b76 [Michael Armbrust] Fix udfs that return structs
      (cherry picked from commit 76e3ba4264c4a0bc2c33ae6ac862fc40bc302d83)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
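A minimal sketch of the scenario this fix covers: a Scala UDF whose result is a case class, which Spark SQL must convert into a Row so it can be queried as a struct. The class names, table name, and data below are made up for illustration, and the calls are the Spark 1.1-era SQLContext API as I understand it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical case classes for the example; any product types would do.
case class KV(key: String, value: Int)
case class Pair(first: String, second: Int)

object StructUdfSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("struct-udf").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD

    // A UDF whose result type is a case class, i.e. a struct in SQL terms.
    sqlContext.registerFunction("makePair", (s: String, i: Int) => Pair(s, i))

    sc.parallelize(Seq(KV("a", 1), KV("b", 2))).registerTempTable("kv")

    // Before the fix this kind of query failed because the case class was not
    // converted into a Row; with the fix the struct result can be selected.
    sqlContext.sql("SELECT makePair(key, value) FROM kv").collect().foreach(println)
  }
}
```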
* [SQL] Fixed 2 comment typos in SQLConf  (Cheng Lian, 2014-08-28, 1 file, -3/+4)
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      Closes #2172 from liancheng/sqlconf-typo and squashes the following commits:
      115cc71 [Cheng Lian] Fixed 2 comment typos in SQLConf
      (cherry picked from commit 68f75dcdfe7e8ab229b73824692c4b3d4c39946c)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [maven-release-plugin] prepare for next development iteration  (Patrick Wendell, 2014-08-28, 1 file, -1/+1)
* [maven-release-plugin] prepare release v1.1.0-rc1  (Patrick Wendell, 2014-08-28, 1 file, -1/+1)
* [maven-release-plugin] prepare for next development iteration  (Patrick Wendell, 2014-08-27, 1 file, -1/+1)
* [maven-release-plugin] prepare release v1.1.0-rc1  (Patrick Wendell, 2014-08-27, 1 file, -3/+2)
* Revert "[maven-release-plugin] prepare release v1.1.0-snapshot2"  (Patrick Wendell, 2014-08-27, 1 file, -2/+3)
      This reverts commit e1535ad3c6f7400f2b7915ea91da9c60510557ba.
* Revert "[maven-release-plugin] prepare for next development iteration"  (Patrick Wendell, 2014-08-27, 1 file, -1/+1)
      This reverts commit 9af3fb7385d1f9f221962f1d2d725ff79bd82033.
* [SPARK-3235][SQL] Ensure in-memory tables don't always broadcast.  (Michael Armbrust, 2014-08-27, 4 files, -2/+15)
      Author: Michael Armbrust <michael@databricks.com>
      Closes #2147 from marmbrus/inMemDefaultSize and squashes the following commits:
      5390360 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into inMemDefaultSize
      14204d3 [Michael Armbrust] Set the context before creating SparkLogicalPlans.
      8da4414 [Michael Armbrust] Make sure we throw errors when leaf nodes fail to provide statistics
      18ce029 [Michael Armbrust] Ensure in-memory tables don't always broadcast.
      (cherry picked from commit 7d2a7a91f263bb9fbf24dc4dbffde8fe5e2c7442)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3138][SQL] sqlContext.parquetFile should be able to take a single file as parameter  (chutium, 2014-08-27, 2 files, -8/+26)
      `if (!fs.getFileStatus(path).isDir) throw Exception` makes no sense after commit #1370.
      If someone is working on SPARK-2551, be careful to make sure the new change passes the test case `test("Read a parquet file instead of a directory")`.
      Author: chutium <teng.qiu@gmail.com>
      Closes #2044 from chutium/parquet-singlefile and squashes the following commits:
      4ae477f [chutium] [SPARK-3138][SQL] sqlContext.parquetFile should be able to take a single file as parameter
      (cherry picked from commit 48f42781dedecd38ddcb2dcf67dead92bb4318f5)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
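A short hedged sketch of the behaviour this change enables: `parquetFile` pointed at a single Parquet file rather than a directory of part files. Paths and the table name are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SingleParquetFileSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-single-file").setMaster("local"))
    val sqlContext = new SQLContext(sc)

    // Directory of part files (always worked) vs. one specific part file
    // (enabled by this change). Both paths below are placeholders.
    val wholeTable = sqlContext.parquetFile("/data/events")
    val singleFile = sqlContext.parquetFile("/data/events/part-r-00001.parquet")

    singleFile.registerTempTable("events_one_file")
    sqlContext.sql("SELECT COUNT(*) FROM events_one_file").collect().foreach(println)
  }
}
```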
* [SPARK-3237][SQL] Fix parquet filters with UDFs  (Michael Armbrust, 2014-08-27, 1 file, -2/+6)
      Author: Michael Armbrust <michael@databricks.com>
      Closes #2153 from marmbrus/parquetFilters and squashes the following commits:
      712731a [Michael Armbrust] Use closure serializer for sending filters.
      1e83f80 [Michael Armbrust] Clean udf functions.
      (cherry picked from commit e1139dd60e0692e8adb1337c1f605165ce4b8895)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3036][SPARK-3037][SQL] Add MapType/ArrayType containing null value support to Parquet.  (Takuya UESHIN, 2014-08-26, 4 files, -40/+167)
      JIRA:
      - https://issues.apache.org/jira/browse/SPARK-3036
      - https://issues.apache.org/jira/browse/SPARK-3037
      Currently this uses the following Parquet schema for `MapType` when `valueContainsNull` is `true`:
          message root {
            optional group a (MAP) {
              repeated group map (MAP_KEY_VALUE) {
                required int32 key;
                optional int32 value;
              }
            }
          }
      and the following for `ArrayType` when `containsNull` is `true`:
          message root {
            optional group a (LIST) {
              repeated group bag {
                optional int32 array;
              }
            }
          }
      We have to think about compatibility with older versions of Spark, Hive, or the other systems I mentioned in the JIRA issues.
      Notice: This PR is based on #1963 and #1889. Please check them first.
      /cc marmbrus, yhuai
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      Closes #2032 from ueshin/issues/SPARK-3036_3037 and squashes the following commits:
      4e8e9e7 [Takuya UESHIN] Add ArrayType containing null value support to Parquet.
      013c2ca [Takuya UESHIN] Add MapType containing null value support to Parquet.
      62989de [Takuya UESHIN] Merge branch 'issues/SPARK-2969' into issues/SPARK-3036_3037
      8e38b53 [Takuya UESHIN] Merge branch 'issues/SPARK-3063' into issues/SPARK-3036_3037
      (cherry picked from commit 727cb25bcc29481d6b744abef1ca091e64b5f91f)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3194][SQL] Add AttributeSet to fix bugs with invalid comparisons of AttributeReferences  (Michael Armbrust, 2014-08-26, 5 files, -11/+5)
      It is common to want to describe sets of attributes that are in various parts of a query plan. However, the semantics of putting `AttributeReference` objects into a standard Scala `Set` result in subtle bugs when references differ cosmetically. For example, with case insensitive resolution it is possible to have two references to the same attribute whose names are not equal.
      In this PR I introduce a new abstraction, an `AttributeSet`, which performs all comparisons using the globally unique `ExpressionId` instead of case class equality. (There is already a related class, [`AttributeMap`](https://github.com/marmbrus/spark/blob/inMemStats/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala#L32).)
      This new type of set is used to fix a bug in the optimizer where needed attributes were getting projected away underneath join operators.
      I also took this opportunity to refactor the expression and query plan base classes. In all but one instance the logic for computing the `references` of an `Expression` was the same, so I moved this logic into the base class. For query plans the semantics of the `references` method were ill defined (is it the references in the output? those used by expression evaluation? something else?). As a result this method wasn't really used very much, so I removed it.
      TODO:
      - [x] Finish scala doc for `AttributeSet`
      - [x] Scan the code for other instances of `Set[Attribute]` and refactor them.
      - [x] Finish removing `references` from `QueryPlan`
      Author: Michael Armbrust <michael@databricks.com>
      Closes #2109 from marmbrus/attributeSets and squashes the following commits:
      1c0dae5 [Michael Armbrust] work on serialization bug.
      9ba868d [Michael Armbrust] Merge remote-tracking branch 'origin/master' into attributeSets
      3ae5288 [Michael Armbrust] review comments
      40ce7f6 [Michael Armbrust] style
      d577cc7 [Michael Armbrust] Scaladoc
      cae5d22 [Michael Armbrust] remove more references implementations
      d6e16be [Michael Armbrust] Remove more instances of "def references" and normal sets of attributes.
      fc26b49 [Michael Armbrust] Add AttributeSet class, remove references from Expression.
      (cherry picked from commit c4787a3690a9ed3b8b2c6c294fc4a6915436b6f7)
      Signed-off-by: Reynold Xin <rxin@apache.org>
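The design point above is that attributes are compared by a globally unique expression id rather than by case-class equality. The sketch below is a simplified stand-in (not the real Catalyst `AttributeSet`/`AttributeReference` classes) that illustrates why id-based comparison collapses cosmetically different references to the same attribute.

```scala
// Simplified stand-ins for Catalyst's AttributeReference and AttributeSet;
// the names and fields here are illustrative, not the real API.
case class ExprId(id: Long)
case class AttrRef(name: String, exprId: ExprId)

// A set keyed by the unique expression id, so two references to the same
// attribute that differ only cosmetically (e.g. in name casing) collapse
// into a single entry.
class AttrSet private (private val byId: Map[ExprId, AttrRef]) {
  def contains(a: AttrRef): Boolean = byId.contains(a.exprId)
  def ++(other: AttrSet): AttrSet = new AttrSet(byId ++ other.byId)
  def size: Int = byId.size
}

object AttrSet {
  def apply(attrs: Seq[AttrRef]): AttrSet =
    new AttrSet(attrs.map(a => a.exprId -> a).toMap)
}

object AttributeSetSketch extends App {
  val id = ExprId(1)
  // The same underlying attribute, resolved with different name casing.
  val a1 = AttrRef("key", id)
  val a2 = AttrRef("KEY", id)

  println(Set(a1, a2).size)          // 2: case-class equality sees two attributes
  println(AttrSet(Seq(a1, a2)).size) // 1: id-based comparison sees one
}
```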
* [SPARK-3063][SQL] ExistingRdd should convert Map to catalyst Map.  (Takuya UESHIN, 2014-08-26, 2 files, -1/+48)
      Currently `ExistingRdd.convertToCatalyst` doesn't convert `Map` values.
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      Closes #1963 from ueshin/issues/SPARK-3063 and squashes the following commits:
      3ba41f2 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-3063
      4d7bae2 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-3063
      9321379 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-3063
      d8a900a [Takuya UESHIN] Make ExistingRdd.convertToCatalyst be able to convert Map value.
      (cherry picked from commit 6b5584ef1c605cd30f25dbe7099ab32aea1746fb)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2969][SQL] Make ScalaReflection be able to handle ArrayType.containsNull and MapType.valueContainsNull.  (Takuya UESHIN, 2014-08-26, 3 files, -19/+19)
      Make `ScalaReflection` able to handle types like:
      - `Seq[Int]` as `ArrayType(IntegerType, containsNull = false)`
      - `Seq[java.lang.Integer]` as `ArrayType(IntegerType, containsNull = true)`
      - `Map[Int, Long]` as `MapType(IntegerType, LongType, valueContainsNull = false)`
      - `Map[Int, java.lang.Long]` as `MapType(IntegerType, LongType, valueContainsNull = true)`
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      Closes #1889 from ueshin/issues/SPARK-2969 and squashes the following commits:
      24f1c5c [Takuya UESHIN] Change the default value of ArrayType.containsNull to true in Python API.
      79f5b65 [Takuya UESHIN] Change the default value of ArrayType.containsNull to true in Java API.
      7cd1a7a [Takuya UESHIN] Fix json test failures.
      2cfb862 [Takuya UESHIN] Change the default value of ArrayType.containsNull to true.
      2f38e61 [Takuya UESHIN] Revert the default value of MapTypes.valueContainsNull.
      9fa02f5 [Takuya UESHIN] Fix a test failure.
      1a9a96b [Takuya UESHIN] Modify ScalaReflection to handle ArrayType.containsNull and MapType.valueContainsNull.
      (cherry picked from commit 98c2bb0bbde6fb2b6f64af3efffefcb0dae94c12)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
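A brief sketch of the mapping listed above, using the 1.1-era `createSchemaRDD` implicit; the case class and sample values are invented. The comments record the element/value nullability the reflected schema is expected to carry.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Container nullability follows the element/value type: primitive Scala types
// cannot hold null, boxed java.lang types can.
case class Record(
    ints: Seq[Int],                            // ArrayType(IntegerType, containsNull = false)
    boxedInts: Seq[java.lang.Integer],         // ArrayType(IntegerType, containsNull = true)
    longsByKey: Map[Int, Long],                // MapType(IntegerType, LongType, valueContainsNull = false)
    boxedLongsByKey: Map[Int, java.lang.Long]) // MapType(IntegerType, LongType, valueContainsNull = true)

object ReflectedSchemaSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("reflected-schema").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD

    val rdd = sc.parallelize(Seq(Record(
      Seq(1, 2),
      Seq[java.lang.Integer](1, null),
      Map(1 -> 1L),
      Map[Int, java.lang.Long](1 -> 1L, 2 -> null))))

    // The printed schema should carry the containsNull / valueContainsNull
    // flags noted in the comments on Record.
    rdd.printSchema()
  }
}
```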
* [SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext  (chutium, 2014-08-26, 3 files, -5/+107)
      There are 4 different compression codecs available for `ParquetOutputFormat` in Spark SQL; the codec was set as a hard-coded value in `ParquetRelation.defaultCompression`.
      Original discussion: https://github.com/apache/spark/pull/195#discussion-diff-11002083
      I added a new config property in SQLConf to allow the user to change this compression codec, and I used a similar short-name syntax as described in SPARK-2953 #1873 (https://github.com/apache/spark/pull/1873/files#diff-0).
      By the way, which codec should we use as the default? It was set to GZIP (https://github.com/apache/spark/pull/195/files#diff-4), but I think maybe we should change this to SNAPPY, since SNAPPY is already the default codec for shuffling in spark-core (SPARK-2469, #1415), and parquet-mr supports the Snappy codec natively (https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).
      Author: chutium <teng.qiu@gmail.com>
      Closes #2039 from chutium/parquet-compression and squashes the following commits:
      2f44964 [chutium] [SPARK-3131][SQL] parquet compression default codec set to snappy, also in test suite
      e578e21 [chutium] [SPARK-3131][SQL] compression codec config property name and default codec set to snappy
      21235dc [chutium] [SPARK-3131][SQL] Allow user to set parquet compression codec for writing ParquetFile in SQLContext
      (cherry picked from commit 8856c3d86009295be871989a5dc7270f31b420cd)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
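A sketch of how this knob would be used from Scala. The property name `spark.sql.parquet.compression.codec` and the accepted short names (`uncompressed`, `snappy`, `gzip`, `lzo`) are assumptions based on the short-name convention the commit references (SPARK-2953); the case class and output path are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Placeholder schema for the rows being written out.
case class Person(name: String, age: Int)

object ParquetCompressionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-codec").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD

    // Assumed property name; short codec names follow the SPARK-2953 style.
    sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")

    val people = sc.parallelize(1 to 100).map(i => Person("name_" + i, i))
    // The codec configured above is used when the SchemaRDD is written out.
    people.saveAsParquetFile("/tmp/people_gzip.parquet")
  }
}
```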
* [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile  (Chia-Yung Su, 2014-08-25, 1 file, -1/+1)
      Fix compile error on hadoop 0.23 for pull request #1924.
      Author: Chia-Yung Su <chiayung@appier.com>
      Closes #1959 from joesu/bugfix-spark3011 and squashes the following commits:
      be30793 [Chia-Yung Su] remove .* and _* except _metadata
      8fe2398 [Chia-Yung Su] add note to explain
      40ea9bd [Chia-Yung Su] fix hadoop-0.23 compile error
      c7e44f2 [Chia-Yung Su] match syntax
      f8fc32a [Chia-Yung Su] filter out tmp dir
      (cherry picked from commit 4243bb6634aca5b9ddf6d42778aa7b4866ce6256)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-3058] [SQL] Support EXTENDED for EXPLAIN  (Cheng Hao, 2014-08-25, 3 files, -8/+19)
      Provide `extended` keyword support for the `explain` command in SQL, e.g.:
          explain extended select key as a1, value as a2 from src where key=1;
          == Parsed Logical Plan ==
          Project ['key AS a1#3,'value AS a2#4]
          Filter ('key = 1)
          UnresolvedRelation None, src, None
          == Analyzed Logical Plan ==
          Project [key#8 AS a1#3,value#9 AS a2#4]
          Filter (CAST(key#8, DoubleType) = CAST(1, DoubleType))
          MetastoreRelation default, src, None
          == Optimized Logical Plan ==
          Project [key#8 AS a1#3,value#9 AS a2#4]
          Filter (CAST(key#8, DoubleType) = 1.0)
          MetastoreRelation default, src, None
          == Physical Plan ==
          Project [key#8 AS a1#3,value#9 AS a2#4]
          Filter (CAST(key#8, DoubleType) = 1.0)
          HiveTableScan [key#8,value#9], (MetastoreRelation default, src, None), None
          Code Generation: false
          == RDD ==
          (2) MappedRDD[14] at map at HiveContext.scala:350
          MapPartitionsRDD[13] at mapPartitions at basicOperators.scala:42
          MapPartitionsRDD[12] at mapPartitions at basicOperators.scala:57
          MapPartitionsRDD[11] at mapPartitions at TableReader.scala:112
          MappedRDD[10] at map at TableReader.scala:240
          HadoopRDD[9] at HadoopRDD at TableReader.scala:230
      It is a sub-task of #1847 but can go in without any dependency.
      Author: Cheng Hao <hao.cheng@intel.com>
      Closes #1962 from chenghao-intel/explain_extended and squashes the following commits:
      295db74 [Cheng Hao] Fix bug in printing the simple execution plan
      48bc989 [Cheng Hao] Support EXTENDED for EXPLAIN
      (cherry picked from commit 156eb3966176de02ec3ec90ae10e50a7ebfbbf4f)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2967][SQL] Follow-up: Also copy hash expressions in sort based shuffle fix.  (Michael Armbrust, 2014-08-23, 1 file, -3/+6)
      Follow-up to #2066.
      Author: Michael Armbrust <michael@databricks.com>
      Closes #2072 from marmbrus/sortShuffle and squashes the following commits:
      2ff8114 [Michael Armbrust] Fix bug
      (cherry picked from commit 3519b5e8e55b4530d7f7c0bcab254f863dbfa814)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-2554][SQL] CountDistinct partial aggregation and object allocation improvements  (Michael Armbrust, 2014-08-23, 8 files, -13/+137)
      Author: Michael Armbrust <michael@databricks.com>
      Author: Gregory Owen <greowen@gmail.com>
      Closes #1935 from marmbrus/countDistinctPartial and squashes the following commits:
      5c7848d [Michael Armbrust] turn off caching in the constructor
      8074a80 [Michael Armbrust] fix tests
      32d216f [Michael Armbrust] reynolds comments
      c122cca [Michael Armbrust] Address comments, add tests
      b2e8ef3 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial
      fae38f4 [Michael Armbrust] Fix style
      fdca896 [Michael Armbrust] cleanup
      93d0f64 [Michael Armbrust] metastore concurrency fix.
      db44a30 [Michael Armbrust] JIT hax.
      3868f6c [Michael Armbrust] Merge pull request #9 from GregOwen/countDistinctPartial
      c9e67de [Gregory Owen] Made SpecificRow and types serializable by Kryo
      2b46c4b [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial
      8ff6402 [Michael Armbrust] Add specific row.
      58d15f1 [Michael Armbrust] disable codegen logging
      87d101d [Michael Armbrust] Fix isNullAt bug
      abee26d [Michael Armbrust] WIP
      27984d0 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial
      57ae3b1 [Michael Armbrust] Fix order dependent test
      b3d0f64 [Michael Armbrust] Add golden files.
      c1f7114 [Michael Armbrust] Improve tests / fix serialization.
      f31b8ad [Michael Armbrust] more fixes
      38c7449 [Michael Armbrust] comments and style
      9153652 [Michael Armbrust] better toString
      d494598 [Michael Armbrust] Fix tests now that the planner is better
      41fbd1d [Michael Armbrust] Never try and create an empty hash set.
      050bb97 [Michael Armbrust] Skip no-arg constructors for kryo,
      bd08239 [Michael Armbrust] WIP
      213ada8 [Michael Armbrust] First draft of partially aggregated and code generated count distinct / max
      (cherry picked from commit 7e191fe29bb09a8560cd75d453c4f7f662dff406)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
* [maven-release-plugin] prepare for next development iteration  (Patrick Wendell, 2014-08-21, 1 file, -1/+1)