path: root/sql
Commit message | Author | Age | Files | Lines
* [SPARK-2846][SQL] Add configureInputJobPropertiesForStorageHandler to ↵Alex Liu2014-08-201-1/+2
initialization of job conf

Author: Alex Liu <alex_liu68@yahoo.com>
Closes #1927 from alexliu68/SPARK-SQL-2846 and squashes the following commits:

e4bdc4c [Alex Liu] SPARK-SQL-2846 add configureInputJobPropertiesForStorageHandler to initial job conf
* [SPARK-2967][SQL] Fix sort based shuffle for spark sql.Michael Armbrust2014-08-201-7/+23
| | | | | | | | | | Add explicit row copies when sort based shuffle is on. Author: Michael Armbrust <michael@databricks.com> Closes #2066 from marmbrus/sortShuffle and squashes the following commits: fcd7bb2 [Michael Armbrust] Fix sort based shuffle for spark sql.
* [SPARK-3126][SPARK-3127][SQL] Fixed HiveThriftServer2SuiteCheng Lian2014-08-202-15/+5
This PR fixes two issues:

1. Fixes a wrongly quoted command line option in `HiveThriftServer2Suite` that makes test cases hang until timeout.
2. Asks `dev/run-test` to run Spark SQL tests when `bin/spark-sql` and/or `sbin/start-thriftserver.sh` are modified.

Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #2036 from liancheng/fix-thriftserver-test and squashes the following commits:

f38c4eb [Cheng Lian] Fixed the same quotation issue in CliSuite
26b82a0 [Cheng Lian] Run SQL tests when dff contains bin/spark-sql and/or sbin/start-thriftserver.sh
a87f83d [Cheng Lian] Extended timeout
e5aa31a [Cheng Lian] Fixed metastore JDBC URI quotation
* [SQL] add note of use synchronizedMap in SQLConfwangfei2014-08-191-0/+1
Refer to: http://stackoverflow.com/questions/510632/whats-the-difference-between-concurrenthashmap-and-collections-synchronizedmap

`Collections.synchronizedMap(map)` creates a blocking Map, which ensures consistency but degrades performance. So use `ConcurrentHashMap` (a more efficient thread-safe hash map) instead.

Also update HiveQuerySuite to fix a test error that appears when changing to ConcurrentHashMap.

Author: wangfei <wangfei_hello@126.com>
Author: scwf <wangfei1@huawei.com>
Closes #1996 from scwf/sqlconf and squashes the following commits:

93bc0c5 [wangfei] revert change of HiveQuerySuite
0cc05dd [wangfei] add note for use synchronizedMap
3c224d31 [scwf] fix formate
a7bcb98 [scwf] use ConcurrentHashMap in sql conf, intead synchronizedMap
* [SPARK-2406][SQL] Initial support for using ParquetTableScan to read ↵Michael Armbrust2014-08-187-22/+427
HiveMetaStore tables.

This PR adds an experimental flag `spark.sql.hive.convertMetastoreParquet` that, when true, causes the planner to detect tables that use Hive's Parquet SerDe and plan them using Spark SQL's native `ParquetTableScan` instead.

Author: Michael Armbrust <michael@databricks.com>
Author: Yin Huai <huai@cse.ohio-state.edu>
Closes #1819 from marmbrus/parquetMetastore and squashes the following commits:

1620079 [Michael Armbrust] Revert "remove hive parquet bundle"
cc30430 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into parquetMetastore
4f3d54f [Michael Armbrust] fix style
41ebc5f [Michael Armbrust] remove hive parquet bundle
a43e0da [Michael Armbrust] Merge remote-tracking branch 'origin/master' into parquetMetastore
4c4dc19 [Michael Armbrust] Fix bug with tree splicing.
ebb267e [Michael Armbrust] include parquet hive to tests pass (Remove this later).
c0d9b72 [Michael Armbrust] Avoid creating a HadoopRDD per partition. Add dirty hacks to retrieve partition values from the InputSplit.
8cdc93c [Michael Armbrust] Merge pull request #8 from yhuai/parquetMetastore
a0baec7 [Yin Huai] Partitioning columns can be resolved.
1161338 [Michael Armbrust] Add a test to make sure conversion is actually happening
212d5cd [Michael Armbrust] Initial support for using ParquetTableScan to read HiveMetaStore tables.
* [SPARK-3091] [SQL] Add support for caching metadata on Parquet filesMatei Zaharia2014-08-182-13/+72
| | | | | | | | | | | | | | | | | For larger Parquet files, reading the file footers (which is done in parallel on up to 5 threads) and HDFS block locations (which is serial) can take multiple seconds. We can add an option to cache this data within FilteringParquetInputFormat. Unfortunately ParquetInputFormat only caches footers within each instance of ParquetInputFormat, not across them. Note: this PR leaves this turned off by default for 1.1, but I believe it's safe to turn it on after. The keys in the hash maps are FileStatus objects that include a modification time, so this will work fine if files are modified. The location cache could become invalid if files have moved within HDFS, but that's rare so I just made it invalidate entries every 15 minutes. Author: Matei Zaharia <matei@databricks.com> Closes #2005 from mateiz/parquet-cache and squashes the following commits: dae8efe [Matei Zaharia] Bug fix c71e9ed [Matei Zaharia] Handle empty statuses directly 22072b0 [Matei Zaharia] Use Guava caches and add a config option for caching metadata 8fb56ce [Matei Zaharia] Cache file block locations too 453bd21 [Matei Zaharia] Bug fix 4094df6 [Matei Zaharia] First attempt at caching Parquet footers
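The caching scheme described above can be sketched roughly as follows. This is a toy illustration in plain Python, not the code from the PR (which, per the squashed commits, uses Guava caches inside `FilteringParquetInputFormat`). The class name `MetadataCache`, its method names, and the injectable `clock` are all hypothetical. The sketch assumes footers are keyed by path plus modification time, so a modified file simply misses the cache, and that block locations are additionally re-fetched once an entry is 15 minutes old.

```python
import time


class MetadataCache:
    """Toy cache (hypothetical): footers keyed by (path, mtime),
    block locations additionally expired after a time-to-live."""

    def __init__(self, ttl_seconds=15 * 60, clock=time.monotonic):
        self._footers = {}      # (path, mtime) -> footer
        self._locations = {}    # (path, mtime) -> (fetched_at, locations)
        self._ttl = ttl_seconds
        self._clock = clock

    def footer(self, path, mtime, load):
        # A changed modification time yields a new key, so stale footers
        # are never returned for a rewritten file.
        key = (path, mtime)
        if key not in self._footers:
            self._footers[key] = load(path)
        return self._footers[key]

    def locations(self, path, mtime, load):
        # Locations can change without the file changing (e.g. a move
        # inside HDFS), so entries are refreshed once they reach the TTL.
        key = (path, mtime)
        now = self._clock()
        entry = self._locations.get(key)
        if entry is None or now - entry[0] >= self._ttl:
            entry = (now, load(path))
            self._locations[key] = entry
        return entry[1]
```

Keying on the modification time trades a little memory (old keys linger until evicted) for never having to invalidate footers explicitly.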
* SPARK-3025 [SQL]: Allow JDBC clients to set a fair scheduler poolPatrick Wendell2014-08-182-7/+23
| | | | | | | | | | | This definitely needs review as I am not familiar with this part of Spark. I tested this locally and it did seem to work. Author: Patrick Wendell <pwendell@gmail.com> Closes #1937 from pwendell/scheduler and squashes the following commits: b858e33 [Patrick Wendell] SPARK-3025: Allow JDBC clients to set a fair scheduler pool
* [SPARK-3085] [SQL] Use compact data structures in SQL joinsMatei Zaharia2014-08-181-34/+33
| | | | | | | | | | | | | | | This reuses the CompactBuffer from Spark Core to save memory and pointer dereferences. I also tried AppendOnlyMap instead of java.util.HashMap but unfortunately that slows things down because it seems to do more equals() calls and the equals on GenericRow, and especially JoinedRow, is pretty expensive. Author: Matei Zaharia <matei@databricks.com> Closes #1993 from mateiz/spark-3085 and squashes the following commits: 188221e [Matei Zaharia] Remove unneeded import 5f903ee [Matei Zaharia] [SPARK-3085] [SQL] Use compact data structures in SQL joins
* [SPARK-3084] [SQL] Collect broadcasted tables in parallel in joinsMatei Zaharia2014-08-181-1/+1
| | | | | | | | | | | | | | | BroadcastHashJoin has a broadcastFuture variable that tries to collect the broadcasted table in a separate thread, but this doesn't help because it's a lazy val that only gets initialized when you attempt to build the RDD. Thus queries that broadcast multiple tables would collect and broadcast them sequentially. I changed this to a val to let it start collecting right when the operator is created. Author: Matei Zaharia <matei@databricks.com> Closes #1990 from mateiz/spark-3084 and squashes the following commits: f468766 [Matei Zaharia] [SPARK-3084] Collect broadcasted tables in parallel in joins
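The difference between the lazy and the eager variant can be illustrated with a small Python analogue. This is only a sketch of the idea, not Spark code: `LazyBroadcast` and `EagerBroadcast` are hypothetical names, and a Python property stands in for Scala's `lazy val`.

```python
from concurrent.futures import ThreadPoolExecutor


class LazyBroadcast:
    """Mimics a `lazy val`: the collection job is only submitted the
    first time `future` is read."""

    def __init__(self, pool, collect):
        self._pool = pool
        self._collect = collect
        self._future = None

    @property
    def future(self):
        if self._future is None:
            self._future = self._pool.submit(self._collect)
        return self._future


class EagerBroadcast:
    """Mimics a plain `val`: the collection job is submitted as soon as
    the operator object is constructed."""

    def __init__(self, pool, collect):
        self.future = pool.submit(collect)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both eager operators start collecting immediately, so their
        # work can overlap instead of running one after the other.
        a = EagerBroadcast(pool, lambda: "table_a")
        b = EagerBroadcast(pool, lambda: "table_b")
        print(a.future.result(), b.future.result())
```

With the lazy variant, two operators built back to back would each begin collecting only when their result is first requested, which is the sequential behavior the commit message describes.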
* SPARK-3096: Include parquet hive serde by default in buildPatrick Wendell2014-08-181-0/+5
A small change - we should just add this dependency. It doesn't have any recursive deps and it's needed for reading Hive Parquet tables.

Author: Patrick Wendell <pwendell@gmail.com>
Closes #2009 from pwendell/parquet and squashes the following commits:

e411f9f [Patrick Wendell] SPARk-309: Include parquet hive serde by default in build
* [SQL] Improve debug logging and toStrings.Michael Armbrust2014-08-172-2/+21
| | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #2004 from marmbrus/codgenDebugging and squashes the following commits: b7a7e41 [Michael Armbrust] Improve debug logging and toStrings.
* Revert "[SPARK-2970] [SQL] spark-sql script ends with IOException when ↵Michael Armbrust2014-08-171-8/+2
| | | | | | | | | | | | EventLogging is enabled" Revert #1891 due to issues with hadoop 1 compatibility. Author: Michael Armbrust <michael@databricks.com> Closes #2007 from marmbrus/revert1891 and squashes the following commits: 68706c0 [Michael Armbrust] Revert "[SPARK-2970] [SQL] spark-sql script ends with IOException when EventLogging is enabled"
* SPARK-2955 [BUILD] Test code fails to compile with "mvn compile" without ↵Sean Owen2014-08-141-14/+14
"install"

(This is the corrected follow-up to https://issues.apache.org/jira/browse/SPARK-2903)

Right now, `mvn compile test-compile` fails to compile Spark. (Don't worry; `mvn package` works, so this is not major.) The issue stems from test code in some modules depending on test code in other modules. That is perfectly fine and supported by Maven.

It takes extra work to get this to work with scalatest, and this has been attempted: https://github.com/apache/spark/blob/master/sql/catalyst/pom.xml#L86

This formulation is not quite enough, since the SQL Core module's tests fail to compile for lack of finding test classes in SQL Catalyst, and likewise for most Streaming integration modules depending on core Streaming test code. Example:

```
[error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:23: not found: type PlanTest
[error] class QueryTest extends PlanTest {
[error] ^
[error] /Users/srowen/Documents/spark/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:28: package org.apache.spark.sql.test is not a value
[error] test("SPARK-1669: cacheTable should be idempotent") {
[error] ^
...
```

The issue, I believe, is that generation of a `test-jar` is bound here to the `compile` phase, but the test classes are not being compiled in this phase. It should bind to the `test-compile` phase.

It works when executing `mvn package` or `mvn install` since test-jar artifacts are actually generated and made available through normal Maven mechanisms as each module is built. They are then found normally, regardless of scalatest configuration.

It would be nice for a simple `mvn compile test-compile` to work since the test code is perfectly compilable given the Maven declarations.

On the plus side, this change is low-risk as it only affects tests. yhuai made the original scalatest change and has glanced at this and thinks it makes sense.

Author: Sean Owen <srowen@gmail.com>
Closes #1879 from srowen/SPARK-2955 and squashes the following commits:

ad8242f [Sean Owen] Generate test-jar on test-compile for modules whose tests are needed by others' tests
* Revert [SPARK-3011][SQL] _temporary directory should be filtered out by ↵Michael Armbrust2014-08-141-2/+1
| | | | | | | | | | | | sqlContext.parquetFile Reverts #1924 due to build failures with hadoop 0.23. Author: Michael Armbrust <michael@databricks.com> Closes #1949 from marmbrus/revert1924 and squashes the following commits: 6bff940 [Michael Armbrust] Revert "[SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile"
* [SPARK-2927][SQL] Add a conf to configure if we always read Binary columns ↵Yin Huai2014-08-145-22/+87
| | | | | | | | | | | | | | | | | | stored in Parquet as String columns This PR adds a new conf flag `spark.sql.parquet.binaryAsString`. When it is `true`, if there is no parquet metadata file available to provide the schema of the data, we will always treat binary fields stored in parquet as string fields. This conf is used to provide a way to read string fields generated without UTF8 decoration. JIRA: https://issues.apache.org/jira/browse/SPARK-2927 Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1855 from yhuai/parquetBinaryAsString and squashes the following commits: 689ffa9 [Yin Huai] Add missing "=". 80827de [Yin Huai] Unit test. 1765ca4 [Yin Huai] Use .toBoolean. 9d3f199 [Yin Huai] Merge remote-tracking branch 'upstream/master' into parquetBinaryAsString 5d436a1 [Yin Huai] The initial support of adding a conf to treat binary columns stored in Parquet as string columns.
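The effect of the flag on schema inference can be sketched as below. This is a hypothetical illustration only: the function `spark_type_for`, its parameters, and the string type names are invented for the example and are not Spark's API. It assumes a binary field with a UTF8 annotation is already read as a string, and that the flag extends that treatment to unannotated binary fields.

```python
def spark_type_for(parquet_type, is_utf8_annotated, binary_as_string=False):
    """Pick a (toy) Spark SQL type name for a Parquet BINARY field.

    `binary_as_string` plays the role of spark.sql.parquet.binaryAsString.
    """
    if parquet_type != "BINARY":
        raise ValueError("this sketch only covers BINARY fields")
    if is_utf8_annotated or binary_as_string:
        return "StringType"
    return "BinaryType"


if __name__ == "__main__":
    # A writer that omitted the UTF8 annotation: the flag decides the type.
    print(spark_type_for("BINARY", is_utf8_annotated=False))
    print(spark_type_for("BINARY", is_utf8_annotated=False, binary_as_string=True))
```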
* [SPARK-3011][SQL] _temporary directory should be filtered out by ↵Chia-Yung Su2014-08-141-1/+2
| | | | | | | | | | | sqlContext.parquetFile Author: Chia-Yung Su <chiayung@appier.com> Closes #1924 from joesu/bugfix-spark3011 and squashes the following commits: c7e44f2 [Chia-Yung Su] match syntax f8fc32a [Chia-Yung Su] filter out tmp dir
* [SPARK-2986] [SQL] fixed: setting properties does not effectguowei2014-08-131-2/+2
It seems that the `SET` command is not run by SparkSQLDriver; it runs through the Hive API. As a result, the user cannot change the number of reducers by setting `spark.sql.shuffle.partitions`. I think setting Hive properties seems just a role to Spark SQL.

Author: guowei <guowei@upyoo.com>
Closes #1904 from guowei2/temp-branch and squashes the following commits:

7d47dde [guowei] fixed: setting properties like spark.sql.shuffle.partitions does not effective
* [SPARK-2970] [SQL] spark-sql script ends with IOException when EventLogging ↵Kousuke Saruta2014-08-131-2/+8
| | | | | | | | | | | | is enabled Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #1891 from sarutak/SPARK-2970 and squashes the following commits: 4a2d2fe [Kousuke Saruta] Modified comment style 8bd833c [Kousuke Saruta] Modified style 6c0997c [Kousuke Saruta] Modified the timing of shutdown hook execution. It should be executed before shutdown hook of o.a.h.f.FileSystem
* [SPARK-2935][SQL]Fix parquet predicate push down bugMichael Armbrust2014-08-133-3/+10
| | | | | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #1863 from marmbrus/parquetPredicates and squashes the following commits: 10ad202 [Michael Armbrust] left <=> right f249158 [Michael Armbrust] quiet parquet tests. 802da5b [Michael Armbrust] Add test case. eab2eda [Michael Armbrust] Fix parquet predicate push down bug
* [SPARK-2650][SQL] More precise initial buffer size estimation for in-memory ↵Cheng Lian2014-08-131-5/+6
| | | | | | | | | | | | | | column buffer This is a follow up of #1880. Since the row number within a single batch is known, we can estimate a much more precise initial buffer size when building an in-memory column buffer. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1901 from liancheng/precise-init-buffer-size and squashes the following commits: d5501fa [Cheng Lian] More precise initial buffer size estimation for in-memory column buffer
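As a rough sketch of the estimate described above: once the number of rows in a batch is known, the initial buffer can be sized as rows times a per-type default width rather than a fixed constant. The table `DEFAULT_SIZE_BYTES` and the function name are assumptions made for this example, not values or names taken from the PR.

```python
# Assumed per-value widths in bytes for a few fixed-width column types.
DEFAULT_SIZE_BYTES = {"boolean": 1, "int": 4, "long": 8, "double": 8}


def initial_buffer_size(column_type, rows_in_batch):
    """Estimate the initial column buffer size for one batch, in bytes."""
    return DEFAULT_SIZE_BYTES[column_type] * rows_in_batch


if __name__ == "__main__":
    # 1000 ints need about 4 KB, far less than a fixed 1 MB buffer.
    print(initial_buffer_size("int", 1000))
```

Variable-width types such as strings would need a separate heuristic, which this sketch deliberately leaves out.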
* [SPARK-2994][SQL] Support for udfs that take complex typesMichael Armbrust2014-08-132-18/+37
| | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #1915 from marmbrus/arrayUDF and squashes the following commits: a1c503d [Michael Armbrust] Support for udfs that take complex types
* [SPARK-2817] [SQL] add "show create table" supporttianyi2014-08-1337-0/+199
In the Spark SQL component, the "show create table" syntax had been disabled. We think it is a useful function for describing a Hive table.

Author: tianyi <tianyi@asiainfo-linkage.com>
Author: tianyi <tianyi@asiainfo.com>
Author: tianyi <tianyi.asiainfo@gmail.com>
Closes #1760 from tianyi/spark-2817 and squashes the following commits:

7d28b15 [tianyi] [SPARK-2817] fix too short prefix problem
cbffe8b [tianyi] [SPARK-2817] fix the case problem
565ec14 [tianyi] [SPARK-2817] fix the case problem
60d48a9 [tianyi] [SPARK-2817] use system temporary folder instead of temporary files in the source tree, and also clean some empty line
dbe1031 [tianyi] [SPARK-2817] move some code out of function rewritePaths, as it may be called multiple times
9b2ba11 [tianyi] [SPARK-2817] fix the line length problem
9f97586 [tianyi] [SPARK-2817] remove test.tmp.dir from pom.xml
bfc2999 [tianyi] [SPARK-2817] add "File.separator" support, create a "testTmpDir" outside the rewritePaths
bde800a [tianyi] [SPARK-2817] add "${system:test.tmp.dir}" support add "last_modified_by" to nonDeterministicLineIndicators in HiveComparisonTest
bb82726 [tianyi] [SPARK-2817] remove test which requires a system from the whitelist.
bbf6b42 [tianyi] [SPARK-2817] add a systemProperties named "test.tmp.dir" to pass the test which contains "${system:test.tmp.dir}"
a337bd6 [tianyi] [SPARK-2817] add "show create table" support
a03db77 [tianyi] [SPARK-2817] add "show create table" support
* [SPARK-3004][SQL] Added null checking when retrieving row setCheng Lian2014-08-133-33/+96
JIRA issue: [SPARK-3004](https://issues.apache.org/jira/browse/SPARK-3004)

HiveThriftServer2 throws an exception when the result set contains `NULL`. We should check `isNullAt` in `SparkSQLOperationManager.getNextRowSet`.

Note that simply using `row.addColumnValue(null)` doesn't work, since Hive sets the column type of a null `ColumnValue` to String by default.

Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #1920 from liancheng/spark-3004 and squashes the following commits:

1b1db1c [Cheng Lian] Adding NULL column values in the Hive way
2217722 [Cheng Lian] Fixed SPARK-3004: added null checking when retrieving row set
* [SQL] [SPARK-2826] Reduce the memory copy while building the hashmap for ↵Cheng Hao2014-08-111-26/+28
HashOuterJoin

This is a follow-up to #1147. This PR improves performance by about 10% - 15% in my local tests.

```
Before:
LeftOuterJoin: took 16750 ms ([3000000] records)
LeftOuterJoin: took 15179 ms ([3000000] records)
RightOuterJoin: took 15515 ms ([3000000] records)
RightOuterJoin: took 15276 ms ([3000000] records)
FullOuterJoin: took 19150 ms ([6000000] records)
FullOuterJoin: took 18935 ms ([6000000] records)

After:
LeftOuterJoin: took 15218 ms ([3000000] records)
LeftOuterJoin: took 13503 ms ([3000000] records)
RightOuterJoin: took 13663 ms ([3000000] records)
RightOuterJoin: took 14025 ms ([3000000] records)
FullOuterJoin: took 16624 ms ([6000000] records)
FullOuterJoin: took 16578 ms ([6000000] records)
```

Besides the performance improvement, I also did some cleanup as suggested in #1147.

Author: Cheng Hao <hao.cheng@intel.com>
Closes #1765 from chenghao-intel/hash_outer_join_fixing and squashes the following commits:

ab1f9e0 [Cheng Hao] Reduce the memory copy while building the hashmap
* [SPARK-2650][SQL] Build column buffers in smaller batchesMichael Armbrust2014-08-117-36/+70
| | | | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #1880 from marmbrus/columnBatches and squashes the following commits: 0649987 [Michael Armbrust] add test 4756fad [Michael Armbrust] fix compilation 2314532 [Michael Armbrust] Build column buffers in smaller batches
* [SPARK-2968][SQL] Fix nullabilities of Explode.Takuya UESHIN2014-08-111-4/+4
Output nullabilities of `Explode` can be determined by `ArrayType.containsNull` or `MapType.valueContainsNull`.

Author: Takuya UESHIN <ueshin@happy-camper.st>
Closes #1888 from ueshin/issues/SPARK-2968 and squashes the following commits:

d128c95 [Takuya UESHIN] Fix nullability of Explode.
* [SPARK-2965][SQL] Fix HashOuterJoin output nullabilities.Takuya UESHIN2014-08-111-1/+12
| | | | | | | | | | Output attributes of opposite side of `OuterJoin` should be nullable. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #1887 from ueshin/issues/SPARK-2965 and squashes the following commits: bcb2d37 [Takuya UESHIN] Fix HashOuterJoin output nullabilities.
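The nullability rule stated above can be sketched in a few lines. This is an illustrative model only: the `Attribute` dataclass, the join-type strings, and `outer_join_output` are hypothetical stand-ins, not Catalyst classes. The side that may have no matching row gets its attributes marked nullable, because unmatched rows are padded with nulls.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attribute:
    name: str
    nullable: bool


def outer_join_output(left, right, join_type):
    """Return output attributes with the null-padded side forced nullable."""

    def as_nullable(attrs):
        return [replace(a, nullable=True) for a in attrs]

    if join_type == "left_outer":
        return left + as_nullable(right)
    if join_type == "right_outer":
        return as_nullable(left) + right
    if join_type == "full_outer":
        return as_nullable(left) + as_nullable(right)
    raise ValueError(f"not an outer join: {join_type}")


if __name__ == "__main__":
    left = [Attribute("id", nullable=False)]
    right = [Attribute("score", nullable=False)]
    print(outer_join_output(left, right, "left_outer"))
```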
* [SQL] A tiny refactoring in HiveContext#analyzeYin Huai2014-08-111-5/+3
| | | | | | | | | | I should use `EliminateAnalysisOperators` in `analyze` instead of manually pattern matching. Author: Yin Huai <huaiyin.thu@gmail.com> Closes #1881 from yhuai/useEliminateAnalysisOperators and squashes the following commits: f3e1e7f [Yin Huai] Use EliminateAnalysisOperators.
* [sql]use SparkSQLEnv.stop() in ShutdownHookwangfei2014-08-111-1/+1
| | | | | | | | Author: wangfei <wangfei1@huawei.com> Closes #1852 from scwf/patch-3 and squashes the following commits: ae28c29 [wangfei] use SparkSQLEnv.stop() in ShutdownHook
* [SPARK-2590][SQL] Added option to handle incremental collection, disabled by ↵Cheng Lian2014-08-111-1/+10
| | | | | | | | | | | | | | default JIRA issue: [SPARK-2590](https://issues.apache.org/jira/browse/SPARK-2590) Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1853 from liancheng/inc-collect-option and squashes the following commits: cb3ea45 [Cheng Lian] Moved incremental collection option to Thrift server 43ce3aa [Cheng Lian] Changed incremental collect option name 623abde [Cheng Lian] Added option to handle incremental collection, disabled by default
* Updated Spark SQL README to include the hive-thriftserver moduleReynold Xin2014-08-091-1/+2
| | | | | | | | Author: Reynold Xin <rxin@apache.org> Closes #1867 from rxin/sql-readme and squashes the following commits: 42a5307 [Reynold Xin] Updated Spark SQL README to include the hive-thriftserver module
* [SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) should be ↵chutium2014-08-081-2/+3
| | | | | | | | | | | filtered out by sqlContext.parquetFile Author: chutium <teng.qiu@gmail.com> Closes #1691 from chutium/SPARK-2700 and squashes the following commits: b76ae8c [chutium] [SPARK-2700] [SQL] fixed styling issue d75a8bd [chutium] [SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
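A path filter of the kind this fix describes might look like the sketch below. It is an assumption-laden illustration, not the code from the PR: `is_data_file` is a hypothetical helper, and the sketch assumes that names starting with `.` are hidden and that names starting with `_` (such as `_temporary`) are also not data files.

```python
import posixpath


def is_data_file(path):
    """Return True if the last path component looks like a data file.

    Assumption: entries starting with '.' (hidden) or '_' (bookkeeping,
    e.g. _temporary) are skipped.
    """
    name = posixpath.basename(path)
    return not (name.startswith(".") or name.startswith("_"))


if __name__ == "__main__":
    listing = [
        "/warehouse/t/part-00000.parquet",
        "/warehouse/t/.impala_insert_staging",
        "/warehouse/t/_temporary",
    ]
    print([p for p in listing if is_data_file(p)])
```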
* [SPARK-2919] [SQL] Basic support for analyze command in HiveQlYin Huai2014-08-084-5/+89
The command we will support is

```
ANALYZE TABLE tablename COMPUTE STATISTICS noscan
```

Other cases shown in https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables will still be treated as Hive native commands.

JIRA: https://issues.apache.org/jira/browse/SPARK-2919

Author: Yin Huai <huai@cse.ohio-state.edu>
Closes #1848 from yhuai/sqlAnalyze and squashes the following commits:

0b79d36 [Yin Huai] Typo and format.
c59d94b [Yin Huai] Support "ANALYZE TABLE tableName COMPUTE STATISTICS noscan".
* [SPARK-2877] [SQL] MetastoreRelation should use SparkClassLoader when ↵Yin Huai2014-08-081-3/+5
| | | | | | | | | | | | creating the tableDesc JIRA: https://issues.apache.org/jira/browse/SPARK-2877 Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1806 from yhuai/SPARK-2877 and squashes the following commits: 4142bcb [Yin Huai] Use Spark's classloader.
* [SPARK-2908] [SQL] JsonRDD.nullTypeToStringType does not convert all ↵Yin Huai2014-08-083-5/+12
| | | | | | | | | | | | | NullType to StringType JIRA: https://issues.apache.org/jira/browse/SPARK-2908 Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1840 from yhuai/SPARK-2908 and squashes the following commits: 86e833e [Yin Huai] Update test. cb11759 [Yin Huai] nullTypeToStringType should check columns with the type of array of structs.
* [SPARK-2888] [SQL] Fix addColumnMetadataToConf in HiveTableScanYin Huai2014-08-081-11/+6
| | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-2888 Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1817 from yhuai/fixAddColumnMetadataToConf and squashes the following commits: fba728c [Yin Huai] Fix addColumnMetadataToConf.
* [SPARK-2678][Core][SQL] A workaround for SPARK-2678Cheng Lian2014-08-063-20/+23
JIRA issues:

- Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
- Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874)

Related PR:

- #1715

This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in Spark 1.1 release. As currently SPARK-2678 only affects Spark SQL scripts, this workaround is enough for Spark 1.1.

Command line option handling logic in bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full downward compatibility for now.

Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #1801 from liancheng/spark-2874 and squashes the following commits:

8045d7a [Cheng Lian] Make sure test suites pass
8493a9e [Cheng Lian] Using eval to retain quoted arguments
aed523f [Cheng Lian] Fixed typo in bin/spark-sql
f12a0b1 [Cheng Lian] Worked arount SPARK-2678
daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts
* [SPARK-2875] [PySpark] [SQL] handle null in schemaRDD()Davies Liu2014-08-061-11/+16
Handle null values in a schemaRDD when converting them into Python.

Author: Davies Liu <davies.liu@gmail.com>
Closes #1802 from davies/json and squashes the following commits:

88e6b1f [Davies Liu] handle null in schemaRDD()
* [SQL] Fix logging warn -> debugMichael Armbrust2014-08-051-2/+2
| | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #1800 from marmbrus/warning and squashes the following commits: 8ea9cf1 [Michael Armbrust] [SQL] Fix logging warn -> debug.
* [SQL] Tighten the visibility of various SQLConf methods and renamed ↵Reynold Xin2014-08-056-66/+62
| | | | | | | | | | | setter/getters Author: Reynold Xin <rxin@apache.org> Closes #1794 from rxin/sql-conf and squashes the following commits: 3ac11ef [Reynold Xin] getAllConfs return an immutable Map instead of an Array. 4b19d6c [Reynold Xin] Tighten the visibility of various SQLConf methods and renamed setter/getters.
* [SPARK-2866][SQL] Support attributes in ORDER BY that aren't in SELECTMichael Armbrust2014-08-053-7/+116
Minor refactoring to allow resolution using either a node's input or its output.

Author: Michael Armbrust <michael@databricks.com>
Closes #1795 from marmbrus/ordering and squashes the following commits:

237f580 [Michael Armbrust] style
74d833b [Michael Armbrust] newline
705d963 [Michael Armbrust] Add a rule for resolving ORDER BY expressions that reference attributes not present in the SELECT clause.
82cabda [Michael Armbrust] Generalize attribute resolution.
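One way to picture ordering by a column that is not selected is: temporarily carry the extra column through the projection, sort, then project it away again. The sketch below shows that idea on plain Python dictionaries. It is an assumption about the general approach, not a description of the analyzer rule in the PR, and `select_order_by` is a hypothetical helper.

```python
def select_order_by(rows, select_cols, order_cols):
    """Project `select_cols`, ordered by `order_cols`, even when some
    ordering columns are absent from the projection."""
    # Ordering columns the projection would otherwise drop.
    missing = [c for c in order_cols if c not in select_cols]
    # Widen the projection so the sort can see them.
    widened = [{c: r[c] for c in select_cols + missing} for r in rows]
    widened.sort(key=lambda r: tuple(r[c] for c in order_cols))
    # Narrow back to exactly what was selected.
    return [{c: r[c] for c in select_cols} for r in widened]


if __name__ == "__main__":
    people = [
        {"name": "b", "age": 30},
        {"name": "a", "age": 40},
        {"name": "c", "age": 20},
    ]
    # Equivalent in spirit to: SELECT name FROM people ORDER BY age
    print(select_order_by(people, ["name"], ["age"]))
```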
* [SPARK-2854][SQL] Finalize _acceptable_types in pyspark.sqlYin Huai2014-08-051-0/+3
| | | | | | | | | | | | | This PR aims to finalize accepted data value types in Python RDDs provided to Python `applySchema`. JIRA: https://issues.apache.org/jira/browse/SPARK-2854 Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1793 from yhuai/SPARK-2854 and squashes the following commits: 32f0708 [Yin Huai] LongType only accepts long values. c2b23dd [Yin Huai] Do data type conversions based on the specified Spark SQL data type.
* [SPARK-2650][SQL] Try to partially fix SPARK-2650 by adjusting initial ↵Cheng Lian2014-08-055-30/+20
| | | | | | | | | | | | | | | | | | | buffer size and reducing memory allocation JIRA issue: [SPARK-2650](https://issues.apache.org/jira/browse/SPARK-2650) Please refer to [comments](https://issues.apache.org/jira/browse/SPARK-2650?focusedCommentId=14084397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14084397) of SPARK-2650 for some other details. This PR adjusts the initial in-memory columnar buffer size to 1MB, same as the default value of Shark's `shark.column.partitionSize.mb` property when running in local mode. Will add Shark style partition size estimation in another PR. Also, before this PR, `NullableColumnBuilder` copies the whole buffer to add the null positions section, and then `CompressibleColumnBuilder` copies and compresses the buffer again, even if compression is disabled (`PassThrough` compression scheme is used to disable compression). In this PR the first buffer copy is eliminated to reduce memory consumption. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1769 from liancheng/spark-2650 and squashes the following commits: 88a042e [Cheng Lian] Fixed method visibility and removed dead code 001f2e5 [Cheng Lian] Try fixing SPARK-2650 by adjusting initial buffer size and reducing memory allocation
* [sql] rename project name in pom.xml of hive-thriftserver modulewangfei2014-08-051-1/+1
The modules spark-hive-thriftserver_2.10 and spark-hive_2.10 are both named "Spark Project Hive" in pom.xml, so rename the spark-hive-thriftserver_2.10 project to "Spark Project Hive Thrift Server".

Author: wangfei <wangfei1@huawei.com>
Closes #1789 from scwf/patch-1 and squashes the following commits:

ca1f5e9 [wangfei] [sql] rename module name of hive-thriftserver
* [SPARK-2860][SQL] Fix coercion of CASE WHEN.Michael Armbrust2014-08-053-24/+36
| | | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #1785 from marmbrus/caseNull and squashes the following commits: 126006d [Michael Armbrust] better error message 2fe357f [Michael Armbrust] Fix coercion of CASE WHEN.
* [SPARK-2783][SQL] Basic support for analyze in HiveContextYin Huai2014-08-033-2/+136
| | | | | | | | | | | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-2783 Author: Yin Huai <huai@cse.ohio-state.edu> Closes #1741 from yhuai/analyzeTable and squashes the following commits: 7bb5f02 [Yin Huai] Use sql instead of hql. 4d09325 [Yin Huai] Merge remote-tracking branch 'upstream/master' into analyzeTable e3ebcd4 [Yin Huai] Renaming. c170f4e [Yin Huai] Do not use getContentSummary. 62393b6 [Yin Huai] Merge remote-tracking branch 'upstream/master' into analyzeTable db233a6 [Yin Huai] Trying to debug jenkins... fee84f0 [Yin Huai] Merge remote-tracking branch 'upstream/master' into analyzeTable f0501f3 [Yin Huai] Fix compilation error. 24ad391 [Yin Huai] Merge remote-tracking branch 'upstream/master' into analyzeTable 8918140 [Yin Huai] Wording. 23df227 [Yin Huai] Add a simple analyze method to get the size of a table and update the "totalSize" property of this table in the Hive metastore.
* [SPARK-2814][SQL] HiveThriftServer2 throws NPE when executing native commandsCheng Lian2014-08-031-4/+3
| | | | | | | | | | JIRA issue: [SPARK-2814](https://issues.apache.org/jira/browse/SPARK-2814) Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1753 from liancheng/spark-2814 and squashes the following commits: c74a3b2 [Cheng Lian] Fixed SPARK-2814
* [SPARK-2784][SQL] Deprecate hql() method in favor of a config option, ↵Michael Armbrust2014-08-0317-106/+168
'spark.sql.dialect'

Many users have reported being confused by the distinction between the `sql` and `hql` methods. Specifically, many users think that `sql(...)` cannot be used to read Hive tables. In this PR I introduce a new configuration option `spark.sql.dialect` that picks which dialect will be used for parsing. For `SQLContext` this must be set to `sql`. In `HiveContext` it defaults to `hiveql` but can also be set to `sql`.

The `hql` and `hiveql` methods continue to act the same but are now marked as deprecated.

**This is a possibly breaking change for some users unless they set the dialect manually, though this is unlikely.**

For example: `hiveContext.sql("SELECT 1")` will now throw a parsing exception by default.

Author: Michael Armbrust <michael@databricks.com>
Closes #1746 from marmbrus/sqlLanguageConf and squashes the following commits:

ad375cc [Michael Armbrust] Merge remote-tracking branch 'apache/master' into sqlLanguageConf
20c43f8 [Michael Armbrust] override function instead of just setting the value
7e4ae93 [Michael Armbrust] Deprecate hql() method in favor of a config option, 'spark.sql.dialect'
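The dialect selection described in this commit can be modelled with a small toy, shown below. None of this is Spark's API: `ToySQLContext`, `ToyHiveContext`, `set_conf`, and the returned strings are hypothetical. The only facts taken from the commit message are the key `spark.sql.dialect`, the defaults, and which values each context accepts.

```python
class ToySQLContext:
    """Toy context: only the 'sql' dialect is accepted."""

    default_dialect = "sql"
    allowed_dialects = ("sql",)

    def __init__(self):
        self.conf = {}

    def set_conf(self, key, value):
        self.conf[key] = value

    def sql(self, query):
        # One entry point; the configured dialect picks the parser.
        dialect = self.conf.get("spark.sql.dialect", self.default_dialect)
        if dialect not in self.allowed_dialects:
            raise ValueError(f"unsupported dialect: {dialect}")
        return f"{dialect}:{query}"  # stand-in for "parsed with <dialect>"


class ToyHiveContext(ToySQLContext):
    """Toy Hive context: defaults to 'hiveql', may be switched to 'sql'."""

    default_dialect = "hiveql"
    allowed_dialects = ("sql", "hiveql")


if __name__ == "__main__":
    hive = ToyHiveContext()
    print(hive.sql("SELECT 1"))
    hive.set_conf("spark.sql.dialect", "sql")
    print(hive.sql("SELECT 1"))
```

Routing everything through one `sql` method and a configuration key removes the need for users to choose between two similarly named methods.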
* [SPARK-2739][SQL] Rename registerAsTable to registerTempTableMichael Armbrust2014-08-0219-75/+78
| | | | | | | | | | | | | There have been user complaints that the difference between `registerAsTable` and `saveAsTable` is too subtle. This PR addresses this by renaming `registerAsTable` to `registerTempTable`, which more clearly reflects what is happening. `registerAsTable` remains, but will cause a deprecation warning. Author: Michael Armbrust <michael@databricks.com> Closes #1743 from marmbrus/registerTempTable and squashes the following commits: d031348 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into registerTempTable 4dff086 [Michael Armbrust] Fix .java files too 89a2f12 [Michael Armbrust] Merge remote-tracking branch 'apache/master' into registerTempTable 0b7b71e [Michael Armbrust] Rename registerAsTable to registerTempTable
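The rename-with-deprecation pattern described here can be sketched in Python as follows. `ToySchemaRDD` and its snake_case method names are hypothetical stand-ins for illustration; only the idea (the old name keeps working, emits a deprecation warning, and delegates to the new name) comes from the commit message.

```python
import warnings


class ToySchemaRDD:
    """Toy stand-in: registers itself in a shared catalog dictionary."""

    def __init__(self, catalog):
        self.catalog = catalog

    def register_temp_table(self, name):
        # New, clearer name: the table is temporary, nothing is saved.
        self.catalog[name] = self

    def register_as_table(self, name):
        # Old name kept for compatibility; warns, then delegates.
        warnings.warn(
            "register_as_table is deprecated; use register_temp_table",
            DeprecationWarning,
            stacklevel=2,
        )
        self.register_temp_table(name)


if __name__ == "__main__":
    catalog = {}
    ToySchemaRDD(catalog).register_temp_table("people")
    print(sorted(catalog))
```

Keeping the old method as a thin wrapper means existing callers continue to work while being nudged toward the new name.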
* [SPARK-2729][SQL] Added test case for SPARK-2729Cheng Lian2014-08-022-2/+22
| | | | | | | | | | This is a follow up of #1636. Author: Cheng Lian <lian.cs.zju@gmail.com> Closes #1738 from liancheng/test-for-spark-2729 and squashes the following commits: b13692a [Cheng Lian] Added test case for SPARK-2729