aboutsummaryrefslogtreecommitdiff
path: root/sql/hive-thriftserver/src/main
Commit message (Collapse)AuthorAgeFilesLines
...
* [SPARK-11089][SQL] Adds option for disabling multi-session in Thrift serverCheng Lian2015-11-171-1/+5
| | | | | | | | | | This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server. Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are session-ized. Since multi-session support is by default on, no JDBC connection can modify global configurations like the newly added one. Author: Cheng Lian <lian@databricks.com> Closes #9740 from liancheng/spark-11089.single-session-option.
* [SPARK-11546] Thrift server makes too many logs about result schemanavis.ryu2015-11-061-11/+13
| | | | | | | | SparkExecuteStatementOperation logs result schema for each getNextRowSet() calls which is by default every 1000 rows, overwhelming whole log file. Author: navis.ryu <navis@apache.org> Closes #9514 from navis/SPARK-11546.
* [SPARK-10786][SQL] Take the whole statement to generate the CommandProcessorhuangzhaowei2015-11-021-1/+1
| | | | | | | | | | | | | | | | | | | In the now implementation of `SparkSQLCLIDriver.scala`: `val proc: CommandProcessor = CommandProcessorFactory.get(Array(tokens(0)), hconf)` `CommandProcessorFactory` only take the first token of the statement, and this will be hard to diff the statement `delete jar xxx` and `delete from xxx`. So maybe it's better to take the whole statement into the `CommandProcessorFactory`. And in [HiveCommand](https://github.com/SaintBacchus/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java#L76), it already special handing these two statement. ```java if(command.length > 1 && "from".equalsIgnoreCase(command[1])) { //special handling for SQL "delete from <table> where..." return null; } ``` Author: huangzhaowei <carlmartinmax@gmail.com> Closes #8895 from SaintBacchus/SPARK-10786.
* [SPARK-11188][SQL] Elide stacktraces in bin/spark-sql for AnalysisExceptionsDilip Biswal2015-10-292-4/+17
| | | | | | | | Only print the error message to the console for Analysis Exceptions in sql-shell. Author: Dilip Biswal <dbiswal@us.ibm.com> Closes #9194 from dilipbiswal/spark-11188.
* [SPARK-10810] [SPARK-10902] [SQL] Improve session management in SQLDavies Liu2015-10-083-71/+19
| | | | | | | | | | | | | | | | | | | | | This PR improve the sessions management by replacing the thread-local based to one SQLContext per session approach, introduce separated temporary tables and UDFs/UDAFs for each session. A new session of SQLContext could be created by: 1) create an new SQLContext 2) call newSession() on existing SQLContext For HiveContext, in order to reduce the cost for each session, the classloader and Hive client are shared across multiple sessions (created by newSession). CacheManager is also shared by multiple sessions, so cache a table multiple times in different sessions will not cause multiple copies of in-memory cache. Added jars are still shared by all the sessions, because SparkContext does not support sessions. cc marmbrus yhuai rxin Author: Davies Liu <davies@databricks.com> Closes #8909 from davies/sessions.
* [SPARK-10845] [SQL] Makes spark.sql.hive.version a SQLConfEntryCheng Lian2015-09-261-1/+0
| | | | | | | | | | When refactoring SQL options from plain strings to the strongly typed `SQLConfEntry`, `spark.sql.hive.version` wasn't migrated, and doesn't show up in the result of `SET -v`, as `SET -v` only shows public `SQLConfEntry` instances. This affects compatibility with Simba ODBC driver. This PR migrates this SQL option as a `SQLConfEntry` to fix this issue. Author: Cheng Lian <lian@databricks.com> Closes #8925 from liancheng/spark-10845/hive-version-conf.
* [SPARK-9522] [SQL] SparkSubmit process can not exit if kill application when ↵linweizhong2015-09-171-0/+6
| | | | | | | | | | HiveThriftServer was starting When we start HiveThriftServer, we will start SparkContext first, then start HiveServer2, if we kill application while HiveServer2 is starting then SparkContext will stop successfully, but SparkSubmit process can not exit. Author: linweizhong <linweizhong@huawei.com> Closes #7853 from Sephiroth-Lin/SPARK-9522.
* [SPARK-10226] [SQL] Fix exclamation mark issue in SparkSQLwangwei2015-08-291-0/+1
| | | | | | | | When I tested the latest version of spark with exclamation mark, I got some errors. Then I reseted the spark version and found that commit id "a2409d1c8e8ddec04b529ac6f6a12b5993f0eeda" brought the bug. With jline version changing from 0.9.94 to 2.12 after this commit, exclamation mark would be treated as a special character in ConsoleReader. Author: wangwei <wangwei82@huawei.com> Closes #8420 from small-wang/jline-SPARK-10226.
* [SPARK-9613] [CORE] Ban use of JavaConversions and migrate all existing uses ↵Sean Owen2015-08-255-25/+25
| | | | | | | | | | | | to JavaConverters Replace `JavaConversions` implicits with `JavaConverters` Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet. Author: Sean Owen <sowen@cloudera.com> Closes #8033 from srowen/SPARK-9613.
* [SPARK-10121] [SQL] Thrift server always use the latest class loader ↵Yin Huai2015-08-251-0/+6
| | | | | | | | | | | | provided by the conf of executionHive's state https://issues.apache.org/jira/browse/SPARK-10121 Looks like the problem is that if we add a jar through another thread, the thread handling the JDBC session will not get the latest classloader. Author: Yin Huai <yhuai@databricks.com> Closes #8368 from yhuai/SPARK-10121.
* [SPARK-9958] [SQL] Make HiveThriftServer2Listener thread-safe and update the ↵zsxwing2015-08-144-60/+78
| | | | | | | | | | | | tab name to "JDBC/ODBC Server" This PR fixed the thread-safe issue of HiveThriftServer2Listener, and also changed the tab name to "JDBC/ODBC Server" since it's conflict with the new SQL tab. <img width="1377" alt="thriftserver" src="https://cloud.githubusercontent.com/assets/1000778/9265707/c46f3f2c-4269-11e5-8d7e-888c9113ab4f.png"> Author: zsxwing <zsxwing@gmail.com> Closes #8185 from zsxwing/SPARK-9958.
* [SPARK-9826] [CORE] Fix cannot use custom classes in log4j.propertiesMichel Lemay2015-08-122-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor Utils class and create ShutdownHookManager. NOTE: Wasn't able to run /dev/run-tests on windows machine. Manual tests were conducted locally using custom log4j.properties file with Redis appender and logstash formatter (bundled in the fat-jar submitted to spark) ex: log4j.rootCategory=WARN,console,redis log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.target=System.err log4j.appender.console.layout=org.apache.log4j.PatternLayout log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n log4j.logger.org.eclipse.jetty=WARN log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO log4j.logger.org.apache.spark.graphx.Pregel=INFO log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender log4j.appender.redis.endpoints=hostname:port log4j.appender.redis.key=mykey log4j.appender.redis.alwaysBatch=false log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1 Author: michellemay <mlemay@gmail.com> Closes #8109 from michellemay/SPARK-9826.
* [SPARK-8064] [SQL] Build against Hive 1.2.1Steve Loughran2015-08-036-38/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cherry picked the parts of the initial SPARK-8064 WiP branch needed to get sql/hive to compile against hive 1.2.1. That's the ASF release packaged under org.apache.hive, not any fork. Tests not run yet: that's what the machines are for Author: Steve Loughran <stevel@hortonworks.com> Author: Cheng Lian <lian@databricks.com> Author: Michael Armbrust <michael@databricks.com> Author: Patrick Wendell <patrick@databricks.com> Closes #7191 from steveloughran/stevel/feature/SPARK-8064-hive-1.2-002 and squashes the following commits: 7556d85 [Cheng Lian] Updates .q files and corresponding golden files ef4af62 [Steve Loughran] Merge commit '6a92bb09f46a04d6cd8c41bdba3ecb727ebb9030' into stevel/feature/SPARK-8064-hive-1.2-002 6a92bb0 [Cheng Lian] Overrides HiveConf time vars dcbb391 [Cheng Lian] Adds com.twitter:parquet-hadoop-bundle:1.6.0 for Hive Parquet SerDe 0bbe475 [Steve Loughran] SPARK-8064 scalastyle rejects the standard Hadoop ASF license header... fdf759b [Steve Loughran] SPARK-8064 classpath dependency suite to be in sync with shading in final (?) hive-exec spark 7a6c727 [Steve Loughran] SPARK-8064 switch to second staging repo of the spark-hive artifacts. This one has the protobuf-shaded hive-exec jar 376c003 [Steve Loughran] SPARK-8064 purge duplicate protobuf declaration 2c74697 [Steve Loughran] SPARK-8064 switch to the protobuf shaded hive-exec jar with tests to chase it down cc44020 [Steve Loughran] SPARK-8064 remove hadoop.version from runtest.py, as profile will fix that automatically. 6901fa9 [Steve Loughran] SPARK-8064 explicit protobuf import da310dc [Michael Armbrust] Fixes for Hive tests. a775a75 [Steve Loughran] SPARK-8064 cherry-pick-incomplete 7404f34 [Patrick Wendell] Add spark-hive staging repo 832c164 [Steve Loughran] SPARK-8064 try to supress compiler warnings on Complex.java pasted-thrift-code 312c0d4 [Steve Loughran] SPARK-8064 maven/ivy dependency purge; calcite declaration needed fa5ae7b [Steve Loughran] HIVE-8064 fix up hive-thriftserver dependencies and cut back on evicted references in the hive- packages; this keeps mvn and ivy resolution compatible, as the reconciliation policy is "by hand" c188048 [Steve Loughran] SPARK-8064 manage the Hive depencencies to that -things that aren't needed are excluded -sql/hive built with ivy is in sync with the maven reconciliation policy, rather than latest-first 4c8be8d [Cheng Lian] WIP: Partial fix for Thrift server and CLI tests 314eb3c [Steve Loughran] SPARK-8064 deprecation warning noise in one of the tests 17b0341 [Steve Loughran] SPARK-8064 IDE-hinted cleanups of Complex.java to reduce compiler warnings. It's all autogenerated code, so still ugly. d029b92 [Steve Loughran] SPARK-8064 rely on unescaping to have already taken place, so go straight to map of serde options 23eca7e [Steve Loughran] HIVE-8064 handle raw and escaped property tokens 54d9b06 [Steve Loughran] SPARK-8064 fix compilation regression surfacing from rebase 0b12d5f [Steve Loughran] HIVE-8064 use subset of hive complex type whose types deserialize fce73b6 [Steve Loughran] SPARK-8064 poms rely implicitly on the version of kryo chill provides fd3aa5d [Steve Loughran] SPARK-8064 version of hive to d/l from ivy is 1.2.1 dc73ece [Steve Loughran] SPARK-8064 revert to master's determinstic pushdown strategy d3c1e4a [Steve Loughran] SPARK-8064 purge UnionType 051cc21 [Steve Loughran] SPARK-8064 switch to an unshaded version of hive-exec-core, which must have been built with Kryo 2.21. This currently looks for a (locally built) version 1.2.1.spark 6684c60 [Steve Loughran] SPARK-8064 ignore RTE raised in blocking process.exitValue() call e6121e5 [Steve Loughran] SPARK-8064 address review comments aa43dc6 [Steve Loughran] SPARK-8064 more robust teardown on JavaMetastoreDatasourcesSuite f2bff01 [Steve Loughran] SPARK-8064 better takeup of asynchronously caught error text 8b1ef38 [Steve Loughran] SPARK-8064: on failures executing spark-submit in HiveSparkSubmitSuite, print command line and all logged output. 5a9ce6b [Steve Loughran] SPARK-8064 add explicit reason for kv split failure, rather than array OOB. *does not address the issue* 642b63a [Steve Loughran] SPARK-8064 reinstate something cut briefly during rebasing 97194dc [Steve Loughran] SPARK-8064 add extra logging to the YarnClusterSuite classpath test. There should be no reason why this is failing on jenkins, but as it is (and presumably its CP-related), improve the logging including any exception raised. 335357f [Steve Loughran] SPARK-8064 fail fast on thrive process spawning tests on exit codes and/or error string patterns seen in log. 3ed872f [Steve Loughran] SPARK-8064 rename field double to dbl bca55e5 [Steve Loughran] SPARK-8064 missed one of the `date` escapes 41d6479 [Steve Loughran] SPARK-8064 wrap tests with withTable() calls to avoid table-exists exceptions 2bc29a4 [Steve Loughran] SPARK-8064 ParquetSuites to escape `date` field name 1ab9bc4 [Steve Loughran] SPARK-8064 TestHive to use sered2.thrift.test.Complex bf3a249 [Steve Loughran] SPARK-8064: more resubmit than fix; tighten startup timeout to 60s. Still no obvious reason why jersey server code in spark-assembly isn't being picked up -it hasn't been shaded c829b8f [Steve Loughran] SPARK-8064: reinstate yarn-rm-server dependencies to hive-exec to ensure that jersey server is on classpath on hadoop versions < 2.6 0b0f738 [Steve Loughran] SPARK-8064: thrift server startup to fail fast on any exception in the main thread 13abaf1 [Steve Loughran] SPARK-8064 Hive compatibilty tests sin sync with explain/show output from Hive 1.2.1 d14d5ea [Steve Loughran] SPARK-8064: DATE is now a predicate; you can't use it as a field in select ops 26eef1c [Steve Loughran] SPARK-8064: HIVE-9039 renamed TOK_UNION => TOK_UNIONALL while adding TOK_UNIONDISTINCT 3d64523 [Steve Loughran] SPARK-8064 improve diagns on uknown token; fix scalastyle failure d0360f6 [Steve Loughran] SPARK-8064: delicate merge in of the branch vanzin/hive-1.1 1126e5a [Steve Loughran] SPARK-8064: name of unrecognized file format wasn't appearing in error text 8cb09c4 [Steve Loughran] SPARK-8064: test resilience/assertion improvements. Independent of the rest of the work; can be backported to earlier versions dec12cb [Steve Loughran] SPARK-8064: when a CLI suite test fails include the full output text in the raised exception; this ensures that the stdout/stderr is included in jenkins reports, so it becomes possible to diagnose the cause. 463a670 [Steve Loughran] SPARK-8064 run-tests.py adds a hadoop-2.6 profile, and changes info messages to say "w/Hive 1.2.1" in console output 2531099 [Steve Loughran] SPARK-8064 successful attempt to get rid of pentaho as a transitive dependency of hive-exec 1d59100 [Steve Loughran] SPARK-8064 (unsuccessful) attempt to get rid of pentaho as a transitive dependency of hive-exec 75733fc [Steve Loughran] SPARK-8064 change thrift binary startup message to "Starting ThriftBinaryCLIService on port" 3ebc279 [Steve Loughran] SPARK-8064 move strings used to check for http/bin thrift services up into constants c80979d [Steve Loughran] SPARK-8064: SparkSQLCLIDriver drops remote mode support. CLISuite Tests pass instead of timing out: undetected regression? 27e8370 [Steve Loughran] SPARK-8064 fix some style & IDE warnings 00e50d6 [Steve Loughran] SPARK-8064 stop excluding hive shims from dependency (commented out , for now) cb4f142 [Steve Loughran] SPARK-8054 cut pentaho dependency from calcite f7aa9cb [Steve Loughran] SPARK-8064 everything compiles with some commenting and moving of classes into a hive package 6c310b4 [Steve Loughran] SPARK-8064 subclass Hive ServerOptionsProcessor to make it public again f61a675 [Steve Loughran] SPARK-8064 thrift server switched to Hive 1.2.1, though it doesn't compile everywhere 4890b9d [Steve Loughran] SPARK-8064, build against Hive 1.2.1
* [SPARK-7977] [BUILD] Disallowing printlnJonathan Alter2015-07-101-5/+7
| | | | | | | | | | | | | | | | | | | | | | | Author: Jonathan Alter <jonalter@users.noreply.github.com> Closes #7093 from jonalter/SPARK-7977 and squashes the following commits: ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite 7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite 10724b6 [Jonathan Alter] Changing some printlns to logs in tests eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 0b1dcb4 [Jonathan Alter] More println cleanup aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 0c16fa3 [Jonathan Alter] Replacing some printlns with logs 45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 5c8e283 [Jonathan Alter] Allowing println in audit-release examples 5b50da1 [Jonathan Alter] Allowing printlns in example files ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 83ab635 [Jonathan Alter] Fixing new printlns 54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns b837c3a [Jonathan Alter] Disallowing println
* [SPARK-8839] [SQL] ThriftServer2 will remove session and execution no matter ↵huangzhaowei2015-07-091-2/+5
| | | | | | | | | | | | | | | | | it's finished or not. In my test, `sessions` and `executions` in ThriftServer2 is not the same number as the connection number. For example, if there are 200 clients connecting to the server, but it will have more than 200 `sessions` and `executions`. So if it reaches the `retainedStatements`, it has to remove some object which is not finished. So it may cause the exception described in [Jira Address](https://issues.apache.org/jira/browse/SPARK-8839) Author: huangzhaowei <carlmartinmax@gmail.com> Closes #7239 from SaintBacchus/SPARK-8839 and squashes the following commits: cf7ef40 [huangzhaowei] Remove the a meanless funciton call 3e9a5a6 [huangzhaowei] Add a filter before take 9d5ceb8 [huangzhaowei] [SPARK-8839][SQL]ThriftServer2 will remove session and execution no matter it's finished or not.
* [SPARK-8650] [SQL] Use the user-specified app name priority in ↵Yadong Qi2015-06-291-1/+6
| | | | | | | | | | | | | | | | | | SparkSQLCLIDriver or HiveThriftServer2 When run `./bin/spark-sql --name query1.sql` [Before] ![before](https://cloud.githubusercontent.com/assets/1400819/8370336/fa20b75a-1bf8-11e5-9171-040049a53240.png) [After] ![after](https://cloud.githubusercontent.com/assets/1400819/8370189/dcc35cb4-1bf6-11e5-8796-a0694140bffb.png) Author: Yadong Qi <qiyadong2010@gmail.com> Closes #7030 from watermen/SPARK-8650 and squashes the following commits: 51b5134 [Yadong Qi] Improve code and add comment. e3d7647 [Yadong Qi] use spark.app.name priority.
* [SPARK-7961][SQL]Refactor SQLConf to display better error messagezsxwing2015-06-172-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Add `SQLConfEntry` to store the information about a configuration. For those configurations that cannot be found in `sql-programming-guide.md`, I left the doc as `<TODO>`. 2. Verify the value when setting a configuration if this is in SQLConf. 3. Use `SET -v` to display all public configurations. Author: zsxwing <zsxwing@gmail.com> Closes #6747 from zsxwing/sqlconf and squashes the following commits: 7d09bad [zsxwing] Use SQLConfEntry in HiveContext 49f6213 [zsxwing] Add getConf, setConf to SQLContext and HiveContext e014f53 [zsxwing] Merge branch 'master' into sqlconf 93dad8e [zsxwing] Fix the unit tests cf950c1 [zsxwing] Fix the code style and tests 3c5f03e [zsxwing] Add unsetConf(SQLConfEntry) and fix the code style a2f4add [zsxwing] getConf will return the default value if a config is not set 037b1db [zsxwing] Add schema to SetCommand 0520c3c [zsxwing] Merge branch 'master' into sqlconf 7afb0ec [zsxwing] Fix the configurations about HiveThriftServer 7e728e3 [zsxwing] Add doc for SQLConfEntry and fix 'toString' 5e95b10 [zsxwing] Add enumConf c6ba76d [zsxwing] setRawString => setConfString, getRawString => getConfString 4abd807 [zsxwing] Fix the test for 'set -v' 6e47e56 [zsxwing] Fix the compilation error 8973ced [zsxwing] Remove floatConf 1fc3a8b [zsxwing] Remove the 'conf' command and use 'set -v' instead 99c9c16 [zsxwing] Fix tests that use SQLConfEntry as a string 88a03cc [zsxwing] Add new lines between confs and return types ce7c6c8 [zsxwing] Remove seqConf f3c1b33 [zsxwing] Refactor SQLConf to display better error message
* [SPARK-6964] [SQL] Support Cancellation in the Thrift ServerDong Wang2015-06-052-14/+157
| | | | | | | | | | | | | | | | | | Support runInBackground in SparkExecuteStatementOperation, and add cancellation Author: Dong Wang <dong@databricks.com> Closes #6207 from dongwang218/SPARK-6964-jdbc-cancel and squashes the following commits: 687c113 [Dong Wang] fix 100 characters 7bfa2a7 [Dong Wang] fix merge 380480f [Dong Wang] fix for liancheng's comments eb3e385 [Dong Wang] small nit 341885b [Dong Wang] small fix 3d8ebf8 [Dong Wang] add spark.sql.hive.thriftServer.async flag 04142c3 [Dong Wang] set SQLSession for async execution 184ec35 [Dong Wang] keep hive conf 819ae03 [Dong Wang] [SPARK-6964][SQL][WIP] Support Cancellation in the Thrift Server
* Fixed style issues for [SPARK-6909][SQL] Remove Hive Shim code.Reynold Xin2015-06-042-8/+11
|
* [SPARK-6909][SQL] Remove Hive Shim codeCheolsoo Park2015-06-047-17/+281
| | | | | | | | | | | | | | | | | | | | This is a follow-up on #6393. I am removing the following files in this PR. ``` ./sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala ./sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala ``` Basically, I re-factored the shim code as follows- * Rewrote code directly with Hive 0.13 methods, or * Converted code into private methods, or * Extracted code into separate classes But for leftover code that didn't fit in any of these cases, I created a HiveShim object. For eg, helper functions which wrap Hive 0.13 methods to work around Hive bugs are placed here. Author: Cheolsoo Park <cheolsoop@netflix.com> Closes #6604 from piaozhexiu/SPARK-6909 and squashes the following commits: 5dccc20 [Cheolsoo Park] Remove hive shim code
* [SPARK-7950] [SQL] Sets spark.sql.hive.version in ↵Cheng Lian2015-05-291-7/+8
| | | | | | | | | | | | | | | | | | | HiveThriftServer2.startWithContext() When starting `HiveThriftServer2` via `startWithContext`, property `spark.sql.hive.version` isn't set. This causes Simba ODBC driver 1.0.8.1006 behaves differently and fails simple queries. Hive2 JDBC driver works fine in this case. Also, when starting the server with `start-thriftserver.sh`, both Hive2 JDBC driver and Simba ODBC driver works fine. Please refer to [SPARK-7950] [1] for details. [1]: https://issues.apache.org/jira/browse/SPARK-7950 Author: Cheng Lian <lian@databricks.com> Closes #6500 from liancheng/odbc-bugfix and squashes the following commits: 051e3a3 [Cheng Lian] Fixes import order 3a97376 [Cheng Lian] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext()
* [SPARK-7927] whitespace fixes for Hive and ThriftServer.Reynold Xin2015-05-283-8/+8
| | | | | | | | | | | So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6478 from rxin/whitespace-hive and squashes the following commits: e01b0e0 [Reynold Xin] Fixed tests. a3bba22 [Reynold Xin] [SPARK-7927] whitespace fixes for Hive and ThriftServer.
* [SPARK-7907] [SQL] [UI] Rename tab ThriftServer to SQL.Yin Huai2015-05-273-4/+6
| | | | | | | | | | | | | | | | | This PR has three changes: 1. Renaming the table of `ThriftServer` to `SQL`; 2. Renaming the title of the tab from `ThriftServer` to `JDBC/ODBC Server`; and 3. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC Session`. https://issues.apache.org/jira/browse/SPARK-7907 Author: Yin Huai <yhuai@databricks.com> Closes #6448 from yhuai/JDBCServer and squashes the following commits: eadcc3d [Yin Huai] Update test. 9168005 [Yin Huai] Use SQL as the tab name. 221831e [Yin Huai] Rename ThriftServer to JDBCServer.
* [SPARK-7519] [SQL] fix minor bugs in thrift server UItianyi2015-05-112-6/+8
| | | | | | | | | | | | | | Bugs description: 1. There are extra commas on the top of session list. 2. The format of time in "Start at:" part is not the same as others. 3. The total number of online sessions is wrong. Author: tianyi <tianyi.asiainfo@gmail.com> Closes #6048 from tianyi/SPARK-7519 and squashes the following commits: ed366b7 [tianyi] fix bug
* [SPARK-6908] [SQL] Use isolated Hive clientMichael Armbrust2015-05-072-9/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR switches Spark SQL's Hive support to use the isolated hive client interface introduced by #5851, instead of directly interacting with the client. By using this isolated client we can now allow users to dynamically configure the version of Hive that they are connecting to by setting `spark.sql.hive.metastore.version` without the need recompile. This also greatly reduces the surface area for our interaction with the hive libraries, hopefully making it easier to support other versions in the future. Jars for the desired hive version can be configured using `spark.sql.hive.metastore.jars`, which accepts the following options: - a colon-separated list of jar files or directories for hive and hadoop. - `builtin` - attempt to discover the jars that were used to load Spark SQL and use those. This option is only valid when using the execution version of Hive. - `maven` - download the correct version of hive on demand from maven. By default, `builtin` is used for Hive 13. This PR also removes the test step for building against Hive 12, as this will no longer be required to talk to Hive 12 metastores. However, the full removal of the Shim is deferred until a later PR. Remaining TODOs: - Remove the Hive Shims and inline code for Hive 13. - Several HiveCompatibility tests are not yet passing. - `nullformatCTAS` - As detailed below, we now are handling CTAS parsing ourselves instead of hacking into the Hive semantic analyzer. However, we currently only handle the common cases and not things like CTAS where the null format is specified. - `combine1` now leaks state about compression somehow, breaking all subsequent tests. As such we currently add it to the blacklist - `part_inherit_tbl_props` and `part_inherit_tbl_props_with_star` do not work anymore. We are correctly propagating the information - "load_dyn_part14.*" - These tests pass when run on their own, but fail when run with all other tests. It seems our `RESET` mechanism may not be as robust as it used to be? Other required changes: - `CreateTableAsSelect` no longer carries parts of the HiveQL AST with it through the query execution pipeline. Instead, we parse CTAS during the HiveQL conversion and construct a `HiveTable`. The full parsing here is not yet complete as detailed above in the remaining TODOs. Since the operator is Hive specific, it is moved to the hive package. - `Command` is simplified to be a trait that simply acts as a marker for a LogicalPlan that should be eagerly evaluated. Author: Michael Armbrust <michael@databricks.com> Closes #5876 from marmbrus/useIsolatedClient and squashes the following commits: 258d000 [Michael Armbrust] really really correct path handling e56fd4a [Michael Armbrust] getAbsolutePath 5a259f5 [Michael Armbrust] fix typos 81bb366 [Michael Armbrust] comments from vanzin 5f3945e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient 4b5cd41 [Michael Armbrust] yin's comments f5de7de [Michael Armbrust] cleanup 11e9c72 [Michael Armbrust] better coverage in versions suite 7e8f010 [Michael Armbrust] better error messages and jar handling e7b3941 [Michael Armbrust] more permisive checking for function registration da91ba7 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient 5fe5894 [Michael Armbrust] fix serialization suite 81711c4 [Michael Armbrust] Initial support for running without maven 1d8ae44 [Michael Armbrust] fix final tests? 1c50813 [Michael Armbrust] more comments a3bee70 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient a6f5df1 [Michael Armbrust] style ab07f7e [Michael Armbrust] WIP 4d8bf02 [Michael Armbrust] Remove hive 12 compilation 8843a25 [Michael Armbrust] [SPARK-6908] [SQL] Use isolated Hive client
* [SPARK-5100] [SQL] add webui for thriftservertianyi2015-05-044-7/+591
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR is a rebased version of #3946 , and mainly focused on creating an independent tab for the thrift server in spark web UI. Features: 1. Session related statistics ( username and IP are only supported in hive-0.13.1 ) 2. List all the SQL executing or executed on this server 3. Provide links to the job generated by SQL 4. Provide link to show all SQL executing or executed in a specified session Prototype snapshots: This is the main page for thrift server ![image](https://cloud.githubusercontent.com/assets/1411869/7361379/df7dcc64-ed89-11e4-9964-4df0b32f475e.png) Author: tianyi <tianyi.asiainfo@gmail.com> Closes #5730 from tianyi/SPARK-5100 and squashes the following commits: cfd14c7 [tianyi] style fix 0efe3d5 [tianyi] revert part of pom change c0f2fa0 [tianyi] extends HiveThriftJdbcTest to start/stop thriftserver for UI test aa20408 [tianyi] fix style problem c9df6f9 [tianyi] add testsuite for thriftserver ui and fix some style issue 9830199 [tianyi] add webui for thriftserver
* [SPARK-7145] [CORE] commons-lang (2.x) classes used instead of commons-lang3 ↵Sean Owen2015-04-272-7/+5
| | | | | | | | | | | | (3.x); commons-io used without dependency Remove use of commons-lang in favor of commons-lang3 classes; remove commons-io use in favor of Guava Author: Sean Owen <sowen@cloudera.com> Closes #5703 from srowen/SPARK-7145 and squashes the following commits: 21fbe03 [Sean Owen] Remove use of commons-lang in favor of commons-lang3 classes; remove commons-io use in favor of Guava
* [SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races.Marcelo Vanzin2015-04-212-14/+4
| | | | | | | | | | | | | | | | | | | | This change adds some new utility code to handle shutdown hooks in Spark. The main goal is to take advantage of Hadoop 2.x's API for shutdown hooks, which allows Spark to register a hook that will run before the one that cleans up HDFS clients, and thus avoids some races that would cause exceptions to show up and other issues such as failure to properly close event logs. Unfortunately, Hadoop 1.x does not have such APIs, so in that case correctness is still left to chance. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #5560 from vanzin/SPARK-6014 and squashes the following commits: edfafb1 [Marcelo Vanzin] Better scaladoc. fcaeedd [Marcelo Vanzin] Merge branch 'master' into SPARK-6014 e7039dc [Marcelo Vanzin] [SPARK-6014] [core] Revamp Spark shutdown hooks, fix shutdown races.
* [SPARK-6694][SQL]SparkSQL CLI must be able to specify an option --database ↵Jin Adachi2015-04-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | on the command line. SparkSQL CLI has an option --database as follows. But, the option --database is ignored. ``` $ spark-sql --help : CLI options: : --database <databasename> Specify the database to use ``` Author: Jin Adachi <adachij2002@yahoo.co.jp> Author: adachij <adachij@nttdata.co.jp> Closes #5345 from adachij2002/SPARK-6694 and squashes the following commits: 8659084 [Jin Adachi] Merge branch 'master' of https://github.com/apache/spark into SPARK-6694 0301eb9 [Jin Adachi] Merge branch 'master' of https://github.com/apache/spark into SPARK-6694 df81086 [Jin Adachi] Modify code style. 846f83e [Jin Adachi] Merge branch 'master' of https://github.com/apache/spark into SPARK-6694 dbe8c63 [Jin Adachi] Change file permission to 644. 7b58f42 [Jin Adachi] Merge branch 'master' of https://github.com/apache/spark into SPARK-6694 c581d06 [Jin Adachi] Add an option --database test db56122 [Jin Adachi] Merge branch 'SPARK-6694' of https://github.com/adachij2002/spark into SPARK-6694 ee09fa5 [adachij] Merge branch 'master' into SPARK-6694 c804c03 [adachij] SparkSQL CLI must be able to specify an option --database on the command line.
* [SPARK-5794] [SQL] fix add jarDaoyuan Wang2015-04-131-2/+3
| | | | | | | | | | | | | | Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #4586 from adrian-wang/addjar and squashes the following commits: efdd602 [Daoyuan Wang] move jar to another place 6c707e8 [Daoyuan Wang] restrict hive version for test 32c4fb8 [Daoyuan Wang] fix style and add a test 9957d87 [Daoyuan Wang] use sessionstate classloader in makeRDDforTable 0810e71 [Daoyuan Wang] remove variable substitution 1898309 [Daoyuan Wang] fix classnotfound 95a40da [Daoyuan Wang] support env argus in add jar, and set add jar ret to 0
* [SPARK-6440][CORE]Handle IPv6 addresses properly when constructing URInyaapa2015-04-131-1/+2
| | | | | | | | | | Author: nyaapa <nyaapa@gmail.com> Closes #5424 from nyaapa/master and squashes the following commits: 6b717aa [nyaapa] [SPARK-6440][CORE] Remove Utils.localIpAddressHostname, Utils.localIpAddressURI and Utils.getAddressHostName; make Utils.localIpAddress private; rename Utils.localHostURI into Utils.localHostNameForURI; use Utils.localHostName in org.apache.spark.streaming.kinesis.KinesisReceiver and org.apache.spark.sql.hive.thriftserver.SparkSQLEnv 2098081 [nyaapa] [SPARK-6440][CORE] style fixes and use getHostAddress instead of getHostName 84763d7 [nyaapa] [SPARK-6440][CORE]Handle IPv6 addresses properly when constructing URI
* [SPARK-6428][SQL] Added explicit type for all public methods for Hive moduleReynold Xin2015-03-211-2/+2
| | | | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #5108 from rxin/hive-public-type and squashes the following commits: a320328 [Reynold Xin] [SPARK-6428][SQL] Added explicit type for all public methods for Hive module.
* [SPARK-5712] [SQL] fix comment with semicolon at endDaoyuan Wang2015-03-171-12/+13
| | | | | | | | | | | | | | ---- comment; Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #4500 from adrian-wang/semicolon and squashes the following commits: 70b8abb [Daoyuan Wang] use mkstring instead of reduce 2d49738 [Daoyuan Wang] remove outdated golden file 317346e [Daoyuan Wang] only skip comment with semicolon at end of line, to avoid golden file outdated d3ae01e [Daoyuan Wang] fix error a11602d [Daoyuan Wang] fix comment with semicolon at end
* [SPARK-2087] [SQL] Multiple thriftserver sessions with single HiveContext ↵Cheng Hao2015-03-171-56/+0
| | | | | | | | | | | | | | | | | instance Still, we keep only a single HiveContext within ThriftServer, and we also create a object called `SQLSession` for isolating the different user states. Developers can obtain/release a new user session via `openSession` and `closeSession`, and `SQLContext` and `HiveContext` will also provide a default session if no `openSession` called, for backward-compatibility. Author: Cheng Hao <hao.cheng@intel.com> Closes #4885 from chenghao-intel/multisessions_singlecontext and squashes the following commits: 1c47b2a [Cheng Hao] rename the tss => tlSession 815b27a [Cheng Hao] code style issue 57e3fa0 [Cheng Hao] openSession is not compatible between Hive0.12 & 0.13.1 4665b0d [Cheng Hao] thriftservice with single context
* [SQL]Delete some dupliate code in HiveThriftServer2ArcherShao2015-03-141-7/+5
| | | | | | | | | | Author: ArcherShao <ArcherShao@users.noreply.github.com> Author: ArcherShao <shaochuan@huawei.com> Closes #5007 from ArcherShao/20150313 and squashes the following commits: ae422ae [ArcherShao] Updated 459efbd [ArcherShao] [SQL]Delete some dupliate code in HiveThriftServer2
* [SQL] Optimize arithmetic and predicate operatorskai2015-02-161-0/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Existing implementation of arithmetic operators and BinaryComparison operators have redundant type checking codes, e.g.: Expression.n2 is used by Add/Subtract/Multiply. (1) n2 always checks left.dataType == right.dataType. However, this checking should be done once when we resolve expression types; (2) n2 requires dataType is a NumericType. This can be done once. This PR optimizes arithmetic and predicate operators by removing such redundant type-checking codes. Some preliminary benchmarking on 10G TPC-H data over 5 r3.2xlarge EC2 machines shows that this PR can reduce the query time by 5.5% to 11%. The benchmark queries follow the template below, where OP is plus/minus/times/divide/remainder/bitwise and/bitwise or/bitwise xor. SELECT l_returnflag, l_linestatus, SUM(l_quantity OP cnt1), SUM(l_quantity OP cnt2), ...., SUM(l_quantity OP cnt700) FROM ( SELECT l_returnflag, l_linestatus, l_quantity, 1 AS cnt1, 2 AS cnt2, ..., 700 AS cnt700 FROM lineitem WHERE l_shipdate <= '1998-09-01' ) GROUP BY l_returnflag, l_linestatus; Author: kai <kaizeng@eecs.berkeley.edu> Closes #4472 from kai-zeng/arithmetic-optimize and squashes the following commits: fef0cf1 [kai] Merge branch 'master' of github.com:apache/spark into arithmetic-optimize 4b3a1bb [kai] chmod a-x 5a41e49 [kai] chmod a-x Expression.scala cb37c94 [kai] rebase onto spark master 7f6e968 [kai] chmod 100755 -> 100644 6cddb46 [kai] format 7490dbc [kai] fix unresolved-expression exception for EqualTo 9c40bc0 [kai] fix bitwisenot 3cbd363 [kai] clean up test code ca47801 [kai] override evalInternal for bitwise ops 8fa84a1 [kai] add bitwise or and xor 6892fc4 [kai] revert override evalInternal f8eba24 [kai] override evalInternal 31ccdd4 [kai] rewrite all bitwise op and remove evalInternal 86297e2 [kai] generalized cb92ae1 [kai] bitwise-and: override eval 97a7d6c [kai] bitwise-and: override evalInternal using and func 0906c39 [kai] add bitwise test 62abbbc [kai] clean up predicate and arithmetic b34d58d [kai] add caching and benmark option 12c5b32 [kai] override eval 1cd7571 [kai] fix sqrt and maxof 03fd0c3 [kai] fix predicate 16fd84c [kai] optimize + - * / % -(unary) abs < > <= >= fd95823 [kai] remove unnecessary type checking 24d062f [kai] test suite
* [SQL] Add fetched row count in SparkSQLCLIDriverOopsOutOfMemory2015-02-161-2/+10
| | | | | | | | | | | | | | | | | | before this change: ```scala Time taken: 0.619 seconds ``` after this change : ```scala Time taken: 0.619 seconds, Fetched: 4 row(s) ``` Author: OopsOutOfMemory <victorshengli@126.com> Closes #4604 from OopsOutOfMemory/rowcount and squashes the following commits: 7252dea [OopsOutOfMemory] add fetched row count
* [SPARK-5700] [SQL] [Build] Bumps jets3t to 0.9.3 for hadoop-2.3 and ↵Cheng Lian2015-02-101-3/+0
| | | | | | | | | | | | | | | | | | hadoop-2.4 profiles This is a follow-up PR for #4454 and #4484. JetS3t 0.9.2 contains a log4j.properties file inside the artifact and breaks our tests (see SPARK-5696). This is fixed in 0.9.3. This PR also reverts hotfix changes introduced in #4484. The reason is that asking users to configure HiveThriftServer2 logging configurations in hive-log4j.properties can be unintuitive. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4499) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #4499 from liancheng/spark-5700 and squashes the following commits: 4f020c7 [Cheng Lian] Bumps jets3t to 0.9.3 for hadoop-2.3 and hadoop-2.4 profiles
* [SPARK-5696] [SQL] [HOTFIX] Asks HiveThriftServer2 to re-initialize log4j ↵Cheng Lian2015-02-091-0/+3
| | | | | | | | | | | | | | | | | | using Hive configurations In this way, log4j configurations overriden by jets3t-0.9.2.jar can be again overriden by Hive default log4j configurations. This might not be the best solution for this issue since it requires users to use `hive-log4j.properties` rather than `log4j.properties` to initialize `HiveThriftServer2` logging configurations, which can be confusing. The main purpose of this PR is to fix Jenkins PR build. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4484) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #4484 from liancheng/spark-5696 and squashes the following commits: df83956 [Cheng Lian] Hot fix: asks HiveThriftServer2 to re-initialize log4j using Hive configurations
* [SPARK-5097][SQL] DataFrameReynold Xin2015-01-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pull request redesigns the existing Spark SQL dsl, which already provides data frame like functionalities. TODOs: With the exception of Python support, other tasks can be done in separate, follow-up PRs. - [ ] Audit of the API - [ ] Documentation - [ ] More test cases to cover the new API - [x] Python support - [ ] Type alias SchemaRDD Author: Reynold Xin <rxin@databricks.com> Author: Davies Liu <davies@databricks.com> Closes #4173 from rxin/df1 and squashes the following commits: 0a1a73b [Reynold Xin] Merge branch 'df1' of github.com:rxin/spark into df1 23b4427 [Reynold Xin] Mima. 828f70d [Reynold Xin] Merge pull request #7 from davies/df 257b9e6 [Davies Liu] add repartition 6bf2b73 [Davies Liu] fix collect with UDT and tests e971078 [Reynold Xin] Missing quotes. b9306b4 [Reynold Xin] Remove removeColumn/updateColumn for now. a728bf2 [Reynold Xin] Example rename. e8aa3d3 [Reynold Xin] groupby -> groupBy. 9662c9e [Davies Liu] improve DataFrame Python API 4ae51ea [Davies Liu] python API for dataframe 1e5e454 [Reynold Xin] Fixed a bug with symbol conversion. 2ca74db [Reynold Xin] Couple minor fixes. ea98ea1 [Reynold Xin] Documentation & literal expressions. 2b22684 [Reynold Xin] Got rid of IntelliJ problems. 02bbfbc [Reynold Xin] Tightening imports. ffbce66 [Reynold Xin] Fixed compilation error. 59b6d8b [Reynold Xin] Style violation. b85edfb [Reynold Xin] ALS. 8c37f0a [Reynold Xin] Made MLlib and examples compile 6d53134 [Reynold Xin] Hive module. d35efd5 [Reynold Xin] Fixed compilation error. ce4a5d2 [Reynold Xin] Fixed test cases in SQL except ParquetIOSuite. 66d5ef1 [Reynold Xin] SQLContext minor patch. c9bcdc0 [Reynold Xin] Checkpoint: SQL module compiles!
* [SPARK-4871][SQL] Show sql statement in spark ui when run sql with spark-sqlwangfei2015-01-101-0/+1
| | | | | | | | | | | | Author: wangfei <wangfei1@huawei.com> Closes #3718 from scwf/sparksqlui and squashes the following commits: e0d6b5d [wangfei] format fix 383b505 [wangfei] fix conflicts 4d2038a [wangfei] using setJobDescription df79837 [wangfei] fix compile error 92ce834 [wangfei] show sql statement in spark ui when run sql use spark-sql
* [SPARK-4935][SQL] When hive.cli.print.header configured, spark-sql aborted ↵wangfei2014-12-301-1/+1
| | | | | | | | | | | | | | | if passed in a invalid sql If we passed in a wrong sql like ```abdcdfsfs```, the spark-sql script aborted. Author: wangfei <wangfei1@huawei.com> Author: Fei Wang <wangfei1@huawei.com> Closes #3761 from scwf/patch-10 and squashes the following commits: 46dc344 [Fei Wang] revert console.printError(rc.getErrorMessage()) 0330e07 [wangfei] avoid to print error message repeatedly 1614a11 [wangfei] spark-sql abort when passed in a wrong sql
* [SPARK-4756][SQL] FIX: sessionToActivePool grow infinitely, even as sessions ↵guowei22014-12-182-3/+9
| | | | | | | | | | | | | | | expire **sessionToActivePool** in **SparkSQLOperationManager** grow infinitely, even as sessions expire. we should remove the pool value when the session closed, even though **sessionToActivePool** would not exist in all of sessions. Author: guowei2 <guowei2@asiainfo.com> Closes #3617 from guowei2/SPARK-4756 and squashes the following commits: e9b97b8 [guowei2] fix compile bug with Shim12 cf0f521 [guowei2] Merge remote-tracking branch 'apache/master' into SPARK-4756 e070998 [guowei2] fix: remove active pool of the session when it expired
* [SPARK-4694]Fix HiveThriftServer2 cann't stop In Yarn HA mode.carlmartin2014-12-171-1/+13
| | | | | | | | | | | | | HiveThriftServer2 can not exit automactic when changing the standy resource manager in Yarn HA mode. The scheduler backend was aware of the AM had been exited so it call sc.stop to exit the driver process but there was a user thread(HiveThriftServer2 ) which was still alive and cause this problem. To fix it, make a demo thread to detect the sparkContext is null or not.If the sc is stopped, call the ThriftServer.stop to stop the user thread. Author: carlmartin <carlmartinmax@gmail.com> Closes #3576 from SaintBacchus/ThriftServer2ExitBug and squashes the following commits: 2890b4a [carlmartin] Use SparkListener instead of the demo thread to stop the hive server. c15da0e [carlmartin] HiveThriftServer2 can not exit automactic when changing the standy resource manager in Yarn HA mode
* [SQL] SPARK-4700: Add HTTP protocol spark thrift serverJudy Nash2014-12-161-4/+17
| | | | | | | | | | | | | | | | | Add HTTP protocol support and test cases to spark thrift server, so users can deploy thrift server in both TCP and http mode. Author: Judy Nash <judynash@microsoft.com> Author: judynash <judynash@microsoft.com> Closes #3672 from judynash/master and squashes the following commits: 526315d [Judy Nash] correct spacing on startThriftServer method 31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue 47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length 2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide 1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master' 2b1d312 [Judy Nash] updated http thrift server support based on feedback 377532c [judynash] add HTTP protocol spark thrift server
* [SPARK-4761][SQL] Enables Kryo by default in Spark SQL Thrift serverCheng Lian2014-12-051-2/+12
| | | | | | | | | | | | | | Enables Kryo and disables reference tracking by default in Spark SQL Thrift server. Configurations explicitly defined by users in `spark-defaults.conf` are respected (the Thrift server is started by `spark-submit`, which handles configuration properties properly). <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3621) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #3621 from liancheng/kryo-by-default and squashes the following commits: 70c2775 [Cheng Lian] Enables Kryo by default in Spark SQL Thrift server
* [SQL] Compute timeTaken correctlyw002289702014-11-241-7/+4
| | | | | | | | | | ```timeTaken``` should not count the time of printing result. Author: w00228970 <wangfei1@huawei.com> Closes #3423 from scwf/time-taken-bug and squashes the following commits: da7e102 [w00228970] compute time taken correctly
* [SPARK-4308][SQL] Sets SQL operation state to ERROR when exception is thrownCheng Lian2014-11-101-2/+0
| | | | | | | | | | In `HiveThriftServer2`, when an exception is thrown during a SQL execution, the SQL operation state should be set to `ERROR`, but now it remains `RUNNING`. This affects the result of the `GetOperationStatus` Thrift API. Author: Cheng Lian <lian@databricks.com> Closes #3175 from liancheng/fix-op-state and squashes the following commits: 6d4c1fe [Cheng Lian] Sets SQL operation state to ERROR when exception is thrown
* [SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark versionCheng Lian2014-11-071-7/+5
| | | | | | | | | | | | This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in the assembly jar to inspect Spark version. Currently, when built with Maven, the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 MANIFEST.MF, probably because of the assembly/shading tricks. Another related PR is #3103, which tries to fix the MANIFEST issue. Author: Cheng Lian <lian@databricks.com> Closes #3105 from liancheng/spark-4225 and squashes the following commits: d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version
* [SPARK-3791][SQL] Provides Spark version and Hive version in HiveThriftServer2Cheng Lian2014-11-022-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR overrides the `GetInfo` Hive Thrift API to provide correct version information. Another property `spark.sql.hive.version` is added to reveal the underlying Hive version. These are generally useful for Spark SQL ODBC driver providers. The Spark version information is extracted from the jar manifest. Also took the chance to remove the `SET -v` hack, which was a workaround for Simba ODBC driver connectivity. TODO - [x] Find a general way to figure out Hive (or even any dependency) version. This [blog post](http://blog.soebes.de/blog/2014/01/02/version-information-into-your-appas-with-maven/) suggests several methods to inspect application version. In the case of Spark, this can be tricky because the chosen method: 1. must applies to both Maven build and SBT build For Maven builds, we can retrieve the version information from the META-INF/maven directory within the assembly jar. But this doesn't work for SBT builds. 2. must not rely on the original jars of dependencies to extract specific dependency version, because Spark uses assembly jar. This implies we can't read Hive version from Hive jar files since standard Spark distribution doesn't include them. 3. should play well with `SPARK_PREPEND_CLASSES` to ease local testing during development. `SPARK_PREPEND_CLASSES` prevents classes to be loaded from the assembly jar, thus we can't locate the jar file and read its manifest. Given these, maybe the only reliable method is to generate a source file containing version information at build time. pwendell Do you have any suggestions from the perspective of the build process? **Update** Hive version is now retrieved from the newly introduced `HiveShim` object. Author: Cheng Lian <lian.cs.zju@gmail.com> Author: Cheng Lian <lian@databricks.com> Closes #2843 from liancheng/get-info and squashes the following commits: a873d0f [Cheng Lian] Updates test case 53f43cd [Cheng Lian] Retrieves underlying Hive verson via HiveShim 1d282b8 [Cheng Lian] Removes the Simba ODBC "SET -v" hack f857fce [Cheng Lian] Overrides Hive GetInfo Thrift API and adds Hive version property