spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-20316][SQL] Val and Var should strictly follow the Scala syntax	ouyangxiaochen	2017-04-15	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? val and var should strictly follow the Scala syntax ## How was this patch tested? manual test and exisiting test cases Author: ouyangxiaochen <ou.yangxiaochen@zte.com.cn> Closes #17628 from ouyangxiaochen/spark-413.
*	[SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins	hyukjinkwon	2017-04-12	10	-24/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR proposes to run Spark unidoc to test Javadoc 8 build as Javadoc 8 is easily re-breakable. There are several problems with it: - It introduces little extra bit of time to run the tests. In my case, it took 1.5 mins more (`Elapsed :[94.8746569157]`). How it was tested is described in "How was this patch tested?". - > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up. (see joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627)) To complete this automated build, It also suggests to fix existing Javadoc breaks / ones introduced by test codes as described above. There fixes are similar instances that previously fixed. Please refer https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013 Note that this only fixes errors not warnings. Please see my observation https://github.com/apache/spark/pull/17389#issuecomment-288438704 for spurious errors by warnings. ## How was this patch tested? Manually via `jekyll build` for building tests. Also, tested via running `./dev/run-tests`. This was tested via manually adding `time.time()` as below: ```diff profiles_and_goals = build_profiles + sbt_goals print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ", " ".join(profiles_and_goals)) + import time + st = time.time() exec_sbt(profiles_and_goals) + print("Elapsed :[%s]" % str(time.time() - st)) ``` produces ``` ... ======================================================================== Building Unidoc API Documentation ======================================================================== ... [info] Main Java API documentation successful. ... Elapsed :[94.8746569157] ... Author: hyukjinkwon <gurwls223@gmail.com> Closes #17477 from HyukjinKwon/SPARK-18692.
*	[SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish ↵	Sean Owen	2017-04-10	5	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	locale bug" causes Spark problems ## What changes were proposed in this pull request? Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem"). The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods. ## How was this patch tested? Existing tests. Author: Sean Owen <sowen@cloudera.com> Closes #17527 from srowen/SPARK-20156.
*	[SPARK-20173][SQL][HIVE-THRIFTSERVER] Throw NullPointerException when ↵	zuotingbing	2017-04-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	HiveThriftServer2 is shutdown ## What changes were proposed in this pull request? If the shutdown hook called before the variable `uiTab` is set , it will throw a NullPointerException. ## How was this patch tested? manual tests Author: zuotingbing <zuo.tingbing9@zte.com.cn> Closes #17496 from zuotingbing/SPARK-HiveThriftServer2.
*	[SPARK-20146][SQL] fix comment missing issue for thrift server	bomeng	2017-03-29	2	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The column comment was missing while constructing the Hive TableSchema. This fix will preserve the original comment. ## How was this patch tested? I have added a new test case to test the column with/without comment. Author: bomeng <bmeng@us.ibm.com> Closes #17470 from bomeng/SPARK-20146.
*	[SPARK-20120][SQL] spark-sql support silent mode	Yuming Wang	2017-03-29	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? It is similar to Hive silent mode, just show the query result. see: [Hive LanguageManual+Cli](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli) and [the implementation of Hive silent mode](https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L948-L950). This PR set the Logger level to `WARN` to get similar result. ## How was this patch tested? manual tests ![manual test spark sql silent mode](https://cloud.githubusercontent.com/assets/5399861/24390165/989b7780-13b9-11e7-8496-6e68f55757e3.gif) Author: Yuming Wang <wgyumg@gmail.com> Closes #17449 from wangyum/SPARK-20120.
*	[SPARK-20048][SQL] Cloning SessionState does not clone query execution listeners	Kunal Khamar	2017-03-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Bugfix from [SPARK-19540.](https://github.com/apache/spark/pull/16826) Cloning SessionState does not clone query execution listeners, so cloned session is unable to listen to events on queries. ## How was this patch tested? - Unit test Author: Kunal Khamar <kkhamar@outlook.com> Closes #17379 from kunalkhamar/clone-bugfix.
*	[SPARK-20126][SQL] Remove HiveSessionState	Herman van Hovell	2017-03-28	2	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Commit https://github.com/apache/spark/commit/ea361165e1ddce4d8aa0242ae3e878d7b39f1de2 moved most of the logic from the SessionState classes into an accompanying builder. This makes the existence of the `HiveSessionState` redundant. This PR removes the `HiveSessionState`. ## How was this patch tested? Existing tests. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #17457 from hvanhovell/SPARK-20126.
*	[SPARK-19673][SQL] "ThriftServer default app name is changed wrong"	lvdongr	2017-02-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? In spark 1.x ,the name of ThriftServer is SparkSQL:localHostName. While the ThriftServer default name is changed to the className of HiveThfift2 , which is not appropriate. ## How was this patch tested? manual tests Please review http://spark.apache.org/contributing.html before opening a pull request. Author: lvdongr <lv.dongdong@zte.com.cn> Closes #17010 from lvdongr/ThriftserverName.
*	[SPARK-18857][SQL] Don't use `Iterator.duplicate` for `incrementalCollect` ↵	Dongjoon Hyun	2017-01-10	1	-11/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in Thrift Server ## What changes were proposed in this pull request? To support `FETCH_FIRST`, SPARK-16563 used Scala `Iterator.duplicate`. However, Scala `Iterator.duplicate` uses a queue to buffer all items between both iterators, this causes GC and hangs for queries with large number of rows. We should not use this, especially for `spark.sql.thriftServer.incrementalCollect`. https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/Iterator.scala#L1262-L1300 ## How was this patch tested? Pass the existing tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #16440 from dongjoon-hyun/SPARK-18857.
*	[MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo	Niranjan Padmanabhan	2017-01-04	4	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words. ## How was this patch tested? N/A since only docs or comments were updated. Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com> Closes #16455 from neurons/np.structure_streaming_doc.
*	[SPARK-18992][SQL] Move spark.sql.hive.thriftServer.singleSession to SQLConf	gatorsmile	2016-12-28	3	-45/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	### What changes were proposed in this pull request? Since `spark.sql.hive.thriftServer.singleSession` is a configuration of SQL component, this conf can be moved from `SparkConf` to `StaticSQLConf`. When we introduced `spark.sql.hive.thriftServer.singleSession`, all the SQL configuration are session specific. They can be modified in different sessions. In Spark 2.1, static SQL configuration is added. It is a perfect fit for `spark.sql.hive.thriftServer.singleSession`. Previously, we did the same move for `spark.sql.warehouse.dir` from `SparkConf` to `StaticSQLConf` ### How was this patch tested? Added test cases in HiveThriftServer2Suites.scala Author: gatorsmile <gatorsmile@gmail.com> Closes #16392 from gatorsmile/hiveThriftServerSingleSession.
*	[SPARK-17807][CORE] split test-tags into test-JAR	Ryan Williams	2016-12-21	1	-0/+12
\| \| \| \| \| \| \| \| \| \|	Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR. Alternative to #16303. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #16311 from ryan-williams/tt.
*	[SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT	Reynold Xin	2016-12-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This patch bumps master branch version to 2.2.0-SNAPSHOT. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #16126 from rxin/SPARK-18695.
*	[SPARK-14914][CORE] Fix Resource not closed after using, for unit tests and ↵	wm624@hotmail.com	2016-11-10	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	example ## What changes were proposed in this pull request? This is a follow-up work of #15618. Close file source; For any newly created streaming context outside the withContext, explicitly close the context. ## How was this patch tested? Existing unit tests. Author: wm624@hotmail.com <wm624@hotmail.com> Closes #15818 from wangmiao1981/rtest.
*	[SPARK-18086] Add support for Hive session vars.	Ryan Blue	2016-11-07	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This adds support for Hive variables: * Makes values set via `spark-sql --hivevar name=value` accessible * Adds `getHiveVar` and `setHiveVar` to the `HiveClient` interface * Adds a SessionVariables trait for sessions like Hive that support variables (including Hive vars) * Adds SessionVariables support to variable substitution * Adds SessionVariables support to the SET command ## How was this patch tested? * Adds a test to all supported Hive versions for accessing Hive variables * Adds HiveVariableSubstitutionSuite Author: Ryan Blue <blue@apache.org> Closes #15738 from rdblue/SPARK-18086-add-hivevar-support.
*	[SPARK-17350][SQL] Disable default use of KryoSerializer in Thrift Server	Josh Rosen	2016-11-01	1	-10/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In SPARK-4761 / #3621 (December 2014) we enabled Kryo serialization by default in the Spark Thrift Server. However, I don't think that the original rationale for doing this still holds now that most Spark SQL serialization is now performed via encoders and our UnsafeRow format. In addition, the use of Kryo as the default serializer can introduce performance problems because the creation of new KryoSerializer instances is expensive and we haven't performed instance-reuse optimizations in several code paths (including DirectTaskResult deserialization). Given all of this, I propose to revert back to using JavaSerializer as the default serializer in the Thrift Server. /cc liancheng Author: Josh Rosen <joshrosen@databricks.com> Closes #14906 from JoshRosen/disable-kryo-in-thriftserver.
*	[SPARK-17819][SQL] Support default database in connection URIs for Spark ↵	Dongjoon Hyun	2016-10-16	2	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thrift Server ## What changes were proposed in this pull request? Currently, Spark Thrift Server ignores the default database in URI. This PR supports that like the following. ```sql $ bin/beeline -u jdbc:hive2://localhost:10000 -e "create database testdb" $ bin/beeline -u jdbc:hive2://localhost:10000/testdb -e "create table t(a int)" $ bin/beeline -u jdbc:hive2://localhost:10000/testdb -e "show tables" ... +------------+--------------+--+ \| tableName \| isTemporary \| +------------+--------------+--+ \| t \| false \| +------------+--------------+--+ 1 row selected (0.347 seconds) $ bin/beeline -u jdbc:hive2://localhost:10000 -e "show tables" ... +------------+--------------+--+ \| tableName \| isTemporary \| +------------+--------------+--+ +------------+--------------+--+ No rows selected (0.098 seconds) ``` ## How was this patch tested? Manual. Note: I tried to add a test case for this, but I cannot found a suitable testsuite for this. I'll add the testcase if some advice is given. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #15399 from dongjoon-hyun/SPARK-17819.
*	[SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished	Sean Owen	2016-10-07	1	-2/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This expands calls to Jetty's simple `ServerConnector` constructor to explicitly specify a `ScheduledExecutorScheduler` that makes daemon threads. It should otherwise result in exactly the same configuration, because the other args are copied from the constructor that is currently called. (I'm not sure we should change the Hive Thriftserver impl, but I did anyway.) This also adds `sc.stop()` to the quick start guide example. ## How was this patch tested? Existing tests; _pending_ at least manual verification of the fix. Author: Sean Owen <sowen@cloudera.com> Closes #15381 from srowen/SPARK-17707.
*	[SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException ↵	Dongjoon Hyun	2016-10-03	2	-6/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in Thriftserver ## What changes were proposed in this pull request? Currently, Spark Thrift Server raises `IllegalArgumentException` for queries whose column types are `NullType`, e.g., `SELECT null` or `SELECT if(true,null,null)`. This PR fixes that by returning `void` like Hive 1.2. Before ```sql $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null" Connecting to jdbc:hive2://localhost:10000 Connected to: Spark SQL (version 2.1.0-SNAPSHOT) Driver: Hive JDBC (version 1.2.1.spark2) Transaction isolation: TRANSACTION_REPEATABLE_READ Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0) Closing: 0: jdbc:hive2://localhost:10000 $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)" Connecting to jdbc:hive2://localhost:10000 Connected to: Spark SQL (version 2.1.0-SNAPSHOT) Driver: Hive JDBC (version 1.2.1.spark2) Transaction isolation: TRANSACTION_REPEATABLE_READ Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0) Closing: 0: jdbc:hive2://localhost:10000 ``` After ```sql $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null" Connecting to jdbc:hive2://localhost:10000 Connected to: Spark SQL (version 2.1.0-SNAPSHOT) Driver: Hive JDBC (version 1.2.1.spark2) Transaction isolation: TRANSACTION_REPEATABLE_READ +-------+--+ \| NULL \| +-------+--+ \| NULL \| +-------+--+ 1 row selected (3.242 seconds) Beeline version 1.2.1.spark2 by Apache Hive Closing: 0: jdbc:hive2://localhost:10000 $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)" Connecting to jdbc:hive2://localhost:10000 Connected to: Spark SQL (version 2.1.0-SNAPSHOT) Driver: Hive JDBC (version 1.2.1.spark2) Transaction isolation: TRANSACTION_REPEATABLE_READ +-------------------------+--+ \| (IF(true, NULL, NULL)) \| +-------------------------+--+ \| NULL \| +-------------------------+--+ 1 row selected (0.201 seconds) Beeline version 1.2.1.spark2 by Apache Hive Closing: 0: jdbc:hive2://localhost:10000 ``` ## How was this patch tested? * Pass the Jenkins test with a new testsuite. * Also, Manually, after starting Spark Thrift Server, run the following command. ```sql $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null" $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)" ``` Hive 1.2 ```sql hive> create table null_table as select null; hive> desc null_table; OK _c0 void ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #15325 from dongjoon-hyun/SPARK-17112.
*	[SPARK-17190][SQL] Removal of HiveSharedState	gatorsmile	2016-08-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	### What changes were proposed in this pull request? Since `HiveClient` is used to interact with the Hive metastore, it should be hidden in `HiveExternalCatalog`. After moving `HiveClient` into `HiveExternalCatalog`, `HiveSharedState` becomes a wrapper of `HiveExternalCatalog`. Thus, removal of `HiveSharedState` becomes straightforward. After removal of `HiveSharedState`, the reflection logic is directly applied on the choice of `ExternalCatalog` types, based on the configuration of `CATALOG_IMPLEMENTATION`. ~~`HiveClient` is also used/invoked by the other entities besides HiveExternalCatalog, we defines the following two APIs: getClient and getNewClient~~ ### How was this patch tested? The existing test cases Author: gatorsmile <gatorsmile@gmail.com> Closes #14757 from gatorsmile/removeHiveClient.
*	[SPARK-16941] Use concurrentHashMap instead of scala Map in ↵	huangzhaowei	2016-08-11	3	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SparkSQLOperationManager. ## What changes were proposed in this pull request? ThriftServer will have some thread-safe problem in SparkSQLOperationManager. Add a SynchronizedMap trait for the maps in it to avoid this problem. Details in [SPARK-16941](https://issues.apache.org/jira/browse/SPARK-16941) ## How was this patch tested? NA Author: huangzhaowei <carlmartinmax@gmail.com> Closes #14534 from SaintBacchus/SPARK-16941.
*	[SPARK-16563][SQL] fix spark sql thrift server FetchResults bug	Alice	2016-08-08	2	-0/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Add a constant iterator which point to head of result. The header will be used to reset iterator when fetch result from first row repeatedly. JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563 ## How was this patch tested? This bug was found when using Cloudera HUE connecting to spark sql thrift server, currently SQL statement result can be only fetched for once. The fix was tested manually with Cloudera HUE, With this fix, HUE can fetch spark SQL results repeatedly through thrift server. Author: Alice <alice.gugu@gmail.com> Author: Alice <guhq@garena.com> Closes #14218 from alicegugu/SparkSQLFetchResultsBug.
*	[SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant ↵	Xin Ren	2016-07-19	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	definition and inherited from the parent https://issues.apache.org/jira/browse/SPARK-16535 ## What changes were proposed in this pull request? When I scan through the pom.xml of sub projects, I found this warning as below and attached screenshot ``` Definition of groupId is redundant, because it's inherited from the parent ``` ![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png) I've tried to remove some of the lines with groupId definition, and the build on my local machine is still ok. ``` <groupId>org.apache.spark</groupId> ``` As I just find now `<maven.version>3.3.9</maven.version>` is being used in Spark 2.x, and Maven-3 supports versionless parent elements: Maven 3 will remove the need to specify the parent version in sub modules. THIS is great (in Maven 3.1). ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762 ## How was this patch tested? I've tested by re-building the project, and build succeeded. Author: Xin Ren <iamshrek@126.com> Closes #14189 from keypointt/SPARK-16535.
*	[SPARK-16477] Bump master version to 2.1.0-SNAPSHOT	Reynold Xin	2016-07-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #14130 from rxin/SPARK-16477.
*	[SPARK-15730][SQL] Respect the --hiveconf in the spark-sql command line	Cheng Hao	2016-07-05	2	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR makes spark-sql (backed by SparkSQLCLIDriver) respects confs set by hiveconf, which is what we do in previous versions. The change is that when we start SparkSQLCLIDriver, we explicitly set confs set through --hiveconf to SQLContext's conf (basically treating those confs as a SparkSQL conf). ## How was this patch tested? A new test in CliSuite. Closes #13542 Author: Cheng Hao <hao.cheng@intel.com> Author: Yin Huai <yhuai@databricks.com> Closes #14058 from yhuai/hiveConfThriftServer.
*	[SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in ↵	Sean Owen	2016-06-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	favor of commons-lang3 ## What changes were proposed in this pull request? Replace use of `commons-lang` in favor of `commons-lang3` and forbid the former via scalastyle; remove `NotImplementedException` from `comons-lang` in favor of JDK `UnsupportedOperationException` ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #13843 from srowen/SPARK-16129.
*	[SPARK-15934] [SQL] Return binary mode in ThriftServer	Egor Pakhomov	2016-06-15	3	-14/+47
\| \| \| \| \| \| \| \| \| \|	Returning binary mode to ThriftServer for backward compatibility. Tested with Squirrel and Tableau. Author: Egor Pakhomov <egor@anchorfree.com> Closes #13667 from epahomov/SPARK-15095-2.0.
*	doc fix of HiveThriftServer	Jeff Zhang	2016-06-14	3	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Just minor doc fix. \cc yhuai Author: Jeff Zhang <zjffdu@apache.org> Closes #13659 from zjffdu/doc_fix.
*	[SPARK-15431][SQL][BRANCH-2.0-TEST] rework the clisuite test cases	Xin Wu	2016-05-27	1	-11/+26
\| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR reworks on the CliSuite test cases for `LIST FILES/JARS` commands. CC yhuai Thanks! Author: Xin Wu <xinwu@us.ibm.com> Closes #13361 from xwu0226/SPARK-15431-clisuite-new.
*	[SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now	Xin Wu	2016-05-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The test cases for `list` command added in `CliSuite` by PR #13212 can not run in some jenkins jobs after being merged. However, some jenkins jobs can pass: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.2/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.3/ Others failed on this test case. But the failures on those jobs are at slightly different checkpoints among different jobs too. So it seems that CliSuite's output capture is flaky for list commands to check for expected output. There are test cases already in `HiveQuerySuite` and `SparkContextSuite` to cover the cases. So I am ignoring 2 test cases added by PR #13212 . Author: Xin Wu <xinwu@us.ibm.com> Closes #13276 from xwu0226/SPARK-15431-clisuite.
*	[MINOR] Fix Typos 'a -> an'	Zheng RuiFeng	2016-05-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? `a` -> `an` I use regex to generate potential error lines: `grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml//scala` and review them line by line. ## How was this patch tested? local build `lint-java` checking Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #13317 from zhengruifeng/a_an.
*	[SPARK-15552][SQL] Remove unnecessary private[sql] methods in SparkSession	Reynold Xin	2016-05-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? SparkSession has a list of unnecessary private[sql] methods. These methods cause some trouble because private[sql] doesn't apply in Java. In the cases that they are easy to remove, we can simply remove them. This patch does that. As part of this pull request, I also replaced a bunch of protected[sql] with private[sql], to tighten up visibility. ## How was this patch tested? Updated test cases to reflect the changes. Author: Reynold Xin <rxin@databricks.com> Closes #13319 from rxin/SPARK-15552.
*	[MINOR][MLLIB][STREAMING][SQL] Fix typos	lfzCarlosC	2016-05-25	4	-5/+5
\| \| \| \| \| \| \| \| \| \|	fixed typos for source code for components [mllib] [streaming] and [SQL] None and obvious. Author: lfzCarlosC <lfz.carlos@gmail.com> Closes #13298 from lfzCarlosC/master.
*	[SPARK-15431][SQL] Support LIST FILE(s)\|JAR(s) command natively	Xin Wu	2016-05-23	3	-6/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Currently command `ADD FILE\|JAR <filepath \| jarpath>` is supported natively in SparkSQL. However, when this command is run, the file/jar is added to the resources that can not be looked up by `LIST FILE(s)\|JAR(s)` command because the `LIST` command is passed to Hive command processor in Spark-SQL or simply not supported in Spark-shell. There is no way users can find out what files/jars are added to the spark context. Refer to [Hive commands](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli) This PR is to support following commands: `LIST (FILE[s] [filepath ...] \| JAR[s] [jarfile ...])` ### For example: ##### LIST FILE(s) ``` scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt") res1: org.apache.spark.sql.DataFrame = [] scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt") res2: org.apache.spark.sql.DataFrame = [] scala> spark.sql("list file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt").show(false) +----------------------------------------------+ \|result \| +----------------------------------------------+ \|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt\| +----------------------------------------------+ scala> spark.sql("list files").show(false) +----------------------------------------------+ \|result \| +----------------------------------------------+ \|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt\| \|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt \| +----------------------------------------------+ ``` ##### LIST JAR(s) ``` scala> spark.sql("add jar /Users/xinwu/spark/core/src/test/resources/TestUDTF.jar") res9: org.apache.spark.sql.DataFrame = [result: int] scala> spark.sql("list jar TestUDTF.jar").show(false) +---------------------------------------------+ \|result \| +---------------------------------------------+ \|spark://192.168.1.234:50131/jars/TestUDTF.jar\| +---------------------------------------------+ scala> spark.sql("list jars").show(false) +---------------------------------------------+ \|result \| +---------------------------------------------+ \|spark://192.168.1.234:50131/jars/TestUDTF.jar\| +---------------------------------------------+ ``` ## How was this patch tested? New test cases are added for Spark-SQL, Spark-Shell and SparkContext API code path. Author: Xin Wu <xinwu@us.ibm.com> Author: xin Wu <xinwu@us.ibm.com> Closes #13212 from xwu0226/list_command.
*	[SPARK-15330][SQL] Implement Reset Command	gatorsmile	2016-05-21	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	#### What changes were proposed in this pull request? Like `Set` Command in Hive, `Reset` is also supported by Hive. See the link: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli Below is the related Hive JIRA: https://issues.apache.org/jira/browse/HIVE-3202 This PR is to implement such a command for resetting the SQL-related configuration to the default values. One of the use case shown in HIVE-3202 is listed below: > For the purpose of optimization we set various configs per query. It's worthy but all those configs should be reset every time for next query. #### How was this patch tested? Added a test case. Author: gatorsmile <gatorsmile@gmail.com> Author: xiaoli <lixiao1983@gmail.com> Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local> Closes #13121 from gatorsmile/resetCommand.
*	[SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate ↵	Reynold Xin	2016-05-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	config options to existing sessions if specified ## What changes were proposed in this pull request? Currently SparkSession.Builder use SQLContext.getOrCreate. It should probably the the other way around, i.e. all the core logic goes in SparkSession, and SQLContext just calls that. This patch does that. This patch also makes sure config options specified in the builder are propagated to the existing (and of course the new) SparkSession. ## How was this patch tested? Updated tests to reflect the change, and also introduced a new SparkSessionBuilderSuite that should cover all the branches. Author: Reynold Xin <rxin@databricks.com> Closes #13200 from rxin/SPARK-15075.
*	[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into ↵	Sean Owen	2016-05-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	spark-tags ## What changes were proposed in this pull request? (See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was the apparently problem last time.) Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags` ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #13074 from srowen/SPARK-15290.
*	[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient	Sean Owen	2016-05-15	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? (Retry of https://github.com/apache/spark/pull/13049) - update to httpclient 4.5 / httpcore 4.4 - remove some defunct exclusions - manage httpmime version to match - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used) ## How was this patch tested? Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...` Author: Sean Owen <sowen@cloudera.com> Closes #13117 from srowen/SPARK-12972.2.
*	[SPARK-14897][SQL] upgrade to jetty 9.2.16	bomeng	2016-05-12	1	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Since Jetty 8 is EOL (end of life) and has critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using latest 9.2 since 9.3 requires Java 8+. `javax.servlet` and `derby` were also upgraded since Jetty 9.2 needs corresponding version. ## How was this patch tested? Manual test and current test cases should cover it. Author: bomeng <bmeng@us.ibm.com> Closes #12916 from bomeng/SPARK-14897.
*	[SPARK-15108][SQL] Describe Permanent UDTF	gatorsmile	2016-05-06	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	#### What changes were proposed in this pull request? When Describe a UDTF, the command returns a wrong result. The command is unable to find the function, which has been created and cataloged in the catalog but not in the functionRegistry. This PR is to correct it. If the function is not in the functionRegistry, we will check the catalog for collecting the information of the UDTF function. #### How was this patch tested? Added test cases to verify the results Author: gatorsmile <gatorsmile@gmail.com> Closes #12885 from gatorsmile/showFunction.
*	[SPARK-15134][EXAMPLE] Indent SparkSession builder patterns and update ↵	Dongjoon Hyun	2016-05-05	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	binary_classification_metrics_example.py ## What changes were proposed in this pull request? This issue addresses the comments in SPARK-15031 and also fix java-linter errors. - Use multiline format in SparkSession builder patterns. - Update `binary_classification_metrics_example.py` to use `SparkSession`. - Fix Java Linter errors (in SPARK-13745, SPARK-15031, and so far) ## How was this patch tested? After passing the Jenkins tests and run `dev/lint-java` manually. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12911 from dongjoon-hyun/SPARK-15134.
*	[SPARK-15072][SQL][REPL][EXAMPLES] Remove SparkSession.withHiveSupport	Sandeep Singh	2016-05-05	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Removing the `withHiveSupport` method of `SparkSession`, instead use `enableHiveSupport` ## How was this patch tested? ran tests locally Author: Sandeep Singh <sandeep@techaddict.me> Closes #12851 from techaddict/SPARK-15072.
*	[SPARK-12154] Upgrade to Jersey 2	mcheah	2016-05-05	1	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things. ## How was this patch tested? I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications. Author: mcheah <mcheah@palantir.com> Closes #12715 from mccheah/feature/upgrade-jersey.
*	[SPARK-15095][SQL] remove HiveSessionHook from ThriftServer	Davies Liu	2016-05-03	2	-57/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Remove HiveSessionHook ## How was this patch tested? No tests needed. Author: Davies Liu <davies@databricks.com> Closes #12881 from davies/remove_hooks.
*	[SQL-15102][SQL] remove delegation token support from ThriftServer	Davies Liu	2016-05-03	1	-58/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? These API is only useful for Hadoop, may not work for Spark SQL. The APIs is kept for source compatibility. ## How was this patch tested? No unit tests needed. Author: Davies Liu <davies@databricks.com> Closes #12878 from davies/remove_delegate.
*	[SPARK-15095][SQL] drop binary mode in ThriftServer	Davies Liu	2016-05-03	3	-47/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR drop the support for binary mode in ThriftServer, only HTTP mode is supported now, to reduce the maintain burden. The code to support binary mode is still kept, just in case if we want it in future. ## How was this patch tested? Updated tests to use HTTP mode. Author: Davies Liu <davies@databricks.com> Closes #12876 from davies/hide_binary.
*	[SPARK-15053][BUILD] Fix Java Lint errors on Hive-Thriftserver module	Dongjoon Hyun	2016-05-03	29	-71/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This issue fixes or hides 181 Java linter errors introduced by SPARK-14987 which copied hive service code from Hive. We had better clean up these errors before releasing Spark 2.0. - Fix UnusedImports (15 lines), RedundantModifier (14 lines), SeparatorWrap (9 lines), MethodParamPad (6 lines), FileTabCharacter (5 lines), ArrayTypeStyle (3 lines), ModifierOrder (3 lines), RedundantImport (1 line), CommentsIndentation (1 line), UpperEll (1 line), FallThrough (1 line), OneStatementPerLine (1 line), NewlineAtEndOfFile (1 line) errors. - Ignore `LineLength` errors under `hive/service/*` (118 lines). - Ignore `MethodName` error in `PasswdAuthenticationProvider.java` (1 line). - Ignore `NoFinalizer` error in `ThreadWithGarbageCleanup.java` (1 line). ## How was this patch tested? After passing Jenkins building, run `dev/lint-java` manually. ```bash $ dev/lint-java Checkstyle checks passed. ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12831 from dongjoon-hyun/SPARK-15053.
*	[SPARK-15004][SQL] Remove zookeeper service discovery code in thrift-server	Reynold Xin	2016-04-29	5	-506/+6
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? We recently inlined Hive's thrift server code in SPARK-15004. This patch removes the code related to zookeeper service discovery, Tez, and Hive on Spark, since they are irrelevant. ## How was this patch tested? N/A - removing dead code Author: Reynold Xin <rxin@databricks.com> Closes #12780 from rxin/SPARK-15004.
*	[SPARK-14987][SQL] inline hive-service (cli) into sql/hive-thriftserver	Davies Liu	2016-04-29	175	-4/+69969
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR copy the thrift-server from hive-service-1.2 (including TCLIService.thrift and generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements. ## How was this patch tested? Existing tests. Author: Davies Liu <davies@databricks.com> Closes #12764 from davies/thrift_server.