path: root/sql
* [SPARK-12395] [SQL] fix resulting columns of outer join (Davies Liu, 2015-12-17; 2 files, -9/+36)
  For the API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (returned as null). The order of columns has been changed to match MySQL and PostgreSQL [1]. This PR also fixes the nullability of the output for outer joins.
  [1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html
  Author: Davies Liu <davies@databricks.com>
  Closes #10353 from davies/fix_join.
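  For context, a minimal sketch of the API this fix touches (column names are hypothetical; assumes an existing `SQLContext` named `sqlContext`):
  ```scala
  import sqlContext.implicits._

  val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
  val right = Seq((2, "x"), (3, "y")).toDF("id", "r")

  // With joinType "full_outer", the coalesced join column "id" should be
  // non-null for every row; before this fix it could come back null for
  // rows present on only one side.
  left.join(right, Seq("id"), "full_outer").show()
  ```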
* [SPARK-12057][SQL] Prevent failure on corrupt JSON records (Yin Huai, 2015-12-16; 4 files, -12/+90)
  This PR makes the JSON parser and schema inference handle more cases where we have unparseable records. It is based on #10043. The last commit fixes the failed test and updates the logic of schema inference.
  Regarding the schema inference change: if we have something like
  ```
  {"f1":1}
  [1,2,3]
  ```
  originally we would get a DF without any column. After this change, we get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` becomes the value of `_corrupt_record`.
  When merging this PR, please make sure that the author is simplyianm.
  JIRA: https://issues.apache.org/jira/browse/SPARK-12057
  Closes #10043
  Author: Ian Macalinao <me@ian.pw>
  Author: Yin Huai <yhuai@databricks.com>
  Closes #10288 from yhuai/handleCorruptJson.
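  A sketch of the behavior described above (the input path is hypothetical; assumes an existing `sqlContext`):
  ```scala
  // /tmp/mixed.json contains the two lines from the example above:
  //   {"f1":1}
  //   [1,2,3]
  val df = sqlContext.read.json("/tmp/mixed.json")
  df.printSchema() // now includes both `f1` and `_corrupt_record`
  df.show()        // the second row carries "[1,2,3]" in _corrupt_record
  ```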
* [SPARK-12365][CORE] Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called (tedyu, 2015-12-16; 1 file, -13/+11)
  SPARK-9886 fixed ExternalBlockStore.scala. This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook().
  Author: tedyu <yuzhihong@gmail.com>
  Closes #10325 from ted-yu/master.
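  The shape of the replacement, sketched (`ShutdownHookManager` is Spark's private[spark] utility, so this only compiles inside Spark itself; `cleanup()` is a stand-in for the real hook body):
  ```scala
  import org.apache.spark.util.ShutdownHookManager

  def cleanup(): Unit = { /* release resources */ }

  // before: a raw JVM hook, unordered relative to Spark's own shutdown work
  Runtime.getRuntime.addShutdownHook(new Thread() {
    override def run(): Unit = cleanup()
  })

  // after: registered through Spark's manager, which runs hooks in priority order
  ShutdownHookManager.addShutdownHook(() => cleanup())
  ```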
* [SPARK-11677][SQL] ORC filter tests all pass if filters are actually not pushed down (hyukjinkwon, 2015-12-16; 1 file, -17/+36)
  Currently ORC filters are not tested properly: all the tests pass even if the filters are not pushed down or are disabled. In this PR, I add some logic for this. Since ORC does not fully filter record by record, this checks the count of the result and whether it contains the expected values.
  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #9687 from HyukjinKwon/SPARK-11677.
* [SPARK-12164][SQL] Decode the encoded values and then display (gatorsmile, 2015-12-16; 5 files, -48/+133)
  Based on the suggestions from marmbrus and cloud-fan in https://github.com/apache/spark/pull/10165, this PR prints the decoded values (user objects) in `Dataset.show`:
  ```scala
  implicit val kryoEncoder = Encoders.kryo[KryoClassData]
  val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
  ds.show(20, false);
  ```
  The current output is like
  ```
  +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  |value |
  +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]|
  |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]|
  |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]|
  +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  ```
  After the fix, it will be like the below, if and only if the user overrides the `toString` function in the class `KryoClassData`:
  ```scala
  override def toString: String = s"KryoClassData($a, $b)"
  ```
  ```
  +-------------------+
  |value |
  +-------------------+
  |KryoClassData(a, 1)|
  |KryoClassData(b, 2)|
  |KryoClassData(c, 3)|
  +-------------------+
  ```
  If users do not override the `toString` function, the results will be like
  ```
  +---------------------------------------+
  |value |
  +---------------------------------------+
  |org.apache.spark.sql.KryoClassData68ef|
  |org.apache.spark.sql.KryoClassData6915|
  |org.apache.spark.sql.KryoClassData693b|
  +---------------------------------------+
  ```
  Question: Should we add another optional parameter to the function `show` that decides whether it displays the hex values or the object values?
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10215 from gatorsmile/showDecodedValue.
* [SPARK-12320][SQL] throw exception if the number of fields does not line up for Tuple encoder (Wenchen Fan, 2015-12-16; 5 files, -18/+93)
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #10293 from cloud-fan/err-msg.
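  A sketch of the failure mode now reported eagerly (assumes the usual implicits in scope; column names are hypothetical):
  ```scala
  import sqlContext.implicits._

  val df = Seq((1, "a", 2.0)).toDF("x", "y", "z") // three columns
  df.as[(Int, String)] // now fails with an exception: a Tuple2 encoder cannot bind three fields
  ```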
* [SPARK-8745] [SQL] remove GenerateProjection (Davies Liu, 2015-12-16; 8 files, -319/+11)
  cc rxin
  Author: Davies Liu <davies@databricks.com>
  Closes #10316 from davies/remove_generate_projection.
* Revert "[SPARK-12105] [SQL] add convenient show functions"Reynold Xin2015-12-161-16/+9
| | | | This reverts commit 31b391019ff6eb5a483f4b3e62fd082de7ff8416.
* Revert "[HOTFIX] Compile error from commit 31b3910"Reynold Xin2015-12-161-1/+1
| | | | This reverts commit 840bd2e008da5b22bfa73c587ea2c57666fffc60.
* Style fix for the previous 3 JDBC filter push down commits. (Reynold Xin, 2015-12-15; 1 file, -9/+8)
* [SPARK-12315][SQL] isnotnull operator not pushed down for JDBC datasource (hyukjinkwon, 2015-12-15; 2 files, -0/+3)
  https://issues.apache.org/jira/browse/SPARK-12315
  The `IsNotNull` filter is not being pushed down for the JDBC datasource. It looks like it is part of the SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support this. In this PR, I simply added the case for the `IsNotNull` filter to produce a proper filter string. (A combined sketch of this and the two related pushdown cases appears after the SPARK-12249 entry below.)
  Author: hyukjinkwon <gurwls223@gmail.com>
  This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com>
  Closes #10287 from HyukjinKwon/SPARK-12315.
* [SPARK-12314][SQL] isnull operator not pushed down for JDBC datasource (hyukjinkwon, 2015-12-15; 2 files, -0/+2)
  https://issues.apache.org/jira/browse/SPARK-12314
  The `IsNull` filter is not being pushed down for the JDBC datasource. It looks like it is part of the SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support this. In this PR, I simply added the case for the `IsNull` filter to produce a proper filter string.
  Author: hyukjinkwon <gurwls223@gmail.com>
  This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com>
  Closes #10286 from HyukjinKwon/SPARK-12314.
* [SPARK-12249][SQL] JDBC non-equality comparison operator not pushed down (hyukjinkwon, 2015-12-15; 2 files, -0/+3)
  https://issues.apache.org/jira/browse/SPARK-12249
  Currently the `!=` operator is not pushed down correctly. I simply added a case for this.
  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #10233 from HyukjinKwon/SPARK-12249.
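  The three JDBC pushdown fixes above all extend the same translation from data source `Filter`s to a SQL `WHERE` fragment. A minimal sketch of that shape (not the exact JDBCRDD code; value quoting and escaping are omitted):
  ```scala
  import org.apache.spark.sql.sources._

  def compileFilter(f: Filter): String = f match {
    case EqualTo(attr, value)      => s"$attr = $value"
    case Not(EqualTo(attr, value)) => s"$attr != $value"   // SPARK-12249
    case IsNull(attr)              => s"$attr IS NULL"     // SPARK-12314
    case IsNotNull(attr)           => s"$attr IS NOT NULL" // SPARK-12315
    case _                         => null // anything unhandled stays a Spark-side filter
  }
  ```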
* [SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability (Wenchen Fan, 2015-12-15; 2 files, -21/+27)
  Author: Wenchen Fan <cloud0fan@outlook.com>
  Closes #8645 from cloud-fan/test.
* [SPARK-12056][CORE] Part 2: Create a TaskAttemptContext only after calling setConf (tedyu, 2015-12-15; 1 file, -2/+2)
  This is a continuation of SPARK-12056, where the change is applied to SqlNewHadoopRDD.scala.
  andrewor14 FYI
  Author: tedyu <yuzhihong@gmail.com>
  Closes #10164 from tedyu/master.
* [HOTFIX] Compile error from commit 31b3910 (Andrew Or, 2015-12-15; 1 file, -1/+1)
* [SPARK-12105] [SQL] add convenient show functions (Jean-Baptiste Onofré, 2015-12-15; 1 file, -9/+16)
  Author: Jean-Baptiste Onofré <jbonofre@apache.org>
  Closes #10130 from jbonofre/SPARK-12105.
* [SPARK-12236][SQL] JDBC filter tests all pass if filters are not really pushed down (hyukjinkwon, 2015-12-15; 3 files, -21/+19)
  https://issues.apache.org/jira/browse/SPARK-12236
  Currently JDBC filters are not tested properly: all the tests pass even if the filters are not pushed down, due to Spark-side filtering. In this PR, firstly, I corrected the tests to properly check the pushed-down filters by removing the Spark-side filtering. Also, `!=` was being tested even though it is actually not pushed down, so I removed those cases. Lastly, I moved the `stripSparkFilter()` function to `SQLTestUtils`, as this function will be shared by all tests for pushed-down filters. It will also be shared with the ORC datasource, since its filters are not being tested properly either.
  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #10221 from HyukjinKwon/SPARK-12236.
* [SPARK-12271][SQL] Improve error message when Dataset.as[ ] has incompatible schemas (Nong Li, 2015-12-15; 4 files, -7/+18)
  Author: Nong Li <nong@databricks.com>
  Closes #10260 from nongli/spark-11271.
* [SPARK-12288] [SQL] Support UnsafeRow in Coalesce/Except/Intersect (gatorsmile, 2015-12-14; 2 files, -1/+46)
  Support UnsafeRow for Coalesce/Except/Intersect. Could you review if my code changes are ok? davies Thank you!
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10285 from gatorsmile/unsafeSupportCIE.
* [SPARK-12188][SQL][FOLLOW-UP] Code refactoring and comment correction in Dataset APIs (gatorsmile, 2015-12-14; 1 file, -1/+1)
  marmbrus This PR is to address your comment. Thanks for your review!
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10214 from gatorsmile/followup12188.
* [SPARK-12274][SQL] WrapOption should not have a type constraint for its child (Wenchen Fan, 2015-12-14; 1 file, -4/+1)
  I think it was a mistake, and we had not caught it until https://github.com/apache/spark/pull/10260, which began to check whether the `fromRowExpression` is resolved.
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #10263 from cloud-fan/encoder.
* [SPARK-12275][SQL] No plan for BroadcastHint in some conditions (yucai, 2015-12-13; 2 files, -1/+8)
  When SparkStrategies.BasicOperators' "case BroadcastHint(child) => apply(child)" is hit, it only recursively invokes BasicOperators.apply with this "child". This gives the other strategies no chance to process the plan, which can lead to a "No plan" issue, so we use planLater to go through all strategies.
  https://issues.apache.org/jira/browse/SPARK-12275
  Author: yucai <yucai.yu@intel.com>
  Closes #10265 from yucai/broadcast_hint.
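  The change, sketched as the relevant strategy case (an excerpt, not a standalone compilable unit; `planLater` is the planner's hook for re-entering the full strategy search):
  ```scala
  // before: only BasicOperators itself ever saw `child`
  //   case BroadcastHint(child) => apply(child)
  // after: `child` is planned by going through all strategies again
  case BroadcastHint(child) => planLater(child) :: Nil
  ```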
* [SPARK-12213][SQL] use multiple partitions for single distinct query (Davies Liu, 2015-12-13; 10 files, -990/+422)
  Currently we can generate different plans for a query with a single distinct (depending on spark.sql.specializeSingleDistinctAggPlanning): one works better on low-cardinality columns, the other works better for high-cardinality columns (the default). This PR changes it to generate a single plan (three aggregations and two exchanges) that works well in both cases, so we can safely remove the flag `spark.sql.specializeSingleDistinctAggPlanning` (introduced in 1.6).
  A query like `SELECT COUNT(DISTINCT a) FROM table` will be planned as
  ```
  AGG-4 (count distinct)
    Shuffle to a single reducer
      Partial-AGG-3 (count distinct, no grouping)
        Partial-AGG-2 (grouping on a)
          Shuffle by a
            Partial-AGG-1 (grouping on a)
  ```
  This PR also includes a large refactor of aggregation (removing 500+ lines of code).
  cc yhuai nongli marmbrus
  Author: Davies Liu <davies@databricks.com>
  Closes #10228 from davies/single_distinct.
* [SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions (Ankur Dave, 2015-12-11; 2 files, -3/+3)
  Modifies the String overload to call the Column overload and ensures this is exercised in a test.
  Author: Ankur Dave <ankurdave@gmail.com>
  Closes #10271 from ankurdave/SPARK-12298.
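  The fix, sketched (the real method lives in DataFrame.scala, where `Column` is in scope; the bug was the String overload recursing into itself instead of delegating):
  ```scala
  def sortWithinPartitions(sortCol: String, sortCols: String*): DataFrame =
    // delegate to the Column overload; calling the String overload here looped forever
    sortWithinPartitions((sortCol +: sortCols).map(Column(_)): _*)
  ```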
* [SPARK-12258] [SQL] passing null into ScalaUDF (follow-up) (Davies Liu, 2015-12-11; 2 files, -16/+23)
  This is a follow-up PR for #10259.
  Author: Davies Liu <davies@databricks.com>
  Closes #10266 from davies/null_udf2.
* [SPARK-12258][SQL] passing null into ScalaUDF (Davies Liu, 2015-12-10; 2 files, -6/+10)
  Check nullability and pass nulls into ScalaUDF.
  Closes #10249
  Author: Davies Liu <davies@databricks.com>
  Closes #10259 from davies/udf_null.
* [SPARK-12251] Document and improve off-heap memory configurations (Josh Rosen, 2015-12-10; 3 files, -3/+7)
  This patch adds documentation for Spark configurations that affect off-heap memory and makes some naming and validation improvements for those configs.
  - Change `spark.memory.offHeapSize` to `spark.memory.offHeap.size`. This is fine because this configuration has not shipped in any Spark release yet (it's new in Spark 1.6).
  - Deprecated `spark.unsafe.offHeap` in favor of a new `spark.memory.offHeap.enabled` configuration. The motivation behind this change is to gather all memory-related configurations under the same prefix.
  - Add a check which prevents users from setting `spark.memory.offHeap.enabled=true` when `spark.memory.offHeap.size == 0`. After SPARK-11389 (#9344), which was committed in Spark 1.6, Spark enforces a hard limit on the amount of off-heap memory that it will allocate to tasks. As a result, enabling off-heap execution memory without setting `spark.memory.offHeap.size` will lead to immediate OOMs. The new configuration validation makes this scenario easier to diagnose, helping to avoid user confusion.
  - Document these configurations on the configuration page.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #10237 from JoshRosen/SPARK-12251.
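  The renamed settings in use (a sketch; the size value is illustrative):
  ```scala
  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.memory.offHeap.enabled", "true")   // replaces the deprecated spark.unsafe.offHeap
    .set("spark.memory.offHeap.size", "268435456") // bytes; must be > 0 when enabled, else validation fails
  ```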
* [SPARK-12228][SQL] Try to run execution Hive's Derby in memory (Yin Huai, 2015-12-10; 4 files, -5/+9)
  This PR tries to make execution Hive's Derby run in memory, since it is a fake metastore and every time we create a HiveContext we switch to a new one. It may reduce the flakiness of our tests that need to create a HiveContext (e.g. HiveSparkSubmitSuite). I will test it more.
  https://issues.apache.org/jira/browse/SPARK-12228
  Author: Yin Huai <yhuai@databricks.com>
  Closes #10204 from yhuai/derbyInMemory.
* [SPARK-12250][SQL] Allow users to define a UDAF without providing details of its inputSchema (Yin Huai, 2015-12-10; 2 files, -5/+64)
  https://issues.apache.org/jira/browse/SPARK-12250
  Author: Yin Huai <yhuai@databricks.com>
  Closes #10236 from yhuai/SPARK-12250.
* [SPARK-12242][SQL] Add DataFrame.transform method (Reynold Xin, 2015-12-10; 2 files, -1/+14)
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10226 from rxin/df-transform.
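  Illustrative use of the new method for chaining user-defined transformations (a sketch; `df`, the helper functions, and the column names are hypothetical):
  ```scala
  import org.apache.spark.sql.DataFrame
  import org.apache.spark.sql.functions.lit

  def withDoubled(df: DataFrame): DataFrame = df.withColumn("doubled", df("value") * 2)
  def withSource(df: DataFrame): DataFrame  = df.withColumn("source", lit("events"))

  // reads left-to-right instead of nesting withSource(withDoubled(df))
  val result = df.transform(withDoubled).transform(withSource)
  ```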
* [SPARK-12252][SPARK-12131][SQL] refactor MapObjects to make it less hacky (Wenchen Fan, 2015-12-10; 4 files, -47/+35)
  In https://github.com/apache/spark/pull/10133 we found that we should ensure the children of `TreeNode` are all accessible in the `productIterator`, or the behavior will be very confusing. In this PR, I try to fix this problem by exposing the `loopVar`. This also fixes SPARK-12131, which was caused by the hacky `MapObjects`.
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #10239 from cloud-fan/map-objects.
* [SPARK-11796] Fix httpclient and httpcore dependency issues related to docker-client (Mark Grover, 2015-12-09; 1 file, -2/+0)
  This commit fixes dependency issues which prevented the Docker-based JDBC integration tests from running in the Maven build.
  Author: Mark Grover <mgrover@cloudera.com>
  Closes #9876 from markgrover/master_docker.
* [SPARK-12012][SQL] Show more comprehensive PhysicalRDD metadata when visualizing SQL query plan (Cheng Lian, 2015-12-09; 10 files, -31/+86)
  This PR adds a `private[sql]` method `metadata` to `SparkPlan`, which can be used to describe detailed information about a physical plan during visualization. Specifically, this PR uses the method to provide details of `PhysicalRDD`s translated from a data source relation. For example, a `ParquetRelation` converted from the Hive metastore table `default.psrc` is now shown as in the following screenshot:
  ![image](https://cloud.githubusercontent.com/assets/230655/11526657/e10cb7e6-9916-11e5-9afa-f108932ec890.png)
  And here is the screenshot for a regular `ParquetRelation` (not converted from a Hive metastore table) loaded from a really long path:
  ![output](https://cloud.githubusercontent.com/assets/230655/11680582/37c66460-9e94-11e5-8f50-842db5309d5a.png)
  Author: Cheng Lian <lian@databricks.com>
  Closes #10004 from liancheng/spark-12012.physical-rdd-metadata.
* [SPARK-11676][SQL] Parquet filter tests all pass if filters are not really pushed down (hyukjinkwon, 2015-12-09; 1 file, -28/+41)
  Currently the Parquet predicate tests all pass even if the filters are not pushed down or pushdown is disabled. In this PR, to check filter evaluation, it simply makes the expression from `expression.Filter` and then tries to create filters just like Spark does. To check the results, it manually accesses the child RDD (of `expression.Filter`) and produces the results that should be filtered properly, then compares them to the expected values. Now, if filters are not pushed down or pushdown is disabled, this throws exceptions.
  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #9659 from HyukjinKwon/SPARK-11676.
* [SPARK-12069][SQL] Update documentation with Datasets (Michael Armbrust, 2015-12-08; 2 files, -4/+65)
  Author: Michael Armbrust <michael@databricks.com>
  Closes #10060 from marmbrus/docs.
* [SPARK-12205][SQL] Pivot fails Analysis when aggregate is UnresolvedFunction (Andrew Ray, 2015-12-08; 2 files, -1/+9)
  Delays application of ResolvePivot until all aggregates are resolved, to prevent problems with UnresolvedFunction, and adds a unit test.
  Author: Andrew Ray <ray.andrew@gmail.com>
  Closes #10202 from aray/sql-pivot-unresolved-function.
* [SPARK-12188][SQL] Code refactoring and comment correction in Dataset APIs (gatorsmile, 2015-12-08; 1 file, -40/+40)
  This PR contains the following updates:
  - Created a new private variable `boundTEncoder` that can be shared by multiple functions: `RDD`, `select` and `collect`.
  - Replaced all uses of `queryExecution.analyzed` with the function call `logicalPlan`.
  - A few API comments were using wrong class names (e.g., `DataFrame`) or parameter names (e.g., `n`).
  - A few API descriptions were wrong (e.g., `mapPartitions`).
  marmbrus rxin cloud-fan Could you take a look and check if they are appropriate? Thank you!
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10184 from gatorsmile/datasetClean.
* [SPARK-12195][SQL] Adding BigDecimal, Date and Timestamp into Encoder (gatorsmile, 2015-12-08; 2 files, -0/+35)
  This PR adds three more data types to Encoder: `BigDecimal`, `Date` and `Timestamp`.
  marmbrus cloud-fan rxin Could you take a quick look at these three types? Not sure if it can be merged into 1.6. Thank you very much!
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10188 from gatorsmile/dataTypesinEncoder.
* [SPARK-12201][SQL] add type coercion rule for greatest/least (Wenchen Fan, 2015-12-08; 3 files, -0/+47)
  Checked with Hive: greatest/least should cast their children to the tightest common type, i.e. `(int, long) => long`, `(int, string) => error`, `(decimal(10,5), decimal(5,10)) => error`.
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #10196 from cloud-fan/type-coercion.
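  The rule's effect at the SQL level (a sketch; the table and column names are hypothetical):
  ```scala
  sqlContext.sql("SELECT greatest(int_col, long_col) FROM t")   // ok: children cast to bigint
  sqlContext.sql("SELECT greatest(int_col, string_col) FROM t") // error: no tightest common type
  ```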
* [SPARK-11884] Drop multiple columns in the DataFrame API (tedyu, 2015-12-07; 2 files, -8/+23)
  See the thread Ben started: http://search-hadoop.com/m/q3RTtveEuhjsr7g/
  This PR adds a drop() method to DataFrame which accepts multiple column names.
  Author: tedyu <yuzhihong@gmail.com>
  Closes #9862 from ted-yu/master.
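  The new varargs overload in action (a sketch; assumes the usual implicits, and the column names are hypothetical):
  ```scala
  import sqlContext.implicits._

  val df = Seq((1, "a", true)).toDF("id", "tag", "tmp")
  df.drop("tag", "tmp").printSchema() // only `id` remains
  ```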
* [SPARK-12032] [SQL] Re-order inner joins to do joins with conditions first (Davies Liu, 2015-12-07; 3 files, -6/+185)
  Currently the order of joins is exactly the same as in the SQL query, so some conditions may not be pushed down to the correct join; those joins then become cross products and are extremely slow. This patch tries to re-order the inner joins (which are common in SQL queries), picking the joins that have self-contained conditions first and delaying those that do not have conditions. After this patch, the TPC-DS queries Q64/65 can run hundreds of times faster.
  cc marmbrus nongli
  Author: Davies Liu <davies@databricks.com>
  Closes #10073 from davies/reorder_joins.
* [SPARK-12138][SQL] Escape \u in the generated comments of codegen (gatorsmile, 2015-12-06; 2 files, -1/+12)
  When \u appears in a comment block (i.e. in /**/), codegen will break. So, in Expression and CodegenFallback, we escape \u to \\u.
  yhuai Please review it. I did reproduce it and it works after the fix. Thanks!
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10155 from gatorsmile/escapeU.
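  The escaping idea in isolation (a sketch; the real change lives in Expression/CodegenFallback): Java-family compilers process unicode escapes even inside comments, so a backslash-u followed by non-hex characters makes the generated source unparseable.
  ```scala
  // Replace the two-character sequence backslash-u with backslash-backslash-u
  // before embedding free-form text in a generated /* ... */ comment.
  def escapeUnicodeEscapes(comment: String): String = comment.replace("\\u", "\\\\u")

  escapeUnicodeEscapes("path \\u would break codegen") // -> path \\u escaped, safe to embed
  ```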
* [SPARK-12048][SQL] Prevent closing JDBC resources twice (gcc, 2015-12-06; 1 file, -0/+1)
  Author: gcc <spark-src@condor.rhaag.ip>
  Closes #10101 from rh99/master.
* [SPARK-12084][CORE] Fix code that uses ByteBuffer.array incorrectly (Shixiong Zhu, 2015-12-04; 4 files, -9/+14)
  `ByteBuffer` does not guarantee that all contents of `ByteBuffer.array` are valid, e.g. for a ByteBuffer returned by `ByteBuffer.slice`. We should not use the whole content of a `ByteBuffer` unless we know that is correct. This patch fixes all places that use `ByteBuffer.array` incorrectly.
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10083 from zsxwing/bytebuffer-array.
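  A sketch of why the whole array is unsafe for a sliced buffer, and the bounded read that is correct:
  ```scala
  import java.nio.ByteBuffer

  val buf = ByteBuffer.wrap("0123456789".getBytes("UTF-8"))
  buf.position(4)
  val slice = buf.slice() // logically "456789", but shares the full backing array

  // Wrong: reads all ten bytes, including data outside the slice
  val wrong = new String(slice.array(), "UTF-8")

  // Right: honor arrayOffset() + position() and remaining()
  val ok = new String(
    slice.array(), slice.arrayOffset() + slice.position(), slice.remaining(), "UTF-8")
  ```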
* [SPARK-12112][BUILD] Upgrade to SBT 0.13.9 (Josh Rosen, 2015-12-05; 5 files, -9/+9)
  We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin). I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
* [SPARK-6990][BUILD] Add Java linting script; fix minor warnings (Dmitry Erastov, 2015-12-04; 4 files, -29/+63)
  This replaces https://github.com/apache/spark/pull/9696
  Invoke Checkstyle and print any errors to the console, failing the step. Uses Google's style rules modified according to https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide. Some important checks are disabled (see TODOs in `checkstyle.xml`) due to multiple violations being present in the codebase. I suggest fixing those TODOs in separate PR(s). More on Checkstyle can be found on the [official website](http://checkstyle.sourceforge.net/).
  Sample output (from [build 46345](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46345/consoleFull)) (duplicated because I ran the build twice with different profiles):
  > Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
  > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
  > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
  > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
  > [error] running /home/jenkins/workspace/SparkPullRequestBuilder2/dev/lint-java ; received return code 1
  Also fixes some of the minor violations that didn't require sweeping changes. Apologies for the previous botched PRs - I finally figured out the issue.
  cr: JoshRosen, pwendell
  > I state that the contribution is my original work, and I license the work to the project under the project's open source license.
  Author: Dmitry Erastov <derastov@gmail.com>
  Closes #9867 from dskrvk/master.
* [SPARK-11206] Support SQL UI on the history server (resubmit) (Carson Wang, 2015-12-03; 13 files, -129/+271)
  Resubmit of #9297 and #9991.
  On the live web UI there is a SQL tab which provides valuable information for the SQL query, but once the workload is finished we won't see the SQL tab on the history server. It would be helpful to support the SQL UI on the history server so we can analyze it even after its execution.
  To support the SQL UI on the history server:
  1. I added an onOtherEvent method to the SparkListener trait and post all SQL-related events to the same event bus.
  2. Two SQL events, SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd, are defined in the sql module.
  3. The new SQL events are written to the event log using Jackson.
  4. A new trait SparkHistoryListenerFactory is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using java.util.ServiceLoader.
  Author: Carson Wang <carson.wang@intel.com>
  Closes #10061 from carsonwang/SqlHistoryUI.
* [SPARK-12088][SQL] check connection.isClosed before calling connection… (Huaxin Gao, 2015-12-03; 1 file, -1/+1)
  In the Java spec, java.sql.Connection has:
  boolean getAutoCommit() throws SQLException
  Throws: SQLException - if a database access error occurs or this method is called on a closed connection
  So if conn.getAutoCommit is called on a closed connection, a SQLException will be thrown. Even though the code catches the SQLException and the program can continue, I think we should check conn.isClosed before calling conn.getAutoCommit to avoid the unnecessary SQLException.
  Author: Huaxin Gao <huaxing@oc0558782468.ibm.com>
  Closes #10095 from huaxingao/spark-12088.
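  The guard described above, sketched (the helper and its commit behavior are hypothetical; only the isClosed-before-getAutoCommit ordering is the point):
  ```scala
  import java.sql.Connection

  def commitIfNeeded(conn: Connection): Unit = {
    // getAutoCommit throws SQLException on a closed connection, so test isClosed first
    if (!conn.isClosed && !conn.getAutoCommit) {
      conn.commit()
    }
  }
  ```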
* [SPARK-12109][SQL] Expressions' simpleString should delegate to their toString (Yin Huai, 2015-12-03; 3 files, -5/+3)
  https://issues.apache.org/jira/browse/SPARK-12109
  The change of https://issues.apache.org/jira/browse/SPARK-11596 exposed the problem. In the SQL plan viz, the filter shows
  ![image](https://cloud.githubusercontent.com/assets/2072857/11547075/1a285230-9906-11e5-8481-2bb451e35ef1.png)
  After the changes in this PR, the viz is back to normal.
  ![image](https://cloud.githubusercontent.com/assets/2072857/11547080/2bc570f4-9906-11e5-8897-3b3bff173276.png)
  Author: Yin Huai <yhuai@databricks.com>
  Closes #10111 from yhuai/SPARK-12109.