path: root/sql
Commit message · Author · Age · Files · Lines changed (-/+)
...
* [SPARK-12231][SQL] create a combineFilters' projection when we call buildPartitionedTableScan (Kevin Yu, 2015-12-28, 2 files, -5/+64)
  Hello Michael & All: We had some issues submitting the new code in the other PR (#10299), so we closed that PR and opened this one with the fix. The previous failure happened because the projection for the scan, when there is a filter that is not pushed down (the "left-over" filter), could differ from the original projection in its elements or their ordering. The approach in this new code is: insert a new Project if the "left-over" filter is nonempty, the original projection is not empty, and the projection for the scan has more than one element (which could otherwise cause a different ordering in the projection). We added 3 test cases to cover the previously failing cases.
  Author: Kevin Yu <qyu@us.ibm.com>
  Closes #10388 from kevinyu98/spark-12231.
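  The condition described above can be summarized as a small predicate; the sketch below is only a hedged illustration (names and types are simplified placeholders, not the actual buildPartitionedTableScan code):
  ```
  // Decide whether to add an explicit Project on top of the filtered scan.
  // `requestedProjection` is the originally requested projection and
  // `scanProjection` is what the pruned scan produces; both are plain
  // column-name lists in this sketch.
  def needsExtraProject(
      requestedProjection: Seq[String],
      scanProjection: Seq[String],
      hasLeftOverFilter: Boolean): Boolean = {
    hasLeftOverFilter && requestedProjection.nonEmpty && scanProjection.size > 1
  }
  ```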
* [HOT-FIX] bypass hive test when parsing logical plan to json (Wenchen Fan, 2015-12-28, 1 file, -3/+3)
  https://github.com/apache/spark/pull/10311 introduces some rare, non-deterministic flakiness in the hive udf tests, see https://github.com/apache/spark/pull/10311#issuecomment-166548851. I can't reproduce it locally and may need more time to investigate; a quick solution is to bypass hive tests for json serialization.
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #10430 from cloud-fan/hot-fix.
* [SPARK-12218] Fixes ORC conjunction predicate push down (Cheng Lian, 2015-12-28, 3 files, -30/+112)
  This PR is a follow-up of PR #10362. Two major changes:
  1. The fix introduced in #10362 is OK for Parquet, but may disable ORC PPD in many cases. PR #10362 stops converting an `AND` predicate if any branch is inconvertible. On the other hand, `OrcFilters` combines all filters into a single big conjunction first and then tries to convert it into an ORC `SearchArgument`. This means that if any filter is inconvertible, no filters can be pushed down. This PR fixes the issue by finding all convertible filters first, before doing the actual conversion. The reason behind the current implementation is mostly the limitation of the ORC `SearchArgument` builder, which is documented in this PR in detail.
  2. Copied the `AND` predicate fix for ORC from #10362 to avoid a merge conflict.
  Same as #10362, this PR targets master (2.0.0-SNAPSHOT), branch-1.6, and branch-1.5.
  Author: Cheng Lian <lian@databricks.com>
  Closes #10377 from liancheng/spark-12218.fix-orc-conjunction-ppd.
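  A minimal, generic sketch of the strategy described in point 1 (the helper parameters are hypothetical; the real code works on ORC `SearchArgument` builders):
  ```
  // Build a conjunction only from the filters that are individually convertible,
  // so one unsupported filter no longer disables push-down for all the others.
  def buildConjunction[F, S](filters: Seq[F])(
      canConvert: F => Boolean,
      convert: F => S,
      and: (S, S) => S): Option[S] = {
    filters.filter(canConvert).map(convert).reduceOption(and)
  }
  ```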
* [SPARK-12515][SQL][DOC] minor doc update for read.jdbc (felixcheung, 2015-12-28, 1 file, -5/+6)
  Author: felixcheung <felixcheung_m@hotmail.com>
  Closes #10465 from felixcheung/dfreaderjdbcdoc.
* [SPARK-12010][SQL] Spark JDBC requires support for column-name-free INSERT syntax (CK50, 2015-12-24, 1 file, -8/+4)
  In the past, Spark JDBC writes only worked with technologies that support the following INSERT statement syntax (JdbcUtils.scala: insertStatement()):
  INSERT INTO $table VALUES ( ?, ?, ..., ? )
  But some technologies require a list of column names:
  INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )
  This was blocking the use of, e.g., the Progress JDBC Driver for Cassandra. Another limitation is that the first syntax relies on the DataFrame field ordering matching that of the target table. This works fine as long as the target table has been created by writer.jdbc(). If the target table contains more columns (not created by writer.jdbc()), the insert fails due to a mismatch in the number of columns or their data types. This PR switches to the recommended second INSERT syntax; column names are taken from the DataFrame field names.
  Author: CK50 <christian.kurz@oracle.com>
  Closes #10380 from CK50/master-SPARK-12010-2.
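  A hedged sketch of the column-name-qualified statement the commit describes (a simplified stand-in for JdbcUtils.insertStatement, not the actual code):
  ```
  import java.sql.{Connection, PreparedStatement}

  // Build an INSERT that names its target columns, so the statement no longer
  // depends on the DataFrame field order matching the table's column order.
  def insertStatement(conn: Connection, table: String, columns: Seq[String]): PreparedStatement = {
    val cols = columns.mkString(", ")
    val placeholders = columns.map(_ => "?").mkString(", ")
    conn.prepareStatement(s"INSERT INTO $table ($cols) VALUES ($placeholders)")
  }
  ```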
* [SPARK-12477][SQL] Tungsten projection fails for null values in array fields (pierre-borckmans, 2015-12-22, 2 files, -1/+10)
  Accessing null elements in an array field fails when Tungsten is enabled. It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled. This PR solves the problem by checking, in the generated code, whether the accessed element of the array field is null.
  Example:
  ```
  // Array of String
  case class AS( as: Seq[String] )
  val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
  dfAS.registerTempTable("T_AS")
  for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(",")) }
  ```
  With Tungsten disabled:
  ```
  0 = [a]
  1 = [null]
  2 = [b]
  ```
  With Tungsten enabled:
  ```
  0 = [a]
  15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
  java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
    at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  ```
  Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
  Closes #10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.
* [SPARK-11164][SQL] Add InSet pushdown filter back for Parquet (Liang-Chi Hsieh, 2015-12-23, 3 files, -8/+45)
  When the filter is `"b in ('1', '2')"`, the filter is not pushed down to Parquet. Thanks!
  Author: gatorsmile <gatorsmile@gmail.com>
  Author: xiaoli <lixiao1983@gmail.com>
  Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
  Closes #10278 from gatorsmile/parquetFilterNot.
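  For context, a hedged usage illustration (the DataFrame and column names are assumed): after this change an in-set predicate like the one from the description can be pushed into the Parquet reader instead of being evaluated only on the Spark side.
  ```
  // Assumes `df` is a Parquet-backed DataFrame with a string column `b`.
  val filtered = df.filter("b in ('1', '2')")
  filtered.explain()   // the plan's pushed filters should now include the IN predicate
  ```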
* [SPARK-12478][SQL] Bugfix: Dataset fields of product types can't be null (Cheng Lian, 2015-12-23, 2 files, -4/+15)
  When creating extractors for product types (i.e. case classes and tuples), a null check is missing, so we always assume input product values are non-null. This PR adds a null check in the extractor expression for product types. The null check is stripped off for top-level product fields, which are mapped to the outermost `Row`s, since they can't be null. Thanks cloud-fan for helping investigate this issue!
  Author: Cheng Lian <lian@databricks.com>
  Closes #10431 from liancheng/spark-12478.top-level-null-field.
* [SPARK-12102][SQL] Cast a non-nullable struct field to a nullable field during analysis (Dilip Biswal, 2015-12-22, 2 files, -1/+9)
  Compare both the left and right side of the case expression, ignoring nullability, when checking for type equality.
  Author: Dilip Biswal <dbiswal@us.ibm.com>
  Closes #10156 from dilipbiswal/spark-12102.
* [SPARK-12471][CORE] Spark daemons will log their pid on start up. (Nong Li, 2015-12-22, 1 file, -0/+1)
  Author: Nong Li <nong@databricks.com>
  Closes #10422 from nongli/12471-pids.
* [SPARK-12456][SQL] Add ExpressionDescription to misc functions (Xiu Guo, 2015-12-22, 4 files, -0/+29)
  First try, not sure how much information we need to provide in the usage part.
  Author: Xiu Guo <xguo27@gmail.com>
  Closes #10423 from xguo27/SPARK-12456.
* [SPARK-11677][SQL][FOLLOW-UP] Add tests for checking the ORC filter creation against pushed down filters. (hyukjinkwon, 2015-12-23, 1 file, -0/+236)
  https://issues.apache.org/jira/browse/SPARK-11677
  Although the existing tests correctly check the filters via the number of results when ORC filter push-down is enabled, the filters themselves are not being tested. So this PR adds tests similar to `ParquetFilterSuite`. One difference from `ParquetFilterSuite` is that this `OrcFilterSuite` only checks whether the appropriate filters are created; it does not check the results, because those are already checked in `OrcQuerySuite`.
  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #10341 from HyukjinKwon/SPARK-11677-followup.
* [SPARK-12371][SQL] Runtime nullability check for NewInstance (Cheng Lian, 2015-12-22, 7 files, -10/+232)
  This PR adds a new expression `AssertNotNull` to ensure non-nullable fields of products and case classes don't receive null values at runtime.
  Author: Cheng Lian <lian@databricks.com>
  Closes #10331 from liancheng/dataset-nullability-check.
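  A hedged illustration of what the runtime check guards against (the class name, query, and spark-shell style `sqlContext` are assumed, not taken from the PR):
  ```
  import sqlContext.implicits._

  // Decoding a null into a non-nullable field (a primitive Int here) should now
  // fail the nullability assertion with a clear error instead of silently
  // producing a default value.
  case class Item(id: Int)
  val df = sqlContext.sql("SELECT CAST(null AS INT) AS id")
  df.as[Item].collect()
  ```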
* [SPARK-12446][SQL] Add unit tests for JDBCRDD internal functions (Takeshi YAMAMURO, 2015-12-22, 2 files, -33/+54)
  There were no tests for JDBCRDD#compileFilter.
  Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
  Closes #10409 from maropu/AddTestsInJdbcRdd.
* [SPARK-11823][SQL] Fix flaky JDBC cancellation test in HiveThriftBinaryServerSuite (Josh Rosen, 2015-12-21, 1 file, -29/+56)
  This patch fixes a flaky "test jdbc cancel" test in HiveThriftBinaryServerSuite. The test is prone to a race condition that causes it to block indefinitely while waiting for an extremely slow query to complete, which caused many Jenkins builds to time out. For more background, see my comments on #6207 (the PR which introduced this test).
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #10425 from JoshRosen/SPARK-11823.
* [SPARK-11807] Remove support for Hadoop < 2.2 (Reynold Xin, 2015-12-21, 1 file, -1/+1)
  i.e. Hadoop 1 and Hadoop 2.0.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10404 from rxin/SPARK-11807.
* [SPARK-12388] change default compression to lz4 (Davies Liu, 2015-12-21, 1 file, -2/+2)
  According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy. After changing the compressor to LZ4, I saw a 20% improvement in end-to-end time for a TPCDS query (Q4).
  [1] https://github.com/ning/jvm-compressor-benchmark/wiki
  cc rxin
  Author: Davies Liu <davies@databricks.com>
  Closes #10342 from davies/lz4.
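  For reference, a hedged example of pinning the codec explicitly (for instance, to keep Snappy after this default change); `spark.io.compression.codec` is the relevant configuration key:
  ```
  import org.apache.spark.{SparkConf, SparkContext}

  // Select the block compression codec explicitly instead of relying on the
  // default, which is lz4 after this change. Known values include "lz4",
  // "lzf", and "snappy".
  val conf = new SparkConf()
    .setAppName("codec-example")
    .set("spark.io.compression.codec", "snappy")
  val sc = new SparkContext(conf)
  ```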
* [SPARK-12339][SPARK-11206][WEBUI] Added a null check that was removed in SPARK-11206 (Alex Bozarth, 2015-12-21, 1 file, -6/+8)
  Updates made in SPARK-11206 missed an edge case which causes a NullPointerException when a task is killed. In some cases, when a task ends in failure, taskMetrics is initialized as null (see JobProgressListener.onTaskEnd()). To address this, a null check was added. Before the changes in SPARK-11206, this null check was performed at the start of the updateTaskAccumulatorValues() function.
  Author: Alex Bozarth <ajbozart@us.ibm.com>
  Closes #10405 from ajbozarth/spark12339.
* [SPARK-12374][SPARK-12150][SQL] Adding logical/physical operators for Range (gatorsmile, 2015-12-21, 6 files, -7/+118)
  Based on the suggestions from marmbrus, this adds logical/physical operators for Range to improve performance, and also adds another API to resolve JIRA SPARK-12150. Could you take a look at my implementation, marmbrus? If it's not good, I can rework it. :) Thank you very much!
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10335 from gatorsmile/rangeOperators.
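  A hedged usage illustration (the commit message does not spell out the exact API it adds, so this only shows the existing range entry point that a dedicated Range operator can serve directly; spark-shell style `sqlContext` assumed):
  ```
  // Generate ids 0, 2, 4, ..., 998 across 4 partitions; a dedicated
  // logical/physical Range operator can produce these rows directly rather
  // than parallelizing a materialized collection.
  val ids = sqlContext.range(0, 1000, 2, 4)
  ids.count()
  ```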
* [SPARK-12321][SQL] JSON format for TreeNode (use reflection) (Wenchen Fan, 2015-12-21, 13 files, -75/+472)
  An alternative solution for https://github.com/apache/spark/pull/10295: instead of implementing a json format for all logical/physical plans and expressions, use reflection to implement it in `TreeNode`. Here I use a pre-order traversal to flatten a plan tree into a plan list, and add an extra field `num-children` to each plan node, so that we can reconstruct the tree from the list.
  Example JSON for a logical plan tree:
  ```
  [ { "class" : "org.apache.spark.sql.catalyst.plans.logical.Sort", "num-children" : 1, "order" : [ [ { "class" : "org.apache.spark.sql.catalyst.expressions.SortOrder", "num-children" : 1, "child" : 0, "direction" : "Ascending" }, { "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children" : 0, "name" : "i", "dataType" : "integer", "nullable" : true, "metadata" : { }, "exprId" : { "id" : 10, "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6" }, "qualifiers" : [ ] } ] ], "global" : false, "child" : 0 },
    { "class" : "org.apache.spark.sql.catalyst.plans.logical.Project", "num-children" : 1, "projectList" : [ [ { "class" : "org.apache.spark.sql.catalyst.expressions.Alias", "num-children" : 1, "child" : 0, "name" : "i", "exprId" : { "id" : 10, "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6" }, "qualifiers" : [ ] }, { "class" : "org.apache.spark.sql.catalyst.expressions.Add", "num-children" : 2, "left" : 0, "right" : 1 }, { "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children" : 0, "name" : "a", "dataType" : "integer", "nullable" : true, "metadata" : { }, "exprId" : { "id" : 0, "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6" }, "qualifiers" : [ ] }, { "class" : "org.apache.spark.sql.catalyst.expressions.Literal", "num-children" : 0, "value" : "1", "dataType" : "integer" } ], [ { "class" : "org.apache.spark.sql.catalyst.expressions.Alias", "num-children" : 1, "child" : 0, "name" : "j", "exprId" : { "id" : 11, "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6" }, "qualifiers" : [ ] }, { "class" : "org.apache.spark.sql.catalyst.expressions.Multiply", "num-children" : 2, "left" : 0, "right" : 1 }, { "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children" : 0, "name" : "a", "dataType" : "integer", "nullable" : true, "metadata" : { }, "exprId" : { "id" : 0, "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6" }, "qualifiers" : [ ] }, { "class" : "org.apache.spark.sql.catalyst.expressions.Literal", "num-children" : 0, "value" : "2", "dataType" : "integer" } ] ], "child" : 0 },
    { "class" : "org.apache.spark.sql.catalyst.plans.logical.LocalRelation", "num-children" : 0, "output" : [ [ { "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference", "num-children" : 0, "name" : "a", "dataType" : "integer", "nullable" : true, "metadata" : { }, "exprId" : { "id" : 0, "jvmId" : "cd1313c7-3f66-4ed7-a320-7d91e4633ac6" }, "qualifiers" : [ ] } ] ], "data" : [ ] } ]
  ```
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #10311 from cloud-fan/toJson-reflection.
* [SPARK-12398] Smart truncation of DataFrame / Dataset toString (Dilip Biswal, 2015-12-21, 4 files, -1/+73)
  When a DataFrame or Dataset has a long schema, we should intelligently truncate to avoid flooding the screen with unreadable information.
  // Standard output
  [a: int, b: int]
  // Truncate many top level fields
  [a: int, b, string ... 10 more fields]
  // Truncate long inner structs
  [a: struct<a: Int ... 10 more fields>]
  Author: Dilip Biswal <dbiswal@us.ibm.com>
  Closes #10373 from dilipbiswal/spark-12398.
* Bump master version to 2.0.0-SNAPSHOT. (Reynold Xin, 2015-12-19, 4 files, -4/+4)
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10387 from rxin/version-bump.
* [SPARK-12404][SQL] Ensure objects passed to StaticInvoke are Serializable (Kousuke Saruta, 2015-12-18, 6 files, -26/+88)
  Now `StaticInvoke` receives `Any` as an object. `StaticInvoke` can be serialized, but sometimes the object passed in is not serializable. For example, the following code raises an exception because `RowEncoder#extractorsFor`, invoked indirectly, creates a `StaticInvoke`.
  ```
  case class TimestampContainer(timestamp: java.sql.Timestamp)
  val rdd = sc.parallelize(1 to 2).map(_ => TimestampContainer(System.currentTimeMillis))
  val df = rdd.toDF
  val ds = df.as[TimestampContainer]
  val rdd2 = ds.rdd <----------------- invokes extractorsFor indirectly
  ```
  I'll add test cases.
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Author: Michael Armbrust <michael@databricks.com>
  Closes #10357 from sarutak/SPARK-12404.
* [SPARK-12218][SQL] Invalid splitting of nested AND expressions in Data Source filter API (Yin Huai, 2015-12-18, 4 files, -13/+60)
  JIRA: https://issues.apache.org/jira/browse/SPARK-12218
  When creating filters for Parquet/ORC, we should not push nested AND expressions partially.
  Author: Yin Huai <yhuai@databricks.com>
  Closes #10362 from yhuai/SPARK-12218.
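  A hedged sketch of the rule (a simplified stand-in for the real translation code, with all other predicate shapes elided): an `AND` is only pushed when both branches are convertible, because if the `AND` is nested under a `NOT` or inside an `OR`, pushing just one branch changes the predicate's meaning.
  ```
  import org.apache.spark.sql.catalyst.expressions
  import org.apache.spark.sql.sources

  // Convert a Catalyst predicate into a data source Filter, refusing to split
  // an AND whose branches do not both convert.
  def translate(predicate: expressions.Expression): Option[sources.Filter] = predicate match {
    case expressions.And(left, right) =>
      for {
        l <- translate(left)
        r <- translate(right)
      } yield sources.And(l, r)
    case _ =>
      None // other predicate shapes elided in this sketch
  }
  ```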
* [SPARK-12054] [SQL] Consider nullability of expression in codegen (Davies Liu, 2015-12-18, 27 files, -226/+261)
  This could simplify the generated code for expressions that are not nullable. This PR also fixes lots of bugs related to nullability.
  Author: Davies Liu <davies@databricks.com>
  Closes #10333 from davies/skip_nullable.
* [SPARK-11619][SQL] cannot use UDTF in DataFrame.selectExpr (Dilip Biswal, 2015-12-18, 7 files, -14/+31)
  Description of the problem from cloud-fan: the culprit is this line: https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L689. When we use `selectExpr`, we pass an `UnresolvedFunction` to `DataFrame.select` and fall into the last case. A workaround is to add special handling for UDTFs like we did for `explode` (and `json_tuple` in 1.6), wrapping them with `MultiAlias`. Another workaround is using `expr`, for example, `df.select(expr("explode(a)").as(Nil))`. I think `selectExpr` is no longer needed after we have the `expr` function...
  Author: Dilip Biswal <dbiswal@us.ibm.com>
  Closes #9981 from dilipbiswal/spark-11619.
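  A hedged usage illustration of what the fix enables (spark-shell style `sqlContext` and made-up column names assumed):
  ```
  import sqlContext.implicits._

  // With UDTFs handled in selectExpr, a generator such as explode can be used
  // there directly instead of only through select(explode(...)) or expr(...).
  val df = Seq((1, Seq("x", "y")), (2, Seq("z"))).toDF("id", "arr")
  df.selectExpr("explode(arr)").show()
  ```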
* [MINOR] Hide the error logs for 'SQLListenerMemoryLeakSuite' (Shixiong Zhu, 2015-12-17, 1 file, -29/+35)
  Hide the error logs for 'SQLListenerMemoryLeakSuite' to avoid noise. Most of the changes are whitespace changes.
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10363 from zsxwing/hide-log.
* [SPARK-8641][SQL] Native Spark Window functions (Herman van Hovell, 2015-12-17, 15 files, -746/+1148)
  This PR removes Hive window functions from Spark and replaces them with (native) Spark ones. The PR is on par with Hive in terms of features. This has the following advantages:
  * Better memory management.
  * The ability to use Spark UDAFs in Window functions.
  cc rxin / yhuai
  Author: Herman van Hovell <hvanhovell@questtec.nl>
  Closes #9819 from hvanhovell/SPARK-8641-2.
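  For context, a hedged example of the public window-function API whose evaluation moves onto native Spark execution (the data and spark-shell style `sqlContext` are assumed):
  ```
  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._
  import sqlContext.implicits._

  val sales = Seq(("books", 10), ("books", 30), ("games", 20)).toDF("category", "revenue")

  // Rank rows within each category by descending revenue; with native window
  // functions this no longer goes through the Hive evaluation path and can
  // also use Spark UDAFs inside the window.
  val w = Window.partitionBy("category").orderBy(desc("revenue"))
  sales.withColumn("rank", rank().over(w)).show()
  ```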
* [SPARK-12397][SQL] Improve error messages for data sources when they are not found (Reynold Xin, 2015-12-17, 2 files, -18/+49)
  Point users to spark-packages.org to find them.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10351 from rxin/SPARK-12397.
* [SQL] Update SQLContext.read.text doc (Yanbo Liang, 2015-12-17, 2 files, -2/+2)
  Since we renamed the column produced by ```SQLContext.read.text``` from ```text``` to ```value```, we need to update the doc.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #10349 from yanboliang/text-value.
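  A hedged usage illustration of the renamed column (the file path and spark-shell style `sqlContext` are assumed):
  ```
  // read.text loads each line of the file into a single string column, which
  // is now named "value" rather than "text".
  val lines = sqlContext.read.text("README.md")
  lines.select("value").show(5)
  ```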
* [SPARK-12395] [SQL] fix resulting columns of outer join (Davies Liu, 2015-12-17, 2 files, -9/+36)
  For the API DataFrame.join(right, usingColumns, joinType), if the joinType is right_outer or full_outer, the resulting join columns could be wrong (they would be null). The order of columns has been changed to match that of MySQL and PostgreSQL [1]. This PR also fixes the nullability of the output for outer joins.
  [1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html
  Author: Davies Liu <davies@databricks.com>
  Closes #10353 from davies/fix_join.
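  A hedged illustration of the API in question (data and column names assumed; spark-shell style `sqlContext`):
  ```
  import sqlContext.implicits._

  val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
  val right = Seq((2, "x"), (3, "y")).toDF("id", "r")

  // Full outer join on the shared "id" column: the single resulting "id"
  // column should carry the key from whichever side produced the row, not null.
  val joined = left.join(right, Seq("id"), "full_outer")
  joined.show()
  ```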
* [SPARK-12057][SQL] Prevent failure on corrupt JSON records (Yin Huai, 2015-12-16, 4 files, -12/+90)
  This PR makes the JSON parser and schema inference handle more cases where we have unparsed records. It is based on #10043. The last commit fixes the failed test and updates the logic of schema inference. Regarding the schema inference change, if we have something like
  ```
  {"f1":1}
  [1,2,3]
  ```
  originally we would get a DF without any column. After this change, we get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` becomes the value of `_corrupt_record`.
  When merging this PR, please make sure that the author is simplyianm.
  JIRA: https://issues.apache.org/jira/browse/SPARK-12057
  Closes #10043
  Author: Ian Macalinao <me@ian.pw>
  Author: Yin Huai <yhuai@databricks.com>
  Closes #10288 from yhuai/handleCorruptJson.
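  A hedged sketch of reading such mixed input (spark-shell style `sc`/`sqlContext` assumed; `_corrupt_record` is the default column name, configurable via `spark.sql.columnNameOfCorruptRecord`):
  ```
  // Lines that cannot be parsed as JSON objects land in the _corrupt_record
  // column instead of failing the query or producing an empty schema.
  val jsonLines = sc.parallelize(Seq("""{"f1":1}""", """[1,2,3]"""))
  val df = sqlContext.read.json(jsonLines)
  df.printSchema()   // expected fields: _corrupt_record and f1
  df.show()
  ```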
* [SPARK-12365][CORE] Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called (tedyu, 2015-12-16, 1 file, -13/+11)
  SPARK-9886 fixed ExternalBlockStore.scala. This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook().
  Author: tedyu <yuzhihong@gmail.com>
  Closes #10325 from ted-yu/master.
* [SPARK-11677][SQL] ORC filter tests all pass if filters are actually not pushed down. (hyukjinkwon, 2015-12-16, 1 file, -17/+36)
  Currently ORC filters are not tested properly: all the tests pass even if the filters are not pushed down or are disabled. This PR adds some logic for this. Since ORC does not fully filter record by record, the tests check the count of the result and whether it contains the expected values.
  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #9687 from HyukjinKwon/SPARK-11677.
* [SPARK-12164][SQL] Decode the encoded values and then display (gatorsmile, 2015-12-16, 5 files, -48/+133)
  Based on the suggestions from marmbrus and cloud-fan in https://github.com/apache/spark/pull/10165, this PR prints the decoded values (user objects) in `Dataset.show`.
  ```scala
  implicit val kryoEncoder = Encoders.kryo[KryoClassData]
  val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
  ds.show(20, false);
  ```
  The current output is like
  ```
  +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  |value                                                                                                                                                                                   |
  +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2] |
  |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4] |
  |[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6] |
  +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  ```
  After the fix, it will look like the following, if and only if the user overrides the `toString` function in the class `KryoClassData`
  ```scala
  override def toString: String = s"KryoClassData($a, $b)"
  ```
  ```
  +-------------------+
  |value              |
  +-------------------+
  |KryoClassData(a, 1)|
  |KryoClassData(b, 2)|
  |KryoClassData(c, 3)|
  +-------------------+
  ```
  If users do not override the `toString` function, the results will be like
  ```
  +----------------------------------------+
  |value                                   |
  +----------------------------------------+
  |org.apache.spark.sql.KryoClassData68ef |
  |org.apache.spark.sql.KryoClassData6915 |
  |org.apache.spark.sql.KryoClassData693b |
  +----------------------------------------+
  ```
  Question: Should we add another optional parameter to the function `show` that decides whether `show` displays the hex values or the object values?
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10215 from gatorsmile/showDecodedValue.
* [SPARK-12320][SQL] throw exception if the number of fields does not line up for Tuple encoder (Wenchen Fan, 2015-12-16, 5 files, -18/+93)
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #10293 from cloud-fan/err-msg.
* [SPARK-8745] [SQL] remove GenerateProjection (Davies Liu, 2015-12-16, 8 files, -319/+11)
  cc rxin
  Author: Davies Liu <davies@databricks.com>
  Closes #10316 from davies/remove_generate_projection.
* Revert "[SPARK-12105] [SQL] add convenient show functions" (Reynold Xin, 2015-12-16, 1 file, -16/+9)
  This reverts commit 31b391019ff6eb5a483f4b3e62fd082de7ff8416.
* Revert "[HOTFIX] Compile error from commit 31b3910" (Reynold Xin, 2015-12-16, 1 file, -1/+1)
  This reverts commit 840bd2e008da5b22bfa73c587ea2c57666fffc60.
* Style fix for the previous 3 JDBC filter push down commits. (Reynold Xin, 2015-12-15, 1 file, -9/+8)
* [SPARK-12315][SQL] isnotnull operator not pushed down for JDBC datasource. (hyukjinkwon, 2015-12-15, 2 files, -0/+3)
  https://issues.apache.org/jira/browse/SPARK-12315
  The `IsNotNull` filter is not being pushed down for the JDBC datasource. It appears to be standard SQL according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it. In this PR, I simply added a case for the `IsNotNull` filter to produce a proper filter string.
  Author: hyukjinkwon <gurwls223@gmail.com>
  This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com>
  Closes #10287 from HyukjinKwon/SPARK-12315.
* [SPARK-12314][SQL] isnull operator not pushed down for JDBC datasource. (hyukjinkwon, 2015-12-15, 2 files, -0/+2)
  https://issues.apache.org/jira/browse/SPARK-12314
  The `IsNull` filter is not being pushed down for the JDBC datasource. It appears to be standard SQL according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it. In this PR, I simply added a case for the `IsNull` filter to produce a proper filter string.
  Author: hyukjinkwon <gurwls223@gmail.com>
  This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com>
  Closes #10286 from HyukjinKwon/SPARK-12314.
* [SPARK-12249][SQL] JDBC non-equality comparison operator not pushed down. (hyukjinkwon, 2015-12-15, 2 files, -0/+3)
  https://issues.apache.org/jira/browse/SPARK-12249
  Currently the `!=` operator is not pushed down correctly. I simply added a case for this.
  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #10233 from HyukjinKwon/SPARK-12249.
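  Taken together, the three JDBC push-down commits above amount to emitting WHERE-clause fragments for a few more `Filter` cases. A hedged sketch of that translation (a simplified, hypothetical stand-in for the private JDBCRDD filter compilation, with value quoting and escaping deliberately ignored):
  ```
  import org.apache.spark.sql.sources._

  // Translate a handful of data source filters into SQL fragments; anything
  // this sketch does not handle stays on the Spark side. Values are
  // interpolated naively here, so string literals are not quoted or escaped.
  def compileFilterSketch(filter: Filter): Option[String] = filter match {
    case EqualTo(attr, value)      => Some(s"$attr = $value")
    case Not(EqualTo(attr, value)) => Some(s"$attr != $value")
    case IsNull(attr)              => Some(s"$attr IS NULL")
    case IsNotNull(attr)           => Some(s"$attr IS NOT NULL")
    case _                         => None
  }
  ```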
* [SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability (Wenchen Fan, 2015-12-15, 2 files, -21/+27)
  Author: Wenchen Fan <cloud0fan@outlook.com>
  Closes #8645 from cloud-fan/test.
* [SPARK-12056][CORE] Part 2: Create a TaskAttemptContext only after calling setConf (tedyu, 2015-12-15, 1 file, -2/+2)
  This is a continuation of SPARK-12056, where the change is applied to SqlNewHadoopRDD.scala. andrewor14 FYI
  Author: tedyu <yuzhihong@gmail.com>
  Closes #10164 from tedyu/master.
* [HOTFIX] Compile error from commit 31b3910 (Andrew Or, 2015-12-15, 1 file, -1/+1)
* [SPARK-12105] [SQL] add convenient show functions (Jean-Baptiste Onofré, 2015-12-15, 1 file, -9/+16)
  Author: Jean-Baptiste Onofré <jbonofre@apache.org>
  Closes #10130 from jbonofre/SPARK-12105.
* [SPARK-12236][SQL] JDBC filter tests all pass if filters are not really pushed down (hyukjinkwon, 2015-12-15, 3 files, -21/+19)
  https://issues.apache.org/jira/browse/SPARK-12236
  Currently JDBC filters are not tested properly: all the tests pass even if the filters are not pushed down, because of Spark-side filtering. In this PR, firstly, I corrected the tests to properly check the pushed-down filters by removing Spark-side filtering. Also, `!=` was being tested even though it is actually not pushed down, so I removed those cases. Lastly, I moved the `stripSparkFilter()` function to `SQLTestUtils`, as this function will be shared by all tests for pushed-down filters. It will also be shared with the ORC datasource, whose filters are likewise not being tested properly.
  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #10221 from HyukjinKwon/SPARK-12236.
* [SPARK-12271][SQL] Improve error message when Dataset.as[ ] has incompatible schemas. (Nong Li, 2015-12-15, 4 files, -7/+18)
  Author: Nong Li <nong@databricks.com>
  Closes #10260 from nongli/spark-11271.
* [SPARK-12288] [SQL] Support UnsafeRow in Coalesce/Except/Intersect. (gatorsmile, 2015-12-14, 2 files, -1/+46)
  Support UnsafeRow for Coalesce/Except/Intersect. Could you review whether my code changes are ok? davies Thank you!
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10285 from gatorsmile/unsafeSupportCIE.