spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-12504][SQL] Masking credentials in the sql plan explain output for ↵	sureshthalamati	2016-01-05	2	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \|	JDBC data sources. This fix masks JDBC credentials in the explain output. The URL patterns used to specify credentials seem to vary between databases, so a new method was added to the dialect to mask the credentials according to the database-specific URL pattern. While adding tests I noticed that the explain output includes the array variable for partitions ([Lorg.apache.spark.Partition;3ff74546,). Modified the code to include the first and last partition information instead. Author: sureshthalamati <suresh.thalamati@gmail.com> Closes #10452 from sureshthalamati/mask_jdbc_credentials_spark-12504.
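A minimal sketch of the masking idea, not the code from this commit: it assumes credentials appear as `user=...` / `password=...` key-value pairs separated by `&` or `;`, which is only one of the URL patterns databases use. The class name `JdbcUrlMaskerSketch` and the regular expression are hypothetical.

```java
import java.util.regex.Pattern;

final class JdbcUrlMaskerSketch {
    // Assumed pattern: case-insensitive user=/password= pairs, each value
    // running up to the next '&' or ';' separator. Real dialects differ.
    private static final Pattern CREDENTIAL =
        Pattern.compile("(?i)\\b(user|password)=[^;&]*");

    // Replace every credential value with "***", keeping the key as written.
    static String mask(String url) {
        return CREDENTIAL.matcher(url).replaceAll("$1=***");
    }

    public static void main(String[] args) {
        System.out.println(
            mask("jdbc:postgresql://host:5432/db?user=alice&password=s3cret"));
    }
}
```

A per-dialect method lets each database supply its own pattern, which is why a single global regular expression like the one above would not be enough in practice.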
*	[SPARK-3873][SQL] Import ordering fixes.	Marcelo Vanzin	2016-01-05	164	-318/+301
\| \| \| \| \| \|	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10573 from vanzin/SPARK-3873-sql.
*	[SPARK-12636] [SQL] Update UnsafeRowParquetRecordReader to support reading ↵	Nong	2016-01-05	3	-29/+178
\| \| \| \| \| \| \| \| \| \|	files directly. As noted in the code, this change is to make this component easier to test in isolation. Author: Nong <nongli@gmail.com> Closes #10581 from nongli/spark-12636.
*	[SPARK-12439][SQL] Fix toCatalystArray and MapObjects	Liang-Chi Hsieh	2016-01-05	4	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-12439 In toCatalystArray, we should look at the data type returned by dataTypeFor instead of silentSchemaFor to determine whether the element is a native type. An obvious problem arises when the element is of class Option[Int]: silentSchemaFor will return Int, and we will then wrongly treat the element as a native type. There is another problem when using Option as the array element. When we encode data like Seq(Some(1), Some(2), None) with an encoder, we later use MapObjects to construct an array for it. But in MapObjects, we don't check whether the return value of lambdaFunction is null. That causes a bug where the decoded data for Seq(Some(1), Some(2), None) would be Seq(1, 2, -1) instead of Seq(1, 2, null). Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10391 from viirya/fix-catalystarray.
*	[SPARK-12615] Remove some deprecated APIs in RDD/SparkContext	Reynold Xin	2016-01-05	1	-1/+1
\| \| \| \| \| \| \| \|	I looked at each case individually and it looks like they can all be removed. The only one that I had to think twice about was toArray (I even thought about un-deprecating it, until I realized it was a problem in Java to have toArray returning java.util.List). Author: Reynold Xin <rxin@databricks.com> Closes #10569 from rxin/SPARK-12615.
*	[SPARK-12480][FOLLOW-UP] use a single column vararg for hash	Wenchen Fan	2016-01-05	3	-3/+4
\| \| \| \| \| \| \| \| \| \|	address comments in #10435 This makes the API easier to use if users programmatically generate the call to hash, and they will get an analysis exception if the arguments of hash are empty. Author: Wenchen Fan <wenchen@databricks.com> Closes #10588 from cloud-fan/hash.
*	[SPARK-12438][SQL] Add SQLUserDefinedType support for encoder	Liang-Chi Hsieh	2016-01-05	3	-0/+38
\| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-12438 ScalaReflection lacks the support of SQLUserDefinedType. We should add it. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10390 from viirya/encoder-udt.
*	[SPARK-12568][SQL] Add BINARY to Encoders	Michael Armbrust	2016-01-04	3	-3/+18
\| \| \| \| \| \|	Author: Michael Armbrust <michael@databricks.com> Closes #10516 from marmbrus/datasetCleanup.
*	[SPARK-12600][SQL] follow up: add range check for DecimalType	Reynold Xin	2016-01-04	1	-0/+10
\| \| \| \| \| \| \| \|	This addresses davies' code review feedback in https://github.com/apache/spark/pull/10559 Author: Reynold Xin <rxin@databricks.com> Closes #10586 from rxin/remove-deprecated-sql-followup.
*	[SPARK-12480][SQL] add Hash expression that can calculate hash value for a ↵	Wenchen Fan	2016-01-04	10	-6/+171
\| \| \| \| \| \| \| \| \| \|	group of expressions just write the arguments into unsafe row and use murmur3 to calculate hash code Author: Wenchen Fan <wenchen@databricks.com> Closes #10435 from cloud-fan/hash-expr.
*	[SPARK-12600][SQL] Remove deprecated methods in Spark SQL	Reynold Xin	2016-01-04	18	-1062/+123
\| \| \| \| \| \|	Author: Reynold Xin <rxin@databricks.com> Closes #10559 from rxin/remove-deprecated-sql.
*	[SPARK-12509][SQL] Fixed error messages for DataFrame correlation and covariance	Narine Kokhlikyan	2016-01-04	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, when we call corr or cov on dataframe with invalid input we see these error messages for both corr and cov: - "Currently cov supports calculating the covariance between two columns" - "Covariance calculation for columns with dataType "[DataType Name]" not supported." I've fixed this issue by passing the function name as an argument. We could also do the input checks separately for each function. I avoided doing that because of code duplication. Thanks! Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #10458 from NarineK/sparksqlstatsmessages.
*	[SPARK-12589][SQL] Fix UnsafeRowParquetRecordReader to properly set the row ↵	Nong Li	2016-01-04	3	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \|	length. The reader was previously not setting the row length meaning it was wrong if there were variable length columns. This problem does not manifest usually, since the value in the column is correct and projecting the row fixes the issue. Author: Nong Li <nong@databricks.com> Closes #10576 from nongli/spark-12589.
*	[SPARK-12541] [SQL] support cube/rollup as function	Davies Liu	2016-01-04	8	-48/+87
\| \| \| \| \| \| \| \| \| \| \|	This PR enables cube/rollup as functions, so they can be used like this: ``` select a, b, sum(c) from t group by rollup(a, b) ``` Author: Davies Liu <davies@databricks.com> Closes #10522 from davies/rollup.
*	[SPARK-12421][SQL] Prevent Internal/External row from exposing state.	Herman van Hovell	2016-01-04	2	-4/+34
\| \| \| \| \| \| \| \| \| \| \| \|	It is currently possible to change the values of the supposedly immutable ```GenericRow``` and ```GenericInternalRow``` classes. This is caused by the fact that scala's ArrayOps ```toArray``` (returned by calling ```toSeq```) will return the backing array instead of a copy. This PR fixes this problem. This PR was inspired by https://github.com/apache/spark/pull/10374 by apo1. cc apo1 sarutak marmbrus cloud-fan nongli (everyone in the previous conversation). Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10553 from hvanhovell/SPARK-12421.
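The leak described above is the general "exposed backing array" problem. A minimal sketch in plain Java, assuming a hypothetical `ImmutableRowSketch` class (not Spark's `GenericRow`): returning the internal array lets callers mutate supposedly immutable state, while returning a copy does not.

```java
final class ImmutableRowSketch {
    private final Object[] values;

    ImmutableRowSketch(Object[] values) {
        // Copy on the way in so later changes to the caller's array are not seen.
        this.values = values.clone();
    }

    // Leaky variant: hands out the backing array itself.
    Object[] leakyToArray() {
        return values;
    }

    // Safe variant: hands out a fresh copy on every call.
    Object[] toArray() {
        return values.clone();
    }

    Object get(int i) {
        return values[i];
    }
}
```

Writing to the result of `toArray()` leaves the row unchanged, whereas writing to the result of `leakyToArray()` changes what `get` returns afterwards.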
*	[DOC] Adjust coverage for partitionBy()	tedyu	2016-01-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This is the related thread: http://search-hadoop.com/m/q3RTtO3ReeJ1iF02&subj=Re+partitioning+json+data+in+spark Michael suggested fixing the doc. Please review. Author: tedyu <yuzhihong@gmail.com> Closes #10499 from ted-yu/master.
*	[SPARK-12512][SQL] support column name with dot in withColumn()	Xiu Guo	2016-01-04	2	-12/+27
\| \| \| \| \| \|	Author: Xiu Guo <xguo27@gmail.com> Closes #10500 from xguo27/SPARK-12512.
*	[SPARK-12470] [SQL] Fix size reduction calculation	Pete Robbins	2016-01-04	1	-4/+4
\| \| \| \| \| \| \| \|	also only allocate required buffer size Author: Pete Robbins <robbinspg@gmail.com> Closes #10421 from robbinspg/master.
*	[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence	Josh Rosen	2016-01-04	6	-47/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection. In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with the our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection. This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly). If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different). This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons). Author: Josh Rosen <joshrosen@databricks.com> Closes #10519 from JoshRosen/jdbc-driver-precedence.
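The lookup step described above can be sketched with the standard `java.sql` API alone. This is an illustration under assumptions, not the patch itself: the class `DriverLookupSketch` and its method names are hypothetical, and the real code also has to deal with Spark's own driver registry, which is omitted here.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;
import java.util.Optional;
import java.util.Properties;

final class DriverLookupSketch {
    // Scan the drivers already registered with DriverManager and pick the one
    // whose class name matches, instead of letting DriverManager choose.
    static Optional<Driver> findRegistered(String driverClassName) {
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            if (d.getClass().getName().equals(driverClassName)) {
                return Optional.of(d);
            }
        }
        return Optional.empty();
    }

    // Connect through the chosen driver directly rather than through
    // DriverManager.getConnection(), which may select a different driver
    // that claims the same subprotocol.
    static Connection connect(String driverClassName, String url, Properties props)
            throws SQLException {
        Driver driver = findRegistered(driverClassName).orElseThrow(
            () -> new IllegalStateException("Driver not registered: " + driverClassName));
        return driver.connect(url, props);
    }
}
```

The driver class must have been loaded and registered beforehand (for example via `Class.forName`), otherwise `findRegistered` returns an empty result.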
*	[SPARK-12562][SQL] DataFrame.write.format(text) requires the column name to ↵	Xiu Guo	2016-01-03	2	-6/+7
\| \| \| \| \| \| \| \|	be called value Author: Xiu Guo <xguo27@gmail.com> Closes #10515 from xguo27/SPARK-12562.
*	[SPARK-12537][SQL] Add option to accept quoting of all character backslash ↵	Cazen	2016-01-03	3	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \| \|	quoting mechanism We can provide an option to choose whether the JSON parser accepts backslash quoting of all characters or not. Author: Cazen <Cazen@korea.com> Author: Cazen Lee <cazen.lee@samsung.com> Author: Cazen Lee <Cazen@korea.com> Author: cazen.lee <cazen.lee@samsung.com> Closes #10497 from Cazen/master.
*	[SPARK-12533][SQL] hiveContext.table() throws the wrong exception	thomastechs	2016-01-03	2	-4/+4
\| \| \| \| \| \| \| \|	Avoiding the No such table exception and throwing an analysis exception as per the bug: SPARK-12533 Author: thomastechs <thomas.sebastian@tcs.com> Closes #10529 from thomastechs/topic-branch.
*	Revert "Revert "[SPARK-12286][SPARK-12290][SPARK-12294][SPARK-12284][SQL] ↵	Reynold Xin	2016-01-02	34	-574/+74
\| \| \| \| \| \|	always output UnsafeRow"" This reverts commit 44ee920fd49d35b421ae562ea99bcc8f2b98ced6.
*	[SPARK-12599][MLLIB][SQL] Remove the use of callUDF in MLlib	Reynold Xin	2016-01-02	1	-0/+14
\| \| \| \| \| \| \| \|	callUDF has been deprecated. However, we do not have an alternative for users to specify the output data type without type tags. This pull request introduced a new API for that, and replaces the invocation of the deprecated callUDF with that. Author: Reynold Xin <rxin@databricks.com> Closes #10547 from rxin/SPARK-12599.
*	[SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs ↵	Sean Owen	2016-01-02	15	-156/+60
\| \| \| \| \| \| \| \| \| \|	and reflection that supported 1.x Remove use of deprecated Hadoop APIs now that 2.2+ is required Author: Sean Owen <sowen@cloudera.com> Closes #10446 from srowen/SPARK-12481.
*	[SPARK-10180][SQL] JDBC datasource are not processing EqualNullSafe filter	hyukjinkwon	2016-01-02	2	-2/+7
\| \| \| \| \| \| \| \| \| \|	This PR follows https://github.com/apache/spark/pull/8391. The previous PR fixed JDBCRDD to support null-safe equality comparison for the JDBC datasource. This PR fixes the problem that the comparison can actually return null, which results in an error when the value of that comparison is used. Author: hyukjinkwon <gurwls223@gmail.com> Author: HyukjinKwon <gurwls223@gmail.com> Closes #8743 from HyukjinKwon/SPARK-10180.
*	[SPARK-12362][SQL][WIP] Inline Hive Parser	Herman van Hovell	2016-01-01	15	-71/+5392
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR inlines the Hive SQL parser in Spark SQL. The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by the fact that, for some reason, in some cases the ANTLR-generated code is not included in the compilation phase. This PR is a WIP and should not be merged until we have sorted out the build issues. Author: Herman van Hovell <hvanhovell@questtec.nl> Author: Nong Li <nong@databricks.com> Author: Nong Li <nongli@gmail.com> Closes #10525 from hvanhovell/SPARK-12362.
*	Revert "[SPARK-12286][SPARK-12290][SPARK-12294][SPARK-12284][SQL] always ↵	Reynold Xin	2016-01-01	34	-74/+574
\| \| \| \| \| \|	output UnsafeRow" This reverts commit 0da7bd50ddf0fb9e0e8aeadb9c7fb3edf6f0ee6e.
*	[SPARK-12286][SPARK-12290][SPARK-12294][SPARK-12284][SQL] always output ↵	Davies Liu	2016-01-01	34	-574/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	UnsafeRow It's confusing that some operators output UnsafeRow but some do not, which makes it easy to make mistakes. This PR changes all the operators (SparkPlan) to only output UnsafeRow and removes the rule that inserted Unsafe/Safe conversions. For those that can't output UnsafeRow directly, an UnsafeProjection was added to them. Closes #10330 cc JoshRosen rxin Author: Davies Liu <davies@databricks.com> Closes #10511 from davies/unsafe_row.
*	[SPARK-12592][SQL][TEST] Don't mute Spark loggers in TestHive.reset()	Cheng Lian	2016-01-01	1	-1/+4
\| \| \| \| \| \| \| \|	There's a hack done in `TestHive.reset()`, which intended to mute noisy Hive loggers. However, Spark testing loggers are also muted. Author: Cheng Lian <lian@databricks.com> Closes #10540 from liancheng/spark-12592.dont-mute-spark-loggers.
*	[SPARK-12409][SPARK-12387][SPARK-12391][SQL] Refactor filter pushdown for ↵	Liang-Chi Hsieh	2016-01-01	2	-31/+45
\| \| \| \| \| \| \| \| \| \| \| \|	JDBCRDD and add few filters This patch refactors the filter pushdown for JDBCRDD and also adds few filters. Added filters are basically from #10468 with some refactoring. Test cases are from #10468. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10470 from viirya/refactor-jdbc-filter.
*	[SPARK-11743][SQL] Move the test for arrayOfUDT	Liang-Chi Hsieh	2015-12-31	1	-13/+2
\| \| \| \| \| \| \| \|	A following pr for #9712. Move the test for arrayOfUDT. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10538 from viirya/move-udt-test.
*	[SPARK-12039][SQL] Re-enable HiveSparkSubmitSuite's SPARK-9757 Persist ↵	Yin Huai	2015-12-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Parquet relation with decimal column https://issues.apache.org/jira/browse/SPARK-12039 since we do not support hadoop1, we can re-enable this test in master. Author: Yin Huai <yhuai@databricks.com> Closes #10533 from yhuai/SPARK-12039-enable.
*	[SPARK-12585] [SQL] move numFields to constructor of UnsafeRow	Davies Liu	2015-12-30	16	-137/+86
\| \| \| \| \| \| \| \| \| \|	Right now, numFields will be passed in by pointTo(), then bitSetWidthInBytes is calculated, making pointTo() a little bit heavy. It should be part of constructor of UnsafeRow. Author: Davies Liu <davies@databricks.com> Closes #10528 from davies/numFields.
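A minimal sketch of the design change, assuming a hypothetical `RowHeaderSketch` class rather than the real `UnsafeRow`: the field count and the derived null-bitset width are fixed once in the constructor, so repointing the row at new memory only stores references. The width formula below (one bit per field, rounded up to whole 8-byte words) is an assumption used for illustration.

```java
final class RowHeaderSketch {
    private final int numFields;
    private final int bitSetWidthInBytes;

    // Mutable "pointer" state, updated on every pointTo() call.
    private Object baseObject;
    private long baseOffset;
    private int sizeInBytes;

    RowHeaderSketch(int numFields) {
        this.numFields = numFields;
        // Computed once per row object instead of once per pointTo() call.
        this.bitSetWidthInBytes = calculateBitSetWidthInBytes(numFields);
    }

    // Assumed layout: one null bit per field, padded to 64-bit words.
    static int calculateBitSetWidthInBytes(int numFields) {
        return ((numFields + 63) / 64) * 8;
    }

    // Cheap: no arithmetic on numFields any more, only field assignments.
    void pointTo(Object baseObject, long baseOffset, int sizeInBytes) {
        this.baseObject = baseObject;
        this.baseOffset = baseOffset;
        this.sizeInBytes = sizeInBytes;
    }

    int numFields() { return numFields; }

    int bitSetWidthInBytes() { return bitSetWidthInBytes; }
}
```

Since a row object is typically reused across many records with the same schema, moving the computation into the constructor removes repeated work from the per-record path.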
*	[SPARK-8641][SPARK-12455][SQL] Native Spark Window functions - Follow-up ↵	Herman van Hovell	2015-12-30	3	-3/+162
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(docs & tests) This PR is a follow-up for PR https://github.com/apache/spark/pull/9819. It adds documentation for the window functions and a couple of NULL tests. The documentation was largely based on the documentation in (the source of) Hive and Presto: * https://prestodb.io/docs/current/functions/window.html * https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics I am not sure if we need to add the licenses of these two projects to the licenses directory. They are both under the ASL. srowen any thoughts? cc yhuai Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10402 from hvanhovell/SPARK-8641-docs.
*	[SPARK-12409][SPARK-12387][SPARK-12391][SQL] Support AND/OR/IN/LIKE ↵	Takeshi YAMAMURO	2015-12-30	2	-2/+35
\| \| \| \| \| \| \| \| \| \|	push-down filters for JDBC This is rework from #10386 and add more tests and LIKE push-down support. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10468 from maropu/SupportMorePushdownInJdbc.
*	[SPARK-12495][SQL] use true as default value for propagateNull in NewInstance	Wenchen Fan	2015-12-30	7	-37/+38
\| \| \| \| \| \| \| \| \| \|	In most cases we should propagate null when calling `NewInstance`, and so far there is only one case where we should stop null propagation: creating a product/Java bean. So I think it makes more sense to propagate null by default. This also fixes a bug when encoding a null array/map, which was first discovered in https://github.com/apache/spark/pull/10401 Author: Wenchen Fan <wenchen@databricks.com> Closes #10443 from cloud-fan/encoder.
*	Revert "[SPARK-12362][SQL][WIP] Inline Hive Parser"	Reynold Xin	2015-12-30	15	-5392/+71
\| \| \| \|	This reverts commit b600bccf41a7b1958e33d8301a19214e6517e388 due to non-deterministic build breaks.
*	[SPARK-12564][SQL] Improve missing column AnalysisException	gatorsmile	2015-12-29	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	``` org.apache.spark.sql.AnalysisException: cannot resolve 'value' given input columns text; ``` lets put a `:` after `columns` and put the columns in `[]` so that they match the toString of DataFrame. Author: gatorsmile <gatorsmile@gmail.com> Closes #10518 from gatorsmile/improveAnalysisExceptionMsg.
*	[SPARK-12362][SQL][WIP] Inline Hive Parser	Nong Li	2015-12-29	15	-71/+5392
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a WIP. The PR has been taken over from nongli (see https://github.com/apache/spark/pull/10420). I have removed some additional dead code, and fixed a few issues which were caused by the fact that the inlined Hive parser is newer than the Hive parser we currently use in Spark. I am submitting this PR in order to get some feedback and testing done. There is quite a bit of work to do: - [ ] Get it to pass jenkins build/test. - [ ] Acknowledge Hive-project for using their parser. - [ ] Refactorings between HiveQl and the java classes. - [ ] Create our own ASTNode and integrate the current implicit extensions. - [ ] Move remaining ```SemanticAnalyzer``` and ```ParseUtils``` functionality to ```HiveQl```. - [ ] Removing Hive dependencies from the parser. This will require some edits in the grammar files. - [ ] Introduce our own context which needs to contain a ```TokenRewriteStream```. - [ ] Add ```useSQL11ReservedKeywordsForIdentifier``` and ```allowQuotedId``` to the catalyst or sql configuration. - [ ] Remove ```HiveConf``` from grammar files & HiveQl, and pass in our own configuration. - [ ] Moving the parser into sql/core. cc nongli rxin Author: Herman van Hovell <hvanhovell@questtec.nl> Author: Nong Li <nong@databricks.com> Author: Nong Li <nongli@gmail.com> Closes #10509 from hvanhovell/SPARK-12362.
*	[SPARK-12549][SQL] Take Option[Seq[DataType]] in UDF input type specification.	Reynold Xin	2015-12-29	5	-68/+75
\| \| \| \| \| \| \| \|	In Spark we allow UDFs to declare its expected input types in order to apply type coercion. The expected input type parameter takes a Seq[DataType] and uses Nil when no type coercion is applied. It makes more sense to take Option[Seq[DataType]] instead, so we can differentiate a no-arg function vs function with no expected input type specified. Author: Reynold Xin <rxin@databricks.com> Closes #10504 from rxin/SPARK-12549.
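The distinction can be illustrated with Java's `Optional`, which plays the same role as Scala's `Option`. This is a sketch only: `UdfInputTypesSketch`, `describe`, and the use of strings for data types are hypothetical stand-ins, not Spark's API.

```java
import java.util.List;
import java.util.Optional;

final class UdfInputTypesSketch {
    // With a bare list, "no arguments" and "no types declared" both look
    // like an empty list. Wrapping the list in Optional separates them.
    static String describe(Optional<List<String>> inputTypes) {
        if (!inputTypes.isPresent()) {
            return "no expected input types specified; skip coercion";
        }
        if (inputTypes.get().isEmpty()) {
            return "zero-argument function";
        }
        return "coerce arguments to " + inputTypes.get();
    }
}
```

An absent value means the function declared nothing about its inputs, while a present but empty list means it declared that it takes no arguments.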
*	[SPARK-11199][SPARKR] Improve R context management story and add getOrCreate	Hossein	2015-12-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	* Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context. * Adds a simple test [SPARK-11199] #comment link with JIRA Author: Hossein <hossein@databricks.com> Closes #9185 from falaki/SPARK-11199.
*	[SPARK-12530][BUILD] Fix build break at Spark-Master-Maven-Snapshots from #1293	Kazuaki Ishizaki	2015-12-29	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \|	Compilation error caused by string concatenations that are not constants. Use a raw string literal to avoid string concatenation. https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/ Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #10488 from kiszk/SPARK-12530.
*	[SPARK-11394][SQL] Throw IllegalArgumentException for unsupported types in ↵	Takeshi YAMAMURO	2015-12-28	2	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	postgresql If DataFrame has BYTE types, throws an exception: org.postgresql.util.PSQLException: ERROR: type "byte" does not exist Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #9350 from maropu/FixBugInPostgreJdbc.
*	[SPARK-12547][SQL] Tighten scala style checker enforcement for UDF registration	Reynold Xin	2015-12-28	2	-29/+30
\| \| \| \| \| \| \| \| \| \|	We use scalastyle:off to turn off style checks in certain places where it is not possible to follow the style guide. This is usually ok. However, in udf registration, we disable the checker for a large amount of code simply because some of them exceed 100 char line limit. It is better to just disable the line limit check rather than everything. In this pull request, I only disabled line length check, and fixed a problem (lack explicit types for public methods). Author: Reynold Xin <rxin@databricks.com> Closes #10501 from rxin/SPARK-12547.
*	[SPARK-12522][SQL][MINOR] Add the missing document strings for the SQL ↵	gatorsmile	2015-12-28	3	-8/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	configuration Fixing the missing documentation for the configuration. We can see the placeholder message "TODO" when issuing the command "SET -V". ``` spark.sql.columnNameOfCorruptRecord spark.sql.hive.verifyPartitionPath spark.sql.sources.parallelPartitionDiscovery.threshold spark.sql.hive.convertMetastoreParquet.mergeSchema spark.sql.hive.convertCTAS spark.sql.hive.thriftServer.async ``` Author: gatorsmile <gatorsmile@gmail.com> Closes #10471 from gatorsmile/commandDesc.
*	[SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs	Shixiong Zhu	2015-12-28	3	-24/+46
\| \| \| \| \| \| \| \| \| \| \| \|	Include the following changes: 1. Close `java.sql.Statement` 2. Fix incorrect `asInstanceOf`. 3. Remove unnecessary `synchronized` and `ReentrantLock`. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10440 from zsxwing/findbugs.
*	[SPARK-12441][SQL] Fixing missingInput in ↵	gatorsmile	2015-12-28	15	-18/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Generate/MapPartitions/AppendColumns/MapGroups/CoGroup When explain any plan with Generate, we will see an exclamation mark in the plan. Normally, when we see this mark, it means the plan has an error. This PR is to correct the `missingInput` in `Generate`. For example, ```scala val df = Seq((1, "a b c"), (2, "a b"), (3, "a")).toDF("number", "letters") val df2 = df.explode('letters) { case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq } df2.explain(true) ``` Before the fix, the plan is like ``` == Parsed Logical Plan == 'Generate UserDefinedGenerator('letters), true, false, None +- Project [_1#0 AS number#2,_2#1 AS letters#3] +- LocalRelation [_1#0,_2#1], [[1,a b c],[2,a b],[3,a]] == Analyzed Logical Plan == number: int, letters: string, _1: string Generate UserDefinedGenerator(letters#3), true, false, None, [_1#8] +- Project [_1#0 AS number#2,_2#1 AS letters#3] +- LocalRelation [_1#0,_2#1], [[1,a b c],[2,a b],[3,a]] == Optimized Logical Plan == Generate UserDefinedGenerator(letters#3), true, false, None, [_1#8] +- LocalRelation [number#2,letters#3], [[1,a b c],[2,a b],[3,a]] == Physical Plan == !Generate UserDefinedGenerator(letters#3), true, false, [number#2,letters#3,_1#8] +- LocalTableScan [number#2,letters#3], [[1,a b c],[2,a b],[3,a]] ``` Updates: The same issues are also found in the other four Dataset operators: `MapPartitions`/`AppendColumns`/`MapGroups`/`CoGroup`. Fixed all these four. Author: gatorsmile <gatorsmile@gmail.com> Author: xiaoli <lixiao1983@gmail.com> Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local> Closes #10393 from gatorsmile/generateExplain.
*	[SPARK-7727][SQL] Avoid inner classes in RuleExecutor	Stephan Kessler	2015-12-28	3	-5/+74
\| \| \| \| \| \| \| \| \| \|	Moved (case) classes Strategy, Once, FixedPoint and Batch to the companion object. This is necessary if we want the Optimizer to be easily extendable in the following sense: usually a user wants to add additional rules and keep the ones that are already there. However, inner classes made that impossible since the code did not compile. This allows easy extension of existing Optimizers; see the DefaultOptimizerExtendableSuite for a corresponding test case. Author: Stephan Kessler <stephan.kessler@sap.com> Closes #10174 from stephankessler/SPARK-7727.
*	[SPARK-12287][SQL] Support UnsafeRow in MapPartitions/MapGroups/CoGroup	gatorsmile	2015-12-28	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Support UnsafeRow in MapPartitions/MapGroups/CoGroup. Added a test case for MapPartitions. Since MapGroups and CoGroup are built on AppendColumns, all the related dataset test cases can already verify correctness when MapGroups and CoGroup process unsafe rows. davies cloud-fan Not sure if my understanding is right, please correct me. Thank you! Author: gatorsmile <gatorsmile@gmail.com> Closes #10398 from gatorsmile/unsafeRowMapGroup.