spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-10093] [SPARK-10096] [SQL] Avoid transformation on executors & fix ↵	Reynold Xin	2015-08-18	4	-7/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	UDFs on complex types This is kind of a weird case, but given a sufficiently complex query plan (in this case a TungstenProject with an Exchange underneath), we could have NPEs on the executors due to the time when we were calling transformAllExpressions In general we should ensure that all transformations occur on the driver and not on the executors. Some reasons for avoid executor side transformations include: * (this case) Some operator constructors require state such as access to the Spark/SQL conf so doing a makeCopy on the executor can fail. * (unrelated reason for avoid executor transformations) ExprIds are calculated using an atomic integer, so you can violate their uniqueness constraint by constructing them anywhere other than the driver. This subsumes #8285. Author: Reynold Xin <rxin@databricks.com> Author: Michael Armbrust <michael@databricks.com> Closes #8295 from rxin/SPARK-10096.
*	[SPARK-10095] [SQL] use public API of BigInteger	Davies Liu	2015-08-18	2	-27/+11
\| \| \| \| \| \| \| \| \| \| \| \|	In UnsafeRow, we use the private field of BigInteger for better performance, but it actually didn't contribute much (3% in one benchmark) to end-to-end runtime, and make it not portable (may fail on other JVM implementations). So we should use the public API instead. cc rxin Author: Davies Liu <davies@databricks.com> Closes #8286 from davies/portable_decimal.
*	[SPARK-9939] [SQL] Resorts to Java process API in CliSuite, ↵	Cheng Lian	2015-08-19	5	-91/+149
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	HiveSparkSubmitSuite and HiveThriftServer2 test suites Scala process API has a known bug ([SI-8768] [1]), which may be the reason why several test suites which fork sub-processes are flaky. This PR replaces Scala process API with Java process API in `CliSuite`, `HiveSparkSubmitSuite`, and `HiveThriftServer2` related test suites to see whether it fix these flaky tests. [1]: https://issues.scala-lang.org/browse/SI-8768 Author: Cheng Lian <lian@databricks.com> Closes #8168 from liancheng/spark-9939/use-java-process-api.
*	[SPARK-10088] [SQL] Add support for "stored as avro" in HiveQL parser.	Marcelo Vanzin	2015-08-18	2	-10/+13
\| \| \| \| \| \|	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8282 from vanzin/SPARK-10088.
*	[SPARK-10089] [SQL] Add missing golden files.	Marcelo Vanzin	2015-08-18	2	-0/+503
\| \| \| \| \| \|	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8283 from vanzin/SPARK-10089.
*	[SPARK-10080] [SQL] Fix binary incompatibility for $ column interpolation	Michael Armbrust	2015-08-18	3	-11/+22
\| \| \| \| \| \| \| \|	Turns out that inner classes of inner objects are referenced directly, and thus moving it will break binary compatibility. Author: Michael Armbrust <michael@databricks.com> Closes #8281 from marmbrus/binaryCompat.
*	[SPARK-8118] [SQL] Redirects Parquet JUL logger via SLF4J	Cheng Lian	2015-08-18	4	-43/+45
\| \| \| \| \| \| \| \| \| \|	Parquet hard coded a JUL logger which always writes to stdout. This PR redirects it via SLF4j JUL bridge handler, so that we can control Parquet logs via `log4j.properties`. This solution is inspired by https://github.com/Parquet/parquet-mr/issues/390#issuecomment-46064909. Author: Cheng Lian <lian@databricks.com> Closes #8196 from liancheng/spark-8118/redirect-parquet-jul.
*	[SPARK-10038] [SQL] fix bug in generated unsafe projection when there is ↵	Davies Liu	2015-08-17	2	-4/+29
\| \| \| \| \| \| \| \| \| \| \| \|	binary in ArrayData The type for array of array in Java is slightly different than array of others. cc cloud-fan Author: Davies Liu <davies@databricks.com> Closes #8250 from davies/array_binary.
*	[MINOR] Format the comment of `translate` at `functions.scala`	Yu ISHIKAWA	2015-08-17	1	-8/+9
\| \| \| \| \| \|	Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8265 from yu-iskw/minor-translate-comment.
*	[SPARK-9592] [SQL] Fix Last function implemented based on AggregateExpression1.	Yin Huai	2015-08-17	2	-2/+22
\| \| \| \| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-9592 #8113 has the fundamental fix. But, if we want to minimize the number of changed lines, we can go with this one. Then, in 1.6, we merge #8113. Author: Yin Huai <yhuai@databricks.com> Closes #8172 from yhuai/lastFix and squashes the following commits: b28c42a [Yin Huai] Regression test. af87086 [Yin Huai] Fix last.
*	[SPARK-9526] [SQL] Utilize randomized tests to reveal potential bugs in sql ↵	Yijie Shen	2015-08-17	10	-6/+410
\| \| \| \| \| \| \| \| \| \| \| \|	expressions JIRA: https://issues.apache.org/jira/browse/SPARK-9526 This PR is a follow up of #7830, aiming at utilizing randomized tests to reveal more potential bugs in sql expression. Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #7855 from yjshen/property_check.
*	[SPARK-10036] [SQL] Load JDBC driver in DataFrameReader.jdbc and ↵	zsxwing	2015-08-17	4	-7/+20
\| \| \| \| \| \| \| \| \| \| \| \| \|	DataFrameWriter.jdbc This PR uses `JDBCRDD.getConnector` to load JDBC driver before creating connection in `DataFrameReader.jdbc` and `DataFrameWriter.jdbc`. Author: zsxwing <zsxwing@gmail.com> Closes #8232 from zsxwing/SPARK-10036 and squashes the following commits: adf75de [zsxwing] Add extraOptions to the connection properties 57f59d4 [zsxwing] Load JDBC driver in DataFrameReader.jdbc and DataFrameWriter.jdbc
*	[SPARK-9950] [SQL] Wrong Analysis Error for grouping/aggregating on struct ↵	Wenchen Fan	2015-08-17	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	fields This issue has been fixed by https://github.com/apache/spark/pull/8215, this PR added regression test for it. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8222 from cloud-fan/minor and squashes the following commits: 0bbfb1c [Wenchen Fan] fix style... 7e2d8d9 [Wenchen Fan] add test
*	[SPARK-7837] [SQL] Avoids double closing output writers when commitTask() fails	Cheng Lian	2015-08-18	2	-6/+61
\| \| \| \| \| \| \| \|	When inserting data into a `HadoopFsRelation`, if `commitTask()` of the writer container fails, `abortTask()` will be invoked. However, both `commitTask()` and `abortTask()` try to close the output writer(s). The problem is that, closing underlying writers may not be an idempotent operation. E.g., `ParquetRecordWriter.close()` throws NPE when called twice. Author: Cheng Lian <lian@databricks.com> Closes #8236 from liancheng/spark-7837/double-closing.
*	[SPARK-10005] [SQL] Fixes schema merging for nested structs	Cheng Lian	2015-08-16	4	-22/+112
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case of schema merging, we only handled first level fields when converting Parquet groups to `InternalRow`s. Nested struct fields are not properly handled. For example, the schema of a Parquet file to be read can be: ``` message individual { required group f1 { optional binary f11 (utf8); } } ``` while the global schema is: ``` message global { required group f1 { optional binary f11 (utf8); optional int32 f12; } } ``` This PR fixes this issue by padding missing fields when creating actual converters. Author: Cheng Lian <lian@databricks.com> Closes #8228 from liancheng/spark-10005/nested-schema-merging.
*	[SPARK-9973] [SQL] Correct in-memory columnar buffer size	Kun Xu	2015-08-16	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \|	The `initialSize` argument of `ColumnBuilder.initialize()` should be the number of rows rather than bytes. However `InMemoryColumnarTableScan` passes in a byte size, which makes Spark SQL allocate more memory than necessary when building in-memory columnar buffers. Author: Kun Xu <viper_kun@163.com> Closes #8189 from viper-kun/errorSize.
*	[SPARK-9955] [SQL] correct error message for aggregate	Wenchen Fan	2015-08-15	3	-7/+12
\| \| \| \| \| \| \| \| \| \| \|	We should skip unresolved `LogicalPlan`s for `PullOutNondeterministic`, as calling `output` on unresolved `LogicalPlan` will produce confusing error message. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8203 from cloud-fan/error-msg and squashes the following commits: 1c67ca7 [Wenchen Fan] move test 7593080 [Wenchen Fan] correct error message for aggregate
*	[SPARK-9984] [SQL] Create local physical operator interface.	Reynold Xin	2015-08-14	4	-0/+224
\| \| \| \| \| \| \| \| \| \| \| \|	This pull request creates a new operator interface that is more similar to traditional database query iterators (with open/close/next/get). These local operators are not currently used anywhere, but will become the basis for SPARK-9983 (local physical operators for query execution). cc zsxwing Author: Reynold Xin <rxin@databricks.com> Closes #8212 from rxin/SPARK-9984.
*	[SPARK-8887] [SQL] Explicit define which data types can be used as dynamic ↵	Yijie Shen	2015-08-14	5	-4/+41
\| \| \| \| \| \| \| \| \| \| \| \|	partition columns This PR enforce dynamic partition column data type requirements by adding analysis rules. JIRA: https://issues.apache.org/jira/browse/SPARK-8887 Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #8201 from yjshen/dynamic_partition_columns.
*	[SPARK-9634] [SPARK-9323] [SQL] cleanup unnecessary Aliases in LogicalPlan ↵	Wenchen Fan	2015-08-14	9	-24/+120
\| \| \| \| \| \| \| \| \| \| \| \| \|	at the end of analysis Also alias the ExtractValue instead of wrapping it with UnresolvedAlias when resolve attribute in LogicalPlan, as this alias will be trimmed if it's unnecessary. Based on #7957 without the changes to mllib, but instead maintaining earlier behavior when using `withColumn` on expressions that already have metadata. Author: Wenchen Fan <cloud0fan@outlook.com> Author: Michael Armbrust <michael@databricks.com> Closes #8215 from marmbrus/pr/7957.
*	[HOTFIX] fix duplicated braces	Davies Liu	2015-08-14	3	-3/+3
\| \| \| \| \| \|	Author: Davies Liu <davies@databricks.com> Closes #8219 from davies/fix_typo.
*	[SPARK-9949] [SQL] Fix TakeOrderedAndProject's output.	Yin Huai	2015-08-14	2	-4/+28
\| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-9949 Author: Yin Huai <yhuai@databricks.com> Closes #8179 from yhuai/SPARK-9949.
*	[SPARK-8670] [SQL] Nested columns can't be referenced in pyspark	Wenchen Fan	2015-08-14	1	-0/+2
\| \| \| \| \| \| \| \|	This bug is caused by a wrong column-exist-check in `__getitem__` of pyspark dataframe. `DataFrame.apply` accepts not only top level column names, but also nested column name like `a.b`, so we should remove that check from `__getitem__`. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8202 from cloud-fan/nested.
*	[SPARK-9561] Re-enable BroadcastJoinSuite	Andrew Or	2015-08-14	1	-76/+69
\| \| \| \| \| \| \| \|	We can do this now that SPARK-9580 is resolved. Author: Andrew Or <andrew@databricks.com> Closes #8208 from andrewor14/reenable-sql-tests.
*	[SPARK-9929] [SQL] support metadata in withColumn	Wenchen Fan	2015-08-14	1	-0/+17
\| \| \| \| \| \| \| \|	in MLlib sometimes we need to set metadata for the new column, thus we will alias the new column with metadata before call `withColumn` and in `withColumn` we alias this clolumn again. Here I overloaded `withColumn` to allow user set metadata, just like what we did for `Column.as`. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8159 from cloud-fan/withColumn.
*	[SPARK-9958] [SQL] Make HiveThriftServer2Listener thread-safe and update the ↵	zsxwing	2015-08-14	4	-60/+78
\| \| \| \| \| \| \| \| \| \| \| \|	tab name to "JDBC/ODBC Server" This PR fixed the thread-safe issue of HiveThriftServer2Listener, and also changed the tab name to "JDBC/ODBC Server" since it's conflict with the new SQL tab. <img width="1377" alt="thriftserver" src="https://cloud.githubusercontent.com/assets/1000778/9265707/c46f3f2c-4269-11e5-8d7e-888c9113ab4f.png"> Author: zsxwing <zsxwing@gmail.com> Closes #8185 from zsxwing/SPARK-9958.
*	[MINOR] [SQL] Remove canEqual in Row	Liang-Chi Hsieh	2015-08-13	1	-21/+0
\| \| \| \| \| \| \| \|	As `InternalRow` does not extend `Row` now, I think we can remove it. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #8170 from viirya/remove_canequal.
*	[SPARK-9945] [SQL] pageSize should be calculated from executor.memory	Davies Liu	2015-08-13	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Currently, pageSize of TungstenSort is calculated from driver.memory, it should use executor.memory instead. Also, in the worst case, the safeFactor could be 4 (because of rounding), increase it to 16. cc rxin Author: Davies Liu <davies@databricks.com> Closes #8175 from davies/page_size.
*	[SPARK-9580] [SQL] Replace singletons in SQL tests	Andrew Or	2015-08-13	94	-1198/+1439
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	A fundamental limitation of the existing SQL tests is that there is simply no way to create your own `SparkContext`. This is a serious limitation because the user may wish to use a different master or config. As a case in point, `BroadcastJoinSuite` is entirely commented out because there is no way to make it pass with the existing infrastructure. This patch removes the singletons `TestSQLContext` and `TestData`, and instead introduces a `SharedSQLContext` that starts a context per suite. Unfortunately the singletons were so ingrained in the SQL tests that this patch necessarily needed to touch all the SQL test files. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/8111) <!-- Reviewable:end --> Author: Andrew Or <andrew@databricks.com> Closes #8111 from andrewor14/sql-tests-refactor.
*	[SPARK-9943] [SQL] deserialized UnsafeHashedRelation should be serializable	Davies Liu	2015-08-13	2	-33/+74
\| \| \| \| \| \| \| \| \| \|	When the free memory in executor goes low, the cached broadcast objects need to serialized into disk, but currently the deserialized UnsafeHashedRelation can't be serialized , fail with NPE. This PR fixes that. cc rxin Author: Davies Liu <davies@databricks.com> Closes #8174 from davies/serialize_hashed.
*	[SPARK-9935] [SQL] EqualNotNull not processed in ORC	hyukjinkwon	2015-08-13	1	-0/+5
\| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-9935 Author: hyukjinkwon <gurwls223@gmail.com> Closes #8163 from HyukjinKwon/master.
*	[SPARK-9757] [SQL] Fixes persistence of Parquet relation with decimal column	Cheng Lian	2015-08-13	11	-30/+150
\| \| \| \| \| \| \| \| \|	PR #7967 enables us to save data source relations to metastore in Hive compatible format when possible. But it fails to persist Parquet relations with decimal column(s) to Hive metastore of versions lower than 1.2.0. This is because `ParquetHiveSerDe` in Hive versions prior to 1.2.0 doesn't support decimal. This PR checks for this case and falls back to Spark SQL specific metastore table format. Author: Yin Huai <yhuai@databricks.com> Author: Cheng Lian <lian@databricks.com> Closes #8130 from liancheng/spark-9757/old-hive-parquet-decimal.
*	[SPARK-9885] [SQL] Also pass barrierPrefixes and sharedPrefixes to ↵	Yin Huai	2015-08-13	2	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \|	IsolatedClientLoader when hiveMetastoreJars is set to maven. https://issues.apache.org/jira/browse/SPARK-9885 cc marmbrus liancheng Author: Yin Huai <yhuai@databricks.com> Closes #8158 from yhuai/classloaderMaven.
*	[SPARK-9927] [SQL] Revert 8049 since it's pushing wrong filter down	Yijie Shen	2015-08-13	3	-64/+3
\| \| \| \| \| \| \| \| \| \|	I made a mistake in #8049 by casting literal value to attribute's data type, which would cause simply truncate the literal value and push a wrong filter down. JIRA: https://issues.apache.org/jira/browse/SPARK-9927 Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #8157 from yjshen/rever8049.
*	[SPARK-9832] [SQL] add a thread-safe lookup for BytesToBytseMap	Davies Liu	2015-08-12	1	-2/+4
\| \| \| \| \| \| \| \|	This patch add a thread-safe lookup for BytesToBytseMap, and use that in broadcasted HashedRelation. Author: Davies Liu <davies@databricks.com> Closes #8151 from davies/safeLookup.
*	[SPARK-9920] [SQL] The simpleString of TungstenAggregate does not show its ↵	Yin Huai	2015-08-12	2	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	output https://issues.apache.org/jira/browse/SPARK-9920 Taking `sqlContext.sql("select i, sum(j1) as sum from testAgg group by i").explain()` as an example, the output of our current master is ``` == Physical Plan == TungstenAggregate(key=[i#0], value=[(sum(cast(j1#1 as bigint)),mode=Final,isDistinct=false)] TungstenExchange hashpartitioning(i#0) TungstenAggregate(key=[i#0], value=[(sum(cast(j1#1 as bigint)),mode=Partial,isDistinct=false)] Scan ParquetRelation[file:/user/hive/warehouse/testagg][i#0,j1#1] ``` With this PR, the output will be ``` == Physical Plan == TungstenAggregate(key=[i#0], functions=[(sum(cast(j1#1 as bigint)),mode=Final,isDistinct=false)], output=[i#0,sum#18L]) TungstenExchange hashpartitioning(i#0) TungstenAggregate(key=[i#0], functions=[(sum(cast(j1#1 as bigint)),mode=Partial,isDistinct=false)], output=[i#0,currentSum#22L]) Scan ParquetRelation[file:/user/hive/warehouse/testagg][i#0,j1#1] ``` Author: Yin Huai <yhuai@databricks.com> Closes #8150 from yhuai/SPARK-9920.
*	[SPARK-9908] [SQL] When spark.sql.tungsten.enabled is false, broadcast join ↵	Yin Huai	2015-08-12	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	does not work https://issues.apache.org/jira/browse/SPARK-9908 Author: Yin Huai <yhuai@databricks.com> Closes #8149 from yhuai/SPARK-9908.
*	[SPARK-9827] [SQL] fix fd leak in UnsafeRowSerializer	Davies Liu	2015-08-12	2	-3/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Currently, UnsafeRowSerializer does not close the InputStream, will cause fd leak if the InputStream has an open fd in it. TODO: the fd could still be leaked, if any items in the stream is not consumed. Currently it replies on GC to close the fd in this case. cc JoshRosen Author: Davies Liu <davies@databricks.com> Closes #8116 from davies/fd_leak.
*	[SPARK-9870] Disable driver UI and Master REST server in SparkSubmitSuite	Josh Rosen	2015-08-12	1	-1/+9
\| \| \| \| \| \| \| \|	I think that we should pass additional configuration flags to disable the driver UI and Master REST server in SparkSubmitSuite and HiveSparkSubmitSuite. This might cut down on port-contention-related flakiness in Jenkins. Author: Josh Rosen <joshrosen@databricks.com> Closes #8124 from JoshRosen/disable-ui-in-sparksubmitsuite.
*	[SPARK-9449] [SQL] Include MetastoreRelation's inputFiles	Michael Armbrust	2015-08-12	5	-20/+66
\| \| \| \| \| \|	Author: Michael Armbrust <michael@databricks.com> Closes #8119 from marmbrus/metastoreInputFiles.
*	[SPARK-9894] [SQL] Json writer should handle MapData.	Yin Huai	2015-08-12	3	-35/+83
\| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-9894 Author: Yin Huai <yhuai@databricks.com> Closes #8137 from yhuai/jsonMapData.
*	[SPARK-9826] [CORE] Fix cannot use custom classes in log4j.properties	Michel Lemay	2015-08-12	3	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactor Utils class and create ShutdownHookManager. NOTE: Wasn't able to run /dev/run-tests on windows machine. Manual tests were conducted locally using custom log4j.properties file with Redis appender and logstash formatter (bundled in the fat-jar submitted to spark) ex: log4j.rootCategory=WARN,console,redis log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.target=System.err log4j.appender.console.layout=org.apache.log4j.PatternLayout log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n log4j.logger.org.eclipse.jetty=WARN log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO log4j.logger.org.apache.spark.graphx.Pregel=INFO log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender log4j.appender.redis.endpoints=hostname:port log4j.appender.redis.key=mykey log4j.appender.redis.alwaysBatch=false log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1 Author: michellemay <mlemay@gmail.com> Closes #8109 from michellemay/SPARK-9826.
*	[SPARK-9804] [HIVE] Use correct value for isSrcLocal parameter.	Marcelo Vanzin	2015-08-12	1	-3/+10
\| \| \| \| \| \| \| \| \| \|	If the correct parameter is not provided, Hive will run into an error because it calls methods that are specific to the local filesystem to copy the data. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8086 from vanzin/SPARK-9804.
*	[SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation	Andrew Or	2015-08-12	4	-61/+162
\| \| \| \| \| \| \| \| \| \| \| \|	This is the sister patch to #8011, but for aggregation. In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance. rxin yhuai Author: Andrew Or <andrew@databricks.com> Closes #8038 from andrewor14/unsafe-starve-memory-agg.
*	[SPARK-9407] [SQL] Relaxes Parquet ValidTypeMap to allow ENUM predicates to ↵	Cheng Lian	2015-08-12	16	-136/+374
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	be pushed down This PR adds a hacky workaround for PARQUET-201, and should be removed once we upgrade to parquet-mr 1.8.1 or higher versions. In Parquet, not all types of columns can be used for filter push-down optimization. The set of valid column types is controlled by `ValidTypeMap`. Unfortunately, in parquet-mr 1.7.0 and prior versions, this limitation is too strict, and doesn't allow `BINARY (ENUM)` columns to be pushed down. On the other hand, `BINARY (ENUM)` is commonly seen in Parquet files written by libraries like `parquet-avro`. This restriction is problematic for Spark SQL, because Spark SQL doesn't have a type that maps to Parquet `BINARY (ENUM)` directly, and always converts `BINARY (ENUM)` to Catalyst `StringType`. Thus, a predicate involving a `BINARY (ENUM)` is recognized as one involving a string field instead and can be pushed down by the query optimizer. Such predicates are actually perfectly legal except that it fails the `ValidTypeMap` check. The workaround added here is relaxing `ValidTypeMap` to include `BINARY (ENUM)`. I also took the chance to simplify `ParquetCompatibilityTest` a little bit when adding regression test. Author: Cheng Lian <lian@databricks.com> Closes #8107 from liancheng/spark-9407/parquet-enum-filter-push-down.
*	[SPARK-9182] [SQL] Filters are not passed through to jdbc source	Yijie Shen	2015-08-12	3	-3/+63
\| \| \| \| \| \| \| \| \| \|	This PR fixes unable to push filter down to JDBC source caused by `Cast` during pattern matching. While we are comparing columns of different type, there's a big chance we need a cast on the column, therefore not match the pattern directly on Attribute and would fail to push down. Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #8049 from yjshen/jdbc_pushdown.
*	[SPARK-9854] [SQL] RuleExecutor.timeMap should be thread-safe	Josh Rosen	2015-08-11	1	-6/+9
\| \| \| \| \| \| \| \| \| \|	`RuleExecutor.timeMap` is currently a non-thread-safe mutable HashMap; this can lead to infinite loops if multiple threads are concurrently modifying the map. I believe that this is responsible for some hangs that I've observed in HiveQuerySuite. This patch addresses this by using a Guava `AtomicLongMap`. Author: Josh Rosen <joshrosen@databricks.com> Closes #8120 from JoshRosen/rule-executor-time-map-fix.
*	[SPARK-9831] [SQL] fix serialization with empty broadcast	Davies Liu	2015-08-11	2	-1/+18
\| \| \| \| \| \| \| \|	Author: Davies Liu <davies@databricks.com> Closes #8117 from davies/fix_serialization and squashes the following commits: d21ac71 [Davies Liu] fix serialization with empty broadcast
*	[SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be ↵	Reynold Xin	2015-08-11	2	-1/+33
\| \| \| \| \| \| \| \| \| \|	backward compatible DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option, so we must be able to resolve the old committer qualified name. Author: Reynold Xin <rxin@databricks.com> Closes #8114 from rxin/SPARK-9849.
*	[SPARK-9814] [SQL] EqualNotNull not passing to data sources	hyukjinkwon	2015-08-11	3	-0/+15
\| \| \| \| \| \| \|	Author: hyukjinkwon <gurwls223@gmail.com> Author: 권혁진 <gurwls223@gmail.com> Closes #8096 from HyukjinKwon/master.