path: root/sql
Commit message | Author | Date | Files | Lines
* [SPARK-10325] Override hashCode() for public Row | Josh Rosen | 2015-08-28 | 2 | -0/+22
This commit fixes an issue where the public SQL `Row` class did not override `hashCode`, causing it to violate the `hashCode()`/`equals()` contract. To fix this, I simply ported the `hashCode` implementation from the 1.4.x version of `Row`. Author: Josh Rosen <joshrosen@databricks.com> Closes #8500 from JoshRosen/SPARK-10325 and squashes the following commits: 51ffea1 [Josh Rosen] Override hashCode() for public Row.
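Why the contract matters, in miniature (a sketch with a hypothetical `Point` class, not Spark's actual `Row`): two objects that compare equal must also hash equally, or hash-based collections silently misbehave.

```
class Point(val x: Int, val y: Int) {
  override def equals(other: Any): Boolean = other match {
    case p: Point => p.x == x && p.y == y
    case _        => false
  }
  // Without this override, two equal Points could land in different
  // HashSet buckets, so contains() on an equal Point may return false.
  override def hashCode(): Int = 31 * x + y
}

val set = scala.collection.mutable.HashSet(new Point(1, 2))
println(set.contains(new Point(1, 2)))  // true, because hashCode agrees with equals
```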
* [SPARK-SQL] [MINOR] Fixes some typos in HiveContext | Cheng Lian | 2015-08-27 | 2 | -5/+5
Author: Cheng Lian <lian@databricks.com> Closes #8481 from liancheng/hive-context-typo.
* [SPARK-10321] sizeInBytes in HadoopFsRelation | Davies Liu | 2015-08-27 | 1 | -0/+2
Adds `sizeInBytes` to `HadoopFsRelation` to enable broadcast joins. cc marmbrus Author: Davies Liu <davies@databricks.com> Closes #8490 from davies/sizeInByte.
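The planner compares a relation's estimated size against `spark.sql.autoBroadcastJoinThreshold` when deciding whether to broadcast it. A minimal sketch of such an estimate, assuming the relation can sum the lengths of its input files (the class below is hypothetical):

```
import org.apache.hadoop.fs.FileStatus

class MyFsRelation(files: Seq[FileStatus]) {
  // Rough size estimate: total bytes across all input files; if this
  // falls under spark.sql.autoBroadcastJoinThreshold, the relation can
  // be broadcast instead of shuffled.
  def sizeInBytes: Long = files.map(_.getLen).sum
}
```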
* [SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path | Yin Huai | 2015-08-27 | 3 | -25/+1
https://issues.apache.org/jira/browse/SPARK-10287 After porting JSON to `HadoopFsRelation`, it seems hard to keep the behavior of automatically picking up new files for JSON. This PR removes that behavior, so JSON is consistent with the other formats (ORC and Parquet). Author: Yin Huai <yhuai@databricks.com> Closes #8469 from yhuai/jsonRefresh.
* [SPARK-10215] [SQL] Fix precision of division (follow the rule in Hive) | Davies Liu | 2015-08-25 | 4 | -13/+39
Follows Hive's rule for decimal division; see https://github.com/apache/hive/blob/ac755ebe26361a4647d53db2a28500f71697b276/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPDivide.java#L113 cc chenghao-intel Author: Davies Liu <davies@databricks.com> Closes #8415 from davies/decimal_div2.
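As I read the linked Hive code, the result type of `DECIMAL(p1, s1) / DECIMAL(p2, s2)` is derived roughly as below (a sketch; exact behavior when the precision cap kicks in is handled separately):

```
// Keep all integer digits the quotient may need, plus at least six
// fractional digits, then cap at the maximum supported precision.
def divisionResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
  val intDig = p1 - s1 + s2              // integer digits of the quotient
  val scale = math.max(6, s1 + p2 + 1)   // at least 6 fractional digits
  (math.min(38, intDig + scale), math.min(38, scale))
}

// Example: DECIMAL(7, 2) / DECIMAL(5, 0)
// intDig = 7 - 2 + 0 = 5; scale = max(6, 2 + 5 + 1) = 8 => DECIMAL(13, 8)
```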
* [SPARK-10245] [SQL] Fix decimal literals with precision < scale | Davies Liu | 2015-08-25 | 3 | -6/+19
In `BigDecimal` (and `java.math.BigDecimal`), the precision can be smaller than the scale; for example, `BigDecimal("0.001")` has precision = 1 and scale = 3. But `DecimalType` requires the precision to be at least the scale, so we should use the maximum of precision and scale when inferring the schema from a decimal literal. Author: Davies Liu <davies@databricks.com> Closes #8428 from davies/smaller_decimal.
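The corner case is easy to verify in the REPL, and the inference rule described above then follows (a sketch, not the patch itself):

```
val d = BigDecimal("0.001")
println(d.precision)  // 1: one significant digit (the trailing 1)
println(d.scale)      // 3: three digits after the decimal point

// DecimalType needs precision >= scale, so infer:
val precision = math.max(d.precision, d.scale)  // 3
val scale = d.scale                             // 3  => DecimalType(3, 3)
```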
* [SPARK-10048] [SPARKR] Support arbitrary nested Java array in serde. | Sun Rui | 2015-08-25 | 1 | -28/+4
This PR: 1. supports transferring arbitrary nested arrays from the JVM to the R side in SerDe; 2. based on 1, improves the collect() implementation, which can now collect data of complex types from a DataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #8276 from sun-rui/SPARK-10048.
* [SPARK-10198] [SQL] Turn off partition verification by default | Michael Armbrust | 2015-08-25 | 2 | -31/+35
Author: Michael Armbrust <michael@databricks.com> Closes #8404 from marmbrus/turnOffPartitionVerification.
* [SPARK-9613] [CORE] Ban use of JavaConversions and migrate all existing uses to JavaConverters | Sean Owen | 2015-08-25 | 46 | -265/+282
Replace `JavaConversions` implicits with `JavaConverters`. Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet. Author: Sean Owen <sowen@cloudera.com> Closes #8033 from srowen/SPARK-9613.
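For context, the difference in a nutshell (a minimal illustration, not code from the patch): `JavaConverters` makes every conversion explicit at the call site, whereas `JavaConversions` applied conversions implicitly and invisibly.

```
import scala.collection.JavaConverters._

val javaList = new java.util.ArrayList[String]()
javaList.add("a")

// Explicit, visible conversions:
val asScala: scala.collection.mutable.Buffer[String] = javaList.asScala
val asJava: java.util.List[String] = asScala.asJava

// Under JavaConversions, `javaList.map(_.toUpperCase)` would compile via
// an invisible implicit wrap -- exactly what this change bans.
```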
* [SPARK-10197] [SQL] Add null check in wrapperFor (inside HiveInspectors). | Yin Huai | 2015-08-25 | 2 | -5/+53
https://issues.apache.org/jira/browse/SPARK-10197 Author: Yin Huai <yhuai@databricks.com> Closes #8407 from yhuai/ORCSPARK-10197.
* [SPARK-10195] [SQL] Data sources Filter should not expose internal types | Josh Rosen | 2015-08-25 | 4 | -41/+54
Spark SQL's data sources API exposes Catalyst's internal types through its Filter interfaces. This is a problem because types like UTF8String are not stable developer APIs and should not be exposed to third parties. This issue caused incompatibilities when upgrading our `spark-redshift` library to work against Spark 1.5.0. To avoid these issues in the future, we should only expose public types through these Filter objects. This patch accomplishes this by using CatalystTypeConverters to add the appropriate conversions. Author: Josh Rosen <joshrosen@databricks.com> Closes #8403 from JoshRosen/datasources-internal-vs-external-types.
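The kind of conversion involved, sketched with a hypothetical helper (the patch itself routes this through CatalystTypeConverters):

```
import org.apache.spark.unsafe.types.UTF8String

// A sources.Filter such as EqualTo("name", value) should carry a plain
// java.lang.String, never Catalyst's internal UTF8String.
def toPublicValue(v: Any): Any = v match {
  case s: UTF8String => s.toString  // internal string type -> public String
  case other         => other       // primitives etc. pass through unchanged
}
```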
* [SPARK-10177] [SQL] fix reading Timestamp in parquet from Hive | Davies Liu | 2015-08-25 | 3 | -8/+14
We misunderstood the Julian day and nanosecond-of-day fields that Hive/Impala write into Parquet (as TimestampType): because Julian days start at noon rather than midnight, the two fields overlap and can't simply be added together. To avoid confusing rounding during the conversion, we use `2440588` as the Julian Day of the Unix epoch (whose true value is 2440587.5). Author: Davies Liu <davies@databricks.com> Author: Cheng Lian <lian@databricks.com> Closes #8400 from davies/timestamp_parquet.
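A sketch of the conversion under that convention (the constant and formula follow the description above; names are illustrative):

```
// Julian Day of 1970-01-01 00:00:00 UTC, treated as a whole day
// (the astronomical value is 2440587.5).
val JulianDayOfEpoch = 2440588L
val MicrosPerDay = 24L * 60 * 60 * 1000 * 1000

// Parquet INT96 timestamps store (julianDay, nanosOfDay); convert to
// microseconds since the Unix epoch:
def fromJulianDay(julianDay: Int, nanosOfDay: Long): Long =
  (julianDay - JulianDayOfEpoch) * MicrosPerDay + nanosOfDay / 1000

// Noon on the epoch day: fromJulianDay(2440588, 43200000000000L) == 43200000000L
```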
* [SPARK-9293] [SPARK-9813] Analysis should check that set operations are only performed on tables with equal numbers of columns | Josh Rosen | 2015-08-25 | 6 | -32/+48
This patch adds an analyzer rule to ensure that set operations (union, intersect, and except) are only applied to tables with the same number of columns. Without this rule, there are scenarios where invalid queries can return incorrect results instead of failing with error messages; SPARK-9813 provides one example of this problem. In other cases, the invalid query can crash at runtime with extremely confusing exceptions. I also performed a bit of cleanup to refactor some of those logical operators' code into a common `SetOperation` base class. Author: Josh Rosen <joshrosen@databricks.com> Closes #7631 from JoshRosen/SPARK-9293.
* [SPARK-10136] [SQL] A more robust fix for SPARK-10136 | Cheng Lian | 2015-08-25 | 1 | -10/+8
PR #8341 is a valid fix for SPARK-10136, but it didn't catch the real root cause. The real problem can be rather tricky to explain, and requires the audience to be pretty familiar with the parquet-format spec, especially the details of the `LIST` backwards-compatibility rules. Let me have a try at an explanation here. The structure of the problematic Parquet schema generated by parquet-avro is something like this:

```
message m {
  <repetition> group f (LIST) {         // Level 1
    repeated group array (LIST) {       // Level 2
      repeated <primitive-type> array;  // Level 3
    }
  }
}
```

(The schema generated by parquet-thrift is structurally similar; just replace the `array` at level 2 with `f_tuple`, and the other one at level 3 with `f_tuple_tuple`.) This structure consists of two nested legacy 2-level `LIST`-like structures:

1. The repeated group type at level 2 is the element type of the outer array defined at level 1. This group should map to a `CatalystArrayConverter.ElementConverter` when building converters.
2. The repeated primitive type at level 3 is the element type of the inner array defined at level 2. This group should also map to a `CatalystArrayConverter.ElementConverter`.

The root cause of SPARK-10136 is that the group at level 2 isn't properly recognized as the element type of level 1. Thus, according to the parquet-format spec, the repeated primitive at level 3 is left as a so-called "unannotated repeated primitive type", and is recognized as a required list of required primitive type, so a `RepeatedPrimitiveConverter` instead of a `CatalystArrayConverter.ElementConverter` is created for it. According to the parquet-format spec, an unannotated repeated type shouldn't appear in a `LIST`- or `MAP`-annotated group. PR #8341 fixed this issue by allowing such unannotated repeated types to appear in `LIST`-annotated groups, which is a non-standard, hacky, but valid fix. (I didn't realize this when authoring #8341 though.) As for the reason why level 2 isn't recognized as a list element type, it's because of the following `LIST` backwards-compatibility rule defined in the parquet-format spec:

> If the repeated field is a group with one field and is named either `array` or uses the `LIST`-annotated group's name with `_tuple` appended then the repeated type is the element type and elements are required.

(The `array` part is for parquet-avro compatibility, while the `_tuple` part is for parquet-thrift.) This rule is implemented in [`CatalystSchemaConverter.isElementType`] [1], but neglected in [`CatalystRowConverter.isElementType`] [2]. This PR delivers a more robust fix by adding this rule to the latter method. Note that parquet-avro 1.7.0 also suffers from this issue. Details can be found at [PARQUET-364] [3].

[1]: https://github.com/apache/spark/blob/85f9a61357994da5023b08b0a8a2eb09388ce7f8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystSchemaConverter.scala#L259-L305
[2]: https://github.com/apache/spark/blob/85f9a61357994da5023b08b0a8a2eb09388ce7f8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala#L456-L463
[3]: https://issues.apache.org/jira/browse/PARQUET-364

Author: Cheng Lian <lian@databricks.com> Closes #8361 from liancheng/spark-10136/proper-version.
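A sketch of the backwards-compatibility check that rule describes (simplified to plain values; the real methods operate on Parquet `Type` objects and also handle the standard 3-level layout):

```
// Legacy 2-level lists: a repeated group is itself the element type when
// it has exactly one field and is named "array" or "<parentName>_tuple";
// otherwise the repeated group is just a wrapper around the element.
def isElementType(fieldCount: Int, repeatedName: String, parentName: String): Boolean =
  fieldCount == 1 && (repeatedName == "array" || repeatedName == parentName + "_tuple")
```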
* [SPARK-10196] [SQL] Correctly saving decimals in internal rows to JSON. | Yin Huai | 2015-08-24 | 2 | -1/+28
https://issues.apache.org/jira/browse/SPARK-10196 Author: Yin Huai <yhuai@databricks.com> Closes #8408 from yhuai/DecimalJsonSPARK-10196.
* [SPARK-10178] [SQL] HiveComparisonTest should print out dependent tables | Michael Armbrust | 2015-08-24 | 1 | -0/+36
In `HiveComparisonTest`s it is possible to fail a query of the form `SELECT * FROM dest1`, where `dest1` is the query that is actually computing the incorrect results. To aid debugging, this patch improves the harness to also print these query plans and their results. Author: Michael Armbrust <michael@databricks.com> Closes #8388 from marmbrus/generatedTables.
* [SPARK-10121] [SQL] Thrift server always use the latest class loader provided by the conf of executionHive's state | Yin Huai | 2015-08-25 | 2 | -0/+60
https://issues.apache.org/jira/browse/SPARK-10121 Looks like the problem is that if we add a jar through another thread, the thread handling the JDBC session will not get the latest classloader. Author: Yin Huai <yhuai@databricks.com> Closes #8368 from yhuai/SPARK-10121.
* [SQL] [MINOR] [DOC] Clarify docs for inferring DataFrame from RDD of Products | Feynman Liang | 2015-08-24 | 2 | -2/+2
This PR makes two doc fixes:
  * Makes `SQLImplicits.rddToDataFrameHolder` scaladoc consistent with `SQLContext.createDataFrame[A <: Product](rdd: RDD[A])`, since the former is essentially a wrapper for the latter
  * Clarifies `createDataFrame[A <: Product]` scaladoc to apply for any `RDD[Product]`, not just case classes
Author: Feynman Liang <fliang@databricks.com> Closes #8406 from feynmanliang/sql-doc-fixes.
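The inference being documented, as standard usage (`Person`, `sc`, and `sqlContext` are the usual shell bindings and are illustrative here):

```
case class Person(name: String, age: Int)  // any Product works, not just case classes

import sqlContext.implicits._

val people = sc.parallelize(Seq(Person("Alice", 29), Person("Bob", 31)))
val df = people.toDF()  // schema inferred from Person's fields: name string, age int
```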
* [SPARK-10165] [SQL] Await child resolution in ResolveFunctions | Michael Armbrust | 2015-08-24 | 2 | -44/+77
Currently, we eagerly attempt to resolve functions, even before their children are resolved. However, this is not valid in cases where we need to know the types of the input arguments (i.e., when resolving Hive UDFs). As a fix, this PR delays function resolution until the function's children are resolved. This change also necessitates a change to the way we resolve aggregate expressions that are not in aggregate operators (e.g., in `HAVING` or `ORDER BY` clauses). Specifically, we can no longer assume that these misplaced functions will already be resolved, which is what previously allowed us to differentiate aggregate functions from normal functions. To compensate, we now attempt to resolve these unresolved expressions in the context of the aggregate operator before checking whether any aggregate expressions are present. Author: Michael Armbrust <michael@databricks.com> Closes #8371 from marmbrus/hiveUDFResolution.
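The core pattern, shown on a self-contained toy expression tree (not Catalyst's actual classes): skip a function node until everything beneath it is resolved, so its argument types are known.

```
sealed trait Expr {
  def children: Seq[Expr]
  def resolved: Boolean
  def childrenResolved: Boolean = children.forall(_.resolved)
}
case class Literal(value: Int) extends Expr {
  val children: Seq[Expr] = Nil
  val resolved = true
}
case class UnresolvedFunction(name: String, children: Seq[Expr]) extends Expr {
  val resolved = false
}

def resolveFunctions(e: Expr): Expr = e match {
  // Only bind a function once its children are resolved, so the input
  // types are known (as Hive UDF resolution requires); otherwise leave
  // it for a later pass of the fixed-point analyzer.
  case f: UnresolvedFunction if f.childrenResolved => Literal(0) // stand-in for lookup-and-bind
  case other => other
}
```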
* [SPARK-10190] Fix NPE in CatalystTypeConverters Decimal toScala converter | Josh Rosen | 2015-08-24 | 2 | -2/+7
This adds a missing null check to the Decimal `toScala` converter in `CatalystTypeConverters`, fixing an NPE. Author: Josh Rosen <joshrosen@databricks.com> Closes #8401 from JoshRosen/SPARK-10190.
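The shape of the fix (a sketch with an illustrative method name; the real converter lives inside CatalystTypeConverters):

```
import java.math.{BigDecimal => JavaBigDecimal}
import org.apache.spark.sql.types.Decimal

// A null Decimal must map to a null java.math.BigDecimal instead of
// being dereferenced:
def decimalToScala(d: Decimal): JavaBigDecimal =
  if (d == null) null else d.toJavaBigDecimal
```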
* [SPARK-9758] [TEST] [SQL] Compilation issue for hive test / wrong package? | Sean Owen | 2015-08-24 | 10 | -9/+6
Moves `test.org.apache.spark.sql.hive` package tests to the apparently intended `org.apache.spark.sql.hive`, as they don't intend to test behavior from outside `org.apache.spark.*`. Alternate take, per discussion at https://github.com/apache/spark/pull/8051 I think this is what vanzin and I had in mind, but also CC rxin to cross-check, as this does indeed depend on whether these tests were accidentally in this package or not. Testing from a `test.org.apache.spark` package is legitimate but didn't seem to be the intent here. Author: Sean Owen <sowen@cloudera.com> Closes #8307 from srowen/SPARK-9758.
* [SPARK-8580] [SQL] Refactors ParquetHiveCompatibilitySuite and adds more test cases | Cheng Lian | 2015-08-24 | 1 | -39/+93
This PR refactors `ParquetHiveCompatibilitySuite` so that it's easier to add new test cases. I hit two bugs, SPARK-10177 and HIVE-11625, while working on this; test cases for them are added and marked as ignored for now. SPARK-10177 will be addressed in a separate PR. Author: Cheng Lian <lian@databricks.com> Closes #8392 from liancheng/spark-8580/parquet-hive-compat-tests.
* [SPARK-7710] [SPARK-7998] [DOCS] Docs for DataFrameStatFunctions | Burak Yavuz | 2015-08-24 | 2 | -1/+102
This PR contains examples of how to use some of the stat functions available for DataFrames under `df.stat`. rxin Author: Burak Yavuz <brkyvz@gmail.com> Closes #8378 from brkyvz/update-sql-docs.
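A few of the functions in question, in standard use (assumes `df` is a DataFrame with the named columns):

```
// Pearson correlation and sample covariance between two numeric columns:
val corr = df.stat.corr("price", "quantity")
val cov  = df.stat.cov("price", "quantity")

// Contingency table of two columns:
val ct = df.stat.crosstab("city", "product")

// Frequent items per column (default support threshold is 1%):
val freq = df.stat.freqItems(Array("product"))
```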
* [SPARK-9401] [SQL] Fully implement code generation for ConcatWs | Yijie Shen | 2015-08-22 | 1 | -3/+39
This PR adds full codegen support for ConcatWs and is a substitute for #7782. JIRA: https://issues.apache.org/jira/browse/SPARK-9401 cc davies Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #8353 from yjshen/concatws.
* [SPARK-10143] [SQL] Use parquet's block size (row group size) setting as the min split size if necessary. | Yin Huai | 2015-08-21 | 1 | -2/+39
https://issues.apache.org/jira/browse/SPARK-10143 With this PR, we will set the min split size to parquet's block size (row group size) from the conf if the min split size is smaller. This way we avoid having too many tasks, and even useless tasks, when reading parquet data. I tested it locally. My table is 343MB and lives in my local FS. Because I did not set any min/max split size, the default split size was 32MB and the map stage had 11 tasks, but only three of those tasks actually read data. With my PR, the map stage has only three tasks. Here is the difference. Without this PR: ![image](https://cloud.githubusercontent.com/assets/2072857/9399179/8587dba6-4765-11e5-9189-7ebba52a2b6d.png) With this PR: ![image](https://cloud.githubusercontent.com/assets/2072857/9399185/a4735d74-4765-11e5-8848-1f1e361a6b4b.png) Even if the block size setting does not match the actual block size of the parquet file, I think it is still generally good to use parquet's block size setting if the min split size is smaller than this block size. Tested it on a cluster using

```
val count = sqlContext.table("""store_sales""").groupBy().count().queryExecution.executedPlan(3).execute().count
```

Basically, it reads 0 columns of table `store_sales`. My table has 1824 parquet files with sizes from 80MB to 280MB (1 to 3 row groups). Without this patch, on a 16-worker cluster, the job had 5023 tasks and spent 102s. With this patch, the job had 2893 tasks and spent 64s. It is still not as good as using one mapper per file (1824 tasks and 42s), but it is much better than our master. Author: Yin Huai <yhuai@databricks.com> Closes #8346 from yhuai/parquetMinSplit.
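A sketch of the adjustment (conf key names are the ones used by Hadoop's FileInputFormat and Parquet; the exact wiring in the patch may differ):

```
import org.apache.hadoop.conf.Configuration

// If the configured min split size is below Parquet's row group size,
// raise it, so splits don't slice row groups into task-sized fragments
// that end up reading no data.
def adjustMinSplitSize(conf: Configuration): Unit = {
  val parquetBlockSize = conf.getLong("parquet.block.size", 128L * 1024 * 1024)
  val minSplitSize = conf.getLong("mapreduce.input.fileinputformat.split.minsize", 0L)
  if (minSplitSize < parquetBlockSize) {
    conf.setLong("mapreduce.input.fileinputformat.split.minsize", parquetBlockSize)
  }
}
```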
* [SPARK-10130] [SQL] type coercion for IF should have children resolved first | Daoyuan Wang | 2015-08-21 | 2 | -0/+8
Type coercion for IF should have its children resolved first, or we could hit an unresolved exception. Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #8331 from adrian-wang/spark10130.
* [SPARK-10040] [SQL] Use batch insert for JDBC writing | Liang-Chi Hsieh | 2015-08-21 | 1 | -3/+14
JIRA: https://issues.apache.org/jira/browse/SPARK-10040 We should use batch inserts instead of single-row inserts when writing over JDBC. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #8273 from viirya/jdbc-insert-batch.
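The standard JDBC pattern involved (a minimal sketch, independent of Spark's actual writer; the connection URL is illustrative):

```
import java.sql.DriverManager

// Batching amortizes one network round trip over many rows instead of
// paying it per row.
val conn = DriverManager.getConnection("jdbc:postgresql://localhost/test")
try {
  val stmt = conn.prepareStatement("INSERT INTO points (x, y) VALUES (?, ?)")
  for ((x, y) <- Seq((1, 2), (3, 4), (5, 6))) {
    stmt.setInt(1, x)
    stmt.setInt(2, y)
    stmt.addBatch()        // queue the row locally
  }
  stmt.executeBatch()      // send all queued rows in one round trip
} finally {
  conn.close()
}
```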
* [SPARK-9400] [SQL] codegen for StringLocate | Tarek Auel | 2015-08-20 | 1 | -1/+27
This is based on #7779; thanks to tarekauel. Fixes the conflict and nullability. Closes #7779 and #8274. Author: Tarek Auel <tarek.auel@googlemail.com> Author: Davies Liu <davies@databricks.com> Closes #8330 from davies/stringLocate.
* [SQL] [MINOR] remove unnecessary class | Wenchen Fan | 2015-08-20 | 1 | -64/+0
This class is identical to `org.apache.spark.sql.execution.datasources.jdbc.DefaultSource` and is not needed. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8334 from cloud-fan/minor.
* [SPARK-10136] [SQL] Fixes Parquet support for Avro array of primitive array | Cheng Lian | 2015-08-20 | 13 | -844/+1718
I caught SPARK-10136 while adding more test cases to `ParquetAvroCompatibilitySuite`. The actual bug fix lies in `CatalystRowConverter.scala`. Author: Cheng Lian <lian@databricks.com> Closes #8341 from liancheng/spark-10136/parquet-avro-nested-primitive-array.
* [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation. | Reynold Xin | 2015-08-20 | 2 | -10/+22
This improves performance by ~20-30% in one of my local tests and should fix the performance regression from 1.4 to 1.5 on ss_max. Author: Reynold Xin <rxin@databricks.com> Closes #8332 from rxin/SPARK-10100.
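The idea in miniature (a generic sketch, not Spark's generated code):

```
// Grouped aggregation probes a hash map once per input row:
def sumByKey(rows: Seq[(String, Long)]): Map[String, Long] =
  rows.foldLeft(Map.empty[String, Long]) { case (acc, (k, v)) =>
    acc.updated(k, acc.getOrElse(k, 0L) + v)  // hash lookup per row
  }

// With no grouping key there is exactly one group, so a single
// accumulator suffices and the per-row hash lookup disappears:
def sumAll(rows: Seq[Long]): Long = rows.foldLeft(0L)(_ + _)
```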
* [SPARK-10092] [SQL] Multi-DB support follow up. | Yin Huai | 2015-08-20 | 16 | -94/+398
https://issues.apache.org/jira/browse/SPARK-10092 This PR is a follow-up for Multi-DB support. It makes the following changes (see the usage sketch below):
  * `HiveContext.refreshTable` now accepts `dbName.tableName`.
  * `HiveContext.analyze` now accepts `dbName.tableName`.
  * `CreateTableUsing`, `CreateTableUsingAsSelect`, `CreateTempTableUsing`, `CreateTempTableUsingAsSelect`, `CreateMetastoreDataSource`, and `CreateMetastoreDataSourceAsSelect` all take a `TableIdentifier` instead of the string representation of the table name.
  * When you call `saveAsTable` with a specified database, the data will be saved to the correct location.
  * Explicitly disallow creating a temporary table with a specified database name (users could not do this before either).
  * When we save a table to the metastore, we also check whether the db name and table name are acceptable to Hive (using `MetaStoreUtils.validateName`).
Author: Yin Huai <yhuai@databricks.com> Closes #8324 from yhuai/saveAsTableDB.
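A usage sketch with qualified names (database and table names are illustrative; assumes `hiveContext` is a `HiveContext` and `df` a DataFrame):

```
// Refresh and analyze a table inside a specific database:
hiveContext.refreshTable("mydb.events")
hiveContext.analyze("mydb.events")

// Save a DataFrame as a table in that database:
df.write.saveAsTable("mydb.events_copy")
```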
* [SPARK-9242] [SQL] Audit UDAF interface. | Reynold Xin | 2015-08-19 | 18 | -349/+386
A few minor changes:
  1. Improved documentation.
  2. Renamed `apply(distinct....)` to `distinct`.
  3. Changed `MutableAggregationBuffer` from a trait to an abstract class.
  4. Renamed `returnDataType` to `dataType`, to be more consistent with other expressions.
And unrelated to UDAFs:
  1. Renamed files in expressions to use the suffix "Expressions", to be more consistent.
  2. Moved regexp-related expressions out to their own file.
  3. Renamed `StringComparison` => `StringPredicate`.
Author: Reynold Xin <rxin@databricks.com> Closes #8321 from rxin/SPARK-9242.
* [SPARK-10035] [SQL] Parquet filters do not process EqualNullSafe filter. | hyukjinkwon | 2015-08-20 | 2 | -139/+37
As I discussed with Lian:
  1. Added `EqualNullSafe` to `ParquetFilters`. It uses the same equality comparison filter as `EqualTo`, since the Parquet filter actually performs a null-safe equality comparison.
  2. Updated the test code (`ParquetFilterSuite`): convert `catalyst.Expression` to `sources.Filter`; removed `Cast`, since only `Literal` is picked up as a proper `Filter` in `DataSourceStrategy`; added an `EqualNullSafe` comparison.
  3. Removed the deprecated `createFilter` for `catalyst.Expression`.
Author: hyukjinkwon <gurwls223@gmail.com> Author: 권혁진 <gurwls223@gmail.com> Closes #8275 from HyukjinKwon/master.
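Parquet's filter2 predicates compare null-safely out of the box, which is why `EqualNullSafe` can share `EqualTo`'s mapping. A sketch using parquet-mr's public API (column name and value are illustrative):

```
import org.apache.parquet.filter2.predicate.FilterApi

// FilterApi.eq is already a null-safe equality test: eq(column, null)
// matches rows where the column is null, so both EqualTo("a", 10) and
// EqualNullSafe("a", 10) can translate to the same predicate.
val pred = FilterApi.eq(FilterApi.intColumn("a"), java.lang.Integer.valueOf(10))
```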
* [SPARK-6489] [SQL] add column pruning for Generate | Wenchen Fan | 2015-08-19 | 3 | -2/+100
This PR takes over https://github.com/apache/spark/pull/5358 Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8268 from cloud-fan/6489.
* [SPARK-10083] [SQL] CaseWhen should support type coercion of DecimalType and FractionalType | Daoyuan Wang | 2015-08-19 | 2 | -2/+13
`create table t1 (a decimal(7, 2), b long);` `select case when 1=1 then a else 1.0 end from t1;` `select case when 1=1 then a else b end from t1;` Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #8270 from adrian-wang/casewhenfractional.
* [SPARK-9899] [SQL] Disables customized output committer when speculation is on | Cheng Lian | 2015-08-19 | 2 | -1/+49
Speculation hates direct output committers, as there are multiple corner cases that may cause data corruption and/or data loss. Please see this [PR comment] [1] for more details. [1]: https://github.com/apache/spark/pull/8191#issuecomment-131598385 Author: Cheng Lian <lian@databricks.com> Closes #8317 from liancheng/spark-9899/speculation-hates-direct-output-committer.
* [SPARK-10090] [SQL] fix decimal scale of division | Davies Liu | 2015-08-19 | 6 | -31/+157
We should round the result of a decimal multiply/divide to the expected precision/scale, and also check for overflow. Author: Davies Liu <davies@databricks.com> Closes #8287 from davies/decimal_division.
* [SPARK-9627] [SQL] Stops using Scala runtime reflection in DictionaryEncoding | Cheng Lian | 2015-08-19 | 2 | -12/+4
`DictionaryEncoding` uses Scala runtime reflection to avoid boxing costs while building the dictionary array. However, this code path may hit [SI-6240] [1] and throw an exception. [1]: https://issues.scala-lang.org/browse/SI-6240 Author: Cheng Lian <lian@databricks.com> Closes #8306 from liancheng/spark-9627/in-memory-cache-scala-reflection.
* [SPARK-10073] [SQL] Python withColumn should replace the old column | Davies Liu | 2015-08-19 | 1 | -1/+2
`DataFrame.withColumn` in Python should be consistent with the Scala one (replacing an existing column that has the same name). cc marmbrus Author: Davies Liu <davies@databricks.com> Closes #8300 from davies/with_column.
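The Scala behavior being matched (standard DataFrame usage; assumes `df` has an `age` column):

```
// When the new column's name matches an existing one, withColumn replaces
// that column instead of appending a duplicate:
val updated = df.withColumn("age", df("age") + 1)
// `updated` still has exactly one `age` column, incremented by one.
```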
* [SPARK-10107] [SQL] fix NPE in format_number | Davies Liu | 2015-08-19 | 2 | -3/+3
Author: Davies Liu <davies@databricks.com> Closes #8305 from davies/format_number.
* [SPARK-10093] [SPARK-10096] [SQL] Avoid transformation on executors & fix UDFs on complex types | Reynold Xin | 2015-08-18 | 4 | -7/+68
This is kind of a weird case, but given a sufficiently complex query plan (in this case a TungstenProject with an Exchange underneath), we could get NPEs on the executors because of when we were calling transformAllExpressions. In general, we should ensure that all transformations occur on the driver and not on the executors. Some reasons to avoid executor-side transformations:
  * (this case) Some operator constructors require state such as access to the Spark/SQL conf, so doing a makeCopy on the executor can fail.
  * (unrelated reason to avoid executor transformations) ExprIds are calculated using an atomic integer, so you can violate their uniqueness constraint by constructing them anywhere other than the driver.
This subsumes #8285. Author: Reynold Xin <rxin@databricks.com> Author: Michael Armbrust <michael@databricks.com> Closes #8295 from rxin/SPARK-10096.
* [SPARK-10095] [SQL] use public API of BigInteger | Davies Liu | 2015-08-18 | 2 | -27/+11
In UnsafeRow, we used a private field of BigInteger for better performance, but it actually didn't contribute much (3% in one benchmark) to end-to-end runtime, and it makes the code non-portable (it may fail on other JVM implementations). So we should use the public API instead. cc rxin Author: Davies Liu <davies@databricks.com> Closes #8286 from davies/portable_decimal.
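The portable route, sketched with java.math.BigInteger's public API only:

```
import java.math.BigInteger

// Round trip through the public API: serialize to a two's-complement
// byte array and reconstruct from it, instead of reaching into
// JVM-specific private fields like `signum` and `mag`.
val original = new BigInteger("123456789012345678901234567890")
val bytes: Array[Byte] = original.toByteArray
val restored = new BigInteger(bytes)
assert(original == restored)
```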
* [SPARK-9939] [SQL] Resorts to Java process API in CliSuite, HiveSparkSubmitSuite and HiveThriftServer2 test suites | Cheng Lian | 2015-08-19 | 5 | -91/+149
The Scala process API has a known bug ([SI-8768] [1]), which may be the reason why several test suites that fork sub-processes are flaky. This PR replaces the Scala process API with the Java process API in `CliSuite`, `HiveSparkSubmitSuite`, and the `HiveThriftServer2`-related test suites to see whether it fixes these flaky tests. [1]: https://issues.scala-lang.org/browse/SI-8768 Author: Cheng Lian <lian@databricks.com> Closes #8168 from liancheng/spark-9939/use-java-process-api.
* [SPARK-10088] [SQL] Add support for "stored as avro" in HiveQL parser. | Marcelo Vanzin | 2015-08-18 | 2 | -10/+13
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8282 from vanzin/SPARK-10088.
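For reference, the syntax the parser now accepts (table name and schema are illustrative; assumes `sqlContext` is a `HiveContext`):

```
sqlContext.sql("CREATE TABLE logs (id INT, msg STRING) STORED AS AVRO")
```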
* [SPARK-10089] [SQL] Add missing golden files. | Marcelo Vanzin | 2015-08-18 | 2 | -0/+503
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8283 from vanzin/SPARK-10089.
* [SPARK-10080] [SQL] Fix binary incompatibility for $ column interpolation | Michael Armbrust | 2015-08-18 | 3 | -11/+22
It turns out that inner classes of inner objects are referenced directly in bytecode, and thus moving one will break binary compatibility. Author: Michael Armbrust <michael@databricks.com> Closes #8281 from marmbrus/binaryCompat.
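The interpolator in question, in standard use (assumes the usual implicits import and a DataFrame `df` with these columns):

```
import sqlContext.implicits._

// $"..." builds a Column via the StringToColumn implicit class; third-party
// bytecode references that inner class directly, which is why its location
// is binary-compatibility-sensitive.
df.select($"name").where($"age" > 21)
```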
* [SPARK-8118] [SQL] Redirects Parquet JUL logger via SLF4J | Cheng Lian | 2015-08-18 | 4 | -43/+45
Parquet hard-codes a JUL logger which always writes to stdout. This PR redirects it via the SLF4J JUL bridge handler, so that we can control Parquet logs via `log4j.properties`. This solution is inspired by https://github.com/Parquet/parquet-mr/issues/390#issuecomment-46064909. Author: Cheng Lian <lian@databricks.com> Closes #8196 from liancheng/spark-8118/redirect-parquet-jul.
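The jul-to-slf4j bridge involved (a generic sketch of the mechanism, not the patch's exact wiring):

```
import org.slf4j.bridge.SLF4JBridgeHandler

// Route java.util.logging records into SLF4J (and from there into log4j),
// so log4j.properties controls what Parquet's JUL logger emits.
SLF4JBridgeHandler.removeHandlersForRootLogger()  // drop JUL's default handlers
SLF4JBridgeHandler.install()
```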
* [SPARK-10038] [SQL] fix bug in generated unsafe projection when there is binary in ArrayData | Davies Liu | 2015-08-17 | 2 | -4/+29
In Java, the type for an array of arrays is slightly different from arrays of other element types. cc cloud-fan Author: Davies Liu <davies@databricks.com> Closes #8250 from davies/array_binary.
* [MINOR] Format the comment of `translate` at `functions.scala` | Yu ISHIKAWA | 2015-08-17 | 1 | -8/+9
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8265 from yu-iskw/minor-translate-comment.