author | Dongjoon Hyun <dongjoon@apache.org> | 2016-08-12 14:40:12 +0800 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2016-08-12 14:40:12 +0800 |
commit | abff92bfdc7d4c9d2308794f0350561fe0ceb4dd (patch) | |
tree | 6ec663f2fe5a1c7315e3cc8e291d9b8e1c83da35 /sql/core/src/test/scala | |
parent | ccc6dc0f4b62837c73fca0e3c8b9c14be798b062 (diff) | |
[SPARK-16975][SQL] Column-partition path starting '_' should be handled correctly
## What changes were proposed in this pull request?
Currently, Spark ignores path names starting with an underscore `_` or a dot `.`. This causes read failures for column-partitioned file data sources whose partition column names start with `_`, e.g. `_col`.
**Before**
```scala
scala> spark.range(10).withColumn("_locality_code", $"id").write.partitionBy("_locality_code").save("/tmp/parquet")
scala> spark.read.parquet("/tmp/parquet")
org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at /tmp/parquet20. It must be specified manually;
```
**After**
```scala
scala> spark.range(10).withColumn("_locality_code", $"id").write.partitionBy("_locality_code").save("/tmp/parquet")
scala> spark.read.parquet("/tmp/parquet")
res2: org.apache.spark.sql.DataFrame = [id: bigint, _locality_code: int]
```
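The before/after behavior can be illustrated with a minimal sketch of the path-name filtering. This is not the code changed by this commit (the diff shown on this page is limited to the test directory). The object and method names, and the rule of treating any name that contains `=` as a partition directory, are assumptions made for illustration only.

```scala
// Illustrative sketch only: all names here are hypothetical, not Spark APIs.
object HiddenPathSketch {
  // Assumed "before" rule: every name starting with '_' or '.' is skipped,
  // so a partition directory such as `_locality_code=0` is never listed.
  def skipNaive(name: String): Boolean =
    name.startsWith("_") || name.startsWith(".")

  // Assumed "after" rule: a leading '_' is still skipped, unless the name
  // looks like a partition directory of the form `column=value`.
  def skipPartitionAware(name: String): Boolean =
    (name.startsWith("_") && !name.contains("=")) || name.startsWith(".")

  def main(args: Array[String]): Unit = {
    // Example names chosen for illustration.
    val names = Seq("_locality_code=0", "_SUCCESS", ".hidden", "part-00000.parquet")
    names.foreach { n =>
      println(s"$n naive=${skipNaive(n)} partitionAware=${skipPartitionAware(n)}")
    }
  }
}
```

Under these assumptions, only `_locality_code=0` is treated differently by the two rules: the naive rule skips it, which leaves no data files from which to infer a schema, while the partition-aware rule keeps it.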
## How was this patch tested?
Pass the Jenkins tests with a new test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #14585 from dongjoon-hyun/SPARK-16975-PARQUET.
Diffstat (limited to 'sql/core/src/test/scala')
-rw-r--r-- | sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index eac588fff2..4fcde58833 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql
 
+import java.io.File
 import java.math.MathContext
 import java.sql.{Date, Timestamp}
 
@@ -2637,6 +2638,14 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-16975: Column-partition path starting '_' should be handled correctly") {
+    withTempDir { dir =>
+      val parquetDir = new File(dir, "parquet").getCanonicalPath
+      spark.range(10).withColumn("_col", $"id").write.partitionBy("_col").save(parquetDir)
+      spark.read.parquet(parquetDir)
+    }
+  }
+
   test("SPARK-16644: Aggregate should not put aggregate expressions to constraints") {
     withTable("tbl") {
       sql("CREATE TABLE tbl(a INT, b INT) USING parquet")