| field | value | date |
|---|---|---|
| author | Nathan Howell <nhowell@godaddy.com> | 2015-08-05 22:16:56 +0800 |
| committer | Cheng Lian <lian@databricks.com> | 2015-08-05 22:16:56 +0800 |
| commit | eb8bfa3eaa0846d685e4d12f9ee2e4273b85edcf | |
| tree | dca086b49bd186f709377fe905de613ec0dcc777 /sql | |
| parent | 70112ff22bd1aee7689c5d3af9b66c9b8ceb3ec3 | |
[SPARK-9618] [SQL] Use the specified schema when reading Parquet files
The user-specified schema is currently ignored when loading Parquet files through the `parquet` method.
One workaround is to use the `format` and `load` methods instead of `parquet`, e.g.:
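```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
// Example schema; the column names are illustrative
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true)))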
// schema is ignored
sqlContext.read.schema(schema).parquet("hdfs:///test")
// schema is retained
sqlContext.read.schema(schema).format("parquet").load("hdfs:///test")
```
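To make the difference between the two call paths concrete, here is a toy model of the pre-fix behavior. `MiniReader`, `Relation` and `Schema` are hypothetical stand-ins invented for illustration, not Spark's `DataFrameReader`, `ParquetRelation` or `StructType`; the only point carried over from the description is that `load` forwards the stored schema while the `parquet` shortcut does not. It can be pasted into the Scala REPL or run as a scala-cli script.

```scala
// Toy model of the pre-fix bug. MiniReader, Relation and Schema are
// hypothetical stand-ins, not Spark's DataFrameReader/ParquetRelation/StructType.
final case class Schema(fields: List[String])

final case class Relation(source: String, path: String, maybeSchema: Option[Schema]) {
  // With no user schema, pretend one was inferred from the files.
  def schema: Schema = maybeSchema.getOrElse(Schema(List("inferred")))
}

final class MiniReader {
  private var source: String = "parquet"
  private var userSpecifiedSchema: Option[Schema] = None

  def schema(s: Schema): MiniReader = { userSpecifiedSchema = Some(s); this }
  def format(f: String): MiniReader = { source = f; this }

  // Generic path: forwards whatever schema was set on the builder.
  def load(path: String): Relation = Relation(source, path, userSpecifiedSchema)

  // Pre-fix shortcut: builds the relation itself and hard-codes None.
  def parquet(path: String): Relation = Relation("parquet", path, None)
}

val userSchema = Schema(List("id", "name"))
val viaShortcut = new MiniReader().schema(userSchema).parquet("hdfs:///test")
val viaLoad = new MiniReader().schema(userSchema).format("parquet").load("hdfs:///test")

println(viaShortcut.schema) // Schema(List(inferred))
println(viaLoad.schema)     // Schema(List(id, name))
```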
The fix is simple, but I wonder if the `parquet` method should instead be written in a similar fashion to `orc`:
```
def parquet(path: String): DataFrame = format("parquet").load(path)
```
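A minimal sketch of the suggested design, again using hypothetical toy classes rather than Spark's API: the shortcut delegates to `format("parquet").load(path)`, so any state set on the builder, including the schema, reaches the relation through a single code path.

```scala
// Sketch of the delegation design: the shortcut is just format(...).load(...).
// All names are hypothetical stand-ins, not Spark classes.
final case class Schema(fields: List[String])

final case class Relation(source: String, path: String, maybeSchema: Option[Schema])

final class MiniReader {
  private var source: String = "parquet"
  private var userSpecifiedSchema: Option[Schema] = None

  def schema(s: Schema): MiniReader = { userSpecifiedSchema = Some(s); this }
  def format(f: String): MiniReader = { source = f; this }

  // Single place where builder state becomes a relation.
  def load(path: String): Relation = Relation(source, path, userSpecifiedSchema)

  // Mirrors the suggested `format("parquet").load(path)`: no state to forget.
  def parquet(path: String): Relation = format("parquet").load(path)
}

val userSchema = Schema(List("id", "name"))
val viaShortcut = new MiniReader().schema(userSchema).parquet("hdfs:///test")
val viaLoad = new MiniReader().schema(userSchema).format("parquet").load("hdfs:///test")

println(viaShortcut == viaLoad) // true
```

In this model, delegation removes the chance of a shortcut silently dropping builder state; the trade-off is that anything the shortcut would otherwise do on its own has to move onto the shared `load` path.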
Author: Nathan Howell <nhowell@godaddy.com>
Closes #7947 from NathanHowell/SPARK-9618 and squashes the following commits:
d1ea62c [Nathan Howell] [SPARK-9618] [SQL] Use the specified schema when reading Parquet files
Diffstat (limited to 'sql')
-rw-r--r-- | sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala | 2 |
1 file changed, 1 insertion, 1 deletion
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index eb09807f9d..b90de8ef09 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -260,7 +260,7 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {

       sqlContext.baseRelationToDataFrame(
         new ParquetRelation(
-          globbedPaths.map(_.toString), None, None, extraOptions.toMap)(sqlContext))
+          globbedPaths.map(_.toString), userSpecifiedSchema, None, extraOptions.toMap)(sqlContext))
     }
   }

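The committed fix keeps `parquet` on its own code path and passes `userSpecifiedSchema` instead of `None` as the second argument to `ParquetRelation`. The sketch below models what that `Option` means to a relation, under an assumption made for illustration only: the relation infers its schema when given `None` and skips inference when given `Some(schema)`. `Relation`, `Schema` and the inference counter are hypothetical; a real relation would read file footers rather than return a fixed placeholder.

```scala
// Hypothetical model (not Spark code) of passing an Option[Schema] to a relation:
// None means "infer from the files", Some(schema) means "use it as given".
final case class Schema(fields: List[String])

final class Relation(paths: Seq[String], maybeSchema: Option[Schema]) {
  var inferenceCalls = 0 // how many times the simulated footer scan ran

  // Stand-in for reading file footers; returns a fixed placeholder here.
  private def inferSchema(): Schema = {
    inferenceCalls += 1
    Schema(List("inferred"))
  }

  // getOrElse takes its argument by name, so inference only runs for None;
  // lazy ensures it runs at most once per relation.
  lazy val schema: Schema = maybeSchema.getOrElse(inferSchema())
}

val userSchema = Schema(List("id", "name"))
// After the fix: the builder's Some(schema) is forwarded.
val fixed = new Relation(Seq("hdfs:///test"), Some(userSchema))
// Before the fix: None was hard-coded, so the user's schema never arrived.
val unfixed = new Relation(Seq("hdfs:///test"), None)

println(fixed.schema)           // Schema(List(id, name))
println(fixed.inferenceCalls)   // 0
println(unfixed.schema)         // Schema(List(inferred))
println(unfixed.inferenceCalls) // 1
```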