author | Cheng Lian <lian@databricks.com> | 2015-06-25 00:06:23 -0700 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2015-06-25 00:06:23 -0700 |
commit | c337844ed7f9b2cb7b217dc935183ef5e1096ca1 (patch) | |
tree | e6b7c881d0335fe9f2c3ec8de0b7fe48272107ea /sql/core | |
parent | 7bac2fe7717c0102b4875dbd95ae0bbf964536e3 (diff) | |
[SPARK-8604] [SQL] HadoopFsRelation subclasses should set their output format class
`HadoopFsRelation` subclasses, especially `ParquetRelation2`, should set their own output format class so that the default output committer can be set up correctly when appending (where user-defined output committers are ignored).
Author: Cheng Lian <lian@databricks.com>
Closes #6998 from liancheng/spark-8604 and squashes the following commits:
9be51d1 [Cheng Lian] Adds more comments
6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class
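
The reasoning in the commit message can be illustrated with a small, self-contained sketch. Everything below is a simplified stand-in written for illustration: `Committer`, `OutputFormat`, `JobConfig`, and `CommitterSelection` are hypothetical names, not the real Hadoop, Parquet, or Spark classes. The sketch assumes that an output format bundles its own committer, that a job with no output format set falls back to a plain text format, and that appending always uses the output format's default committer.

```scala
// Simplified stand-ins for illustration only; not the real Hadoop/Parquet/Spark types.
sealed trait Committer
case object FileCommitter extends Committer      // stands in for a generic file committer
case object ParquetCommitter extends Committer   // stands in for ParquetOutputCommitter
case object CustomCommitter extends Committer    // stands in for a user-defined committer

// An output format bundles the committer it expects to be used with.
sealed trait OutputFormat { def committer: Committer }
case object TextFormat extends OutputFormat { val committer: Committer = FileCommitter }
case object ParquetFormat extends OutputFormat { val committer: Committer = ParquetCommitter }

// Hypothetical job configuration: both settings are optional.
final case class JobConfig(
    outputFormat: Option[OutputFormat] = None,
    userCommitter: Option[Committer] = None)

object CommitterSelection {
  // Assumption: a job without an explicit output format falls back to the text format.
  def defaultCommitter(job: JobConfig): Committer =
    job.outputFormat.getOrElse(TextFormat).committer

  // When appending, the user-defined committer is ignored and the default
  // committer of the output format is used instead.
  def select(job: JobConfig, isAppend: Boolean): Committer =
    if (isAppend) defaultCommitter(job)
    else job.userCommitter.getOrElse(defaultCommitter(job))
}
```

In this model, an append on a job that never set its output format ends up with `FileCommitter`, while the same append with `ParquetFormat` set ends up with `ParquetCommitter`. That is the situation the patch addresses by calling `job.setOutputFormatClass` in `ParquetRelation2`.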
Diffstat (limited to 'sql/core')
-rw-r--r-- | sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
index 1d353bd8e1..bc39fae2bc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
@@ -194,6 +194,12 @@ private[sql] class ParquetRelation2(
       committerClass,
       classOf[ParquetOutputCommitter])
 
+    // We're not really using `ParquetOutputFormat[Row]` for writing data here, because we override
+    // it in `ParquetOutputWriter` to support appending and dynamic partitioning. The reason why
+    // we set it here is to setup the output committer class to `ParquetOutputCommitter`, which is
+    // bundled with `ParquetOutputFormat[Row]`.
+    job.setOutputFormatClass(classOf[ParquetOutputFormat[Row]])
+
     // TODO There's no need to use two kinds of WriteSupport
     // We should unify them. `SpecificMutableRow` can process both atomic (primitive) types and
     // complex types.