[SPARK-8604] [SQL] HadoopFsRelation subclasses should set their output format class

`HadoopFsRelation` subclasses, especially `ParquetRelation2`, should set their own output format class, so that the default output committer can be set up correctly when appending (where user-defined output committers are ignored).

Author: Cheng Lian <lian@databricks.com>

Closes #6998 from liancheng/spark-8604 and squashes the following commits:

9be51d1 [Cheng Lian] Adds more comments
6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class
author: Cheng Lian <lian@databricks.com> 2015-06-25 00:06:23 -0700
committer: Cheng Lian <lian@databricks.com> 2015-06-25 00:06:23 -0700
commit: c337844ed7f9b2cb7b217dc935183ef5e1096ca1 (patch)
tree: e6b7c881d0335fe9f2c3ec8de0b7fe48272107ea /sql/core
parent: 7bac2fe7717c0102b4875dbd95ae0bbf964536e3 (diff)
download: spark-c337844ed7f9b2cb7b217dc935183ef5e1096ca1.tar.gz
spark-c337844ed7f9b2cb7b217dc935183ef5e1096ca1.tar.bz2
spark-c337844ed7f9b2cb7b217dc935183ef5e1096ca1.zip
1 files changed, 6 insertions, 0 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
index 1d353bd8e1..bc39fae2bc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
@@ -194,6 +194,12 @@ private[sql] class ParquetRelation2(
       committerClass,
       classOf[ParquetOutputCommitter])
 
+    // We're not really using `ParquetOutputFormat[Row]` for writing data here, because we override
+    // it in `ParquetOutputWriter` to support appending and dynamic partitioning.  The reason why
+    // we set it here is to setup the output committer class to `ParquetOutputCommitter`, which is
+    // bundled with `ParquetOutputFormat[Row]`.
+    job.setOutputFormatClass(classOf[ParquetOutputFormat[Row]])
+
     // TODO There's no need to use two kinds of WriteSupport
     // We should unify them. `SpecificMutableRow` can process both atomic (primitive) types and
     // complex types.
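
The reasoning in the added comment can be illustrated with a small, dependency-free model. This is only a sketch under stated assumptions: every name in it (`JobModel`, `TextLikeFormat`, `ParquetLikeFormat`, `resolveCommitter`, and the two committer objects) is a hypothetical stand-in, not a Hadoop, Parquet, or Spark class. It assumes that a job with no output format class falls back to a text-like format, and that the default committer is obtained by instantiating the job's output format class and asking it for its bundled committer.

```scala
// Simplified model only: all names are hypothetical stand-ins, not real Hadoop/Parquet/Spark APIs.
object CommitterSketch {
  trait Committer { def name: String }
  object PlainFileCommitter extends Committer { val name = "PlainFileCommitter" }
  object ParquetLikeCommitter extends Committer { val name = "ParquetLikeCommitter" }

  // Each output format bundles the committer that should finalize its output.
  trait Format { def committer: Committer }
  class TextLikeFormat extends Format { def committer: Committer = PlainFileCommitter }
  class ParquetLikeFormat extends Format { def committer: Committer = ParquetLikeCommitter }

  // Stand-in for a job: the format class is optional, with a text-like fallback (assumption).
  class JobModel {
    private var formatClass: Option[Class[_ <: Format]] = None
    def setOutputFormatClass(cls: Class[_ <: Format]): Unit = {
      formatClass = Some(cls)
    }
    def outputFormatClass: Class[_ <: Format] =
      formatClass.getOrElse(classOf[TextLikeFormat])
  }

  // When appending, the user-defined committer is ignored and the default one,
  // derived from the job's output format class, is used instead.
  def resolveCommitter(
      job: JobModel,
      userCommitter: Option[Committer],
      appending: Boolean): Committer = {
    val default = job.outputFormatClass.getDeclaredConstructor().newInstance().committer
    if (appending) default else userCommitter.getOrElse(default)
  }

  def main(args: Array[String]): Unit = {
    val job = new JobModel

    // Format class not set: appending silently falls back to the plain file committer.
    println(resolveCommitter(job, Some(ParquetLikeCommitter), appending = true).name)

    // Format class set, as this patch does: appending now gets the Parquet-style committer.
    job.setOutputFormatClass(classOf[ParquetLikeFormat])
    println(resolveCommitter(job, Some(ParquetLikeCommitter), appending = true).name)
  }
}
```

In this model the format class is never used to write data. It only determines which committer the append path picks up, which mirrors the motivation given in the comment above.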