aboutsummaryrefslogtreecommitdiff
path: root/sql/core
diff options
context:
space:
mode:
authorCheng Lian <lian@databricks.com>2015-06-25 00:06:23 -0700
committerCheng Lian <lian@databricks.com>2015-06-25 00:06:23 -0700
commitc337844ed7f9b2cb7b217dc935183ef5e1096ca1 (patch)
treee6b7c881d0335fe9f2c3ec8de0b7fe48272107ea /sql/core
parent7bac2fe7717c0102b4875dbd95ae0bbf964536e3 (diff)
downloadspark-c337844ed7f9b2cb7b217dc935183ef5e1096ca1.tar.gz
spark-c337844ed7f9b2cb7b217dc935183ef5e1096ca1.tar.bz2
spark-c337844ed7f9b2cb7b217dc935183ef5e1096ca1.zip
[SPARK-8604] [SQL] HadoopFsRelation subclasses should set their output format class
`HadoopFsRelation` subclasses, especially `ParquetRelation2` should set its own output format class, so that the default output committer can be setup correctly when doing appending (where we ignore user defined output committers). Author: Cheng Lian <lian@databricks.com> Closes #6998 from liancheng/spark-8604 and squashes the following commits: 9be51d1 [Cheng Lian] Adds more comments 6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class
Diffstat (limited to 'sql/core')
-rw-r--r--sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala6
1 files changed, 6 insertions, 0 deletions
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
index 1d353bd8e1..bc39fae2bc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
@@ -194,6 +194,12 @@ private[sql] class ParquetRelation2(
committerClass,
classOf[ParquetOutputCommitter])
+ // We're not really using `ParquetOutputFormat[Row]` for writing data here, because we override
+ // it in `ParquetOutputWriter` to support appending and dynamic partitioning. The reason why
+ // we set it here is to setup the output committer class to `ParquetOutputCommitter`, which is
+ // bundled with `ParquetOutputFormat[Row]`.
+ job.setOutputFormatClass(classOf[ParquetOutputFormat[Row]])
+
// TODO There's no need to use two kinds of WriteSupport
// We should unify them. `SpecificMutableRow` can process both atomic (primitive) types and
// complex types.