[SPARK-20107][DOC] Add spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version option to configuration.md

## What changes were proposed in this pull request?

Add the `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` option to `configuration.md`. Setting `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2` can speed up [HadoopMapReduceCommitProtocol.commitJob](https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L121) when a job writes many output files.

Cloudera's Hadoop 2.6.0-cdh5.4.0 and later (see https://github.com/cloudera/hadoop-common/commit/1c1236182304d4075276c00c4592358f428bc433 and https://github.com/cloudera/hadoop-common/commit/16b2de27321db7ce2395c08baccfdec5562017f0) and Apache Hadoop 2.7.0 and later support this improvement.

For more details, see:

1. [MAPREDUCE-4815](https://issues.apache.org/jira/browse/MAPREDUCE-4815): Speed up FileOutputCommitter#commitJob for many output files.
2. [MAPREDUCE-6406](https://issues.apache.org/jira/browse/MAPREDUCE-6406): Update the default version for the property mapreduce.fileoutputcommitter.algorithm.version to 2.

## How was this patch tested?

Manual tests and existing tests.

Author: Yuming Wang <wgyumg@gmail.com>

Closes #17442 from wangyum/SPARK-20107.
author: Yuming Wang <wgyumg@gmail.com> 2017-03-30 10:39:57 +0100
committer: Sean Owen <sowen@cloudera.com> 2017-03-30 10:39:57 +0100
commit: edc87d76efea7b4d19d9d0c4ddba274a3ccb8752 (patch)
tree: cfb3d2fd12ff1ead252780e831510610e78d5de1 /docs/configuration.md
parent: 471de5db53ed77711523a3f016d6e9c530b651e5 (diff)
download: spark-edc87d76efea7b4d19d9d0c4ddba274a3ccb8752.tar.gz
spark-edc87d76efea7b4d19d9d0c4ddba274a3ccb8752.tar.bz2
spark-edc87d76efea7b4d19d9d0c4ddba274a3ccb8752.zip
1 files changed, 9 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 4729f1b040..a975392540 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1137,6 +1137,15 @@ Apart from these, the following properties are also available, and may be useful
     mapping has high overhead for blocks close to or below the page size of the operating system.
   </td>
 </tr>
+<tr>
+  <td><code>spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version</code></td>
+  <td>1</td>
+  <td>
+    The file output committer algorithm version, valid algorithm version number: 1 or 2.
+    Version 2 may have better performance, but version 1 may handle failures better in certain situations,
+    as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>.
+  </td>
+</tr>
 </table>
 
 ### Networking
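
For illustration, the option documented by this patch could be enabled cluster-wide with a fragment like the one below. This is a minimal sketch, not part of the patch: it assumes the deployment reads `conf/spark-defaults.conf` and that the Hadoop libraries on the classpath support algorithm version 2 (Apache Hadoop 2.7.0 or later, or Cloudera's 2.6.0-cdh5.4.0 or later, per the commit message).

```properties
# conf/spark-defaults.conf (assumed location; adjust for your deployment)
# Switch the Hadoop file output committer from the documented default (1) to 2.
# Assumption: the Hadoop version in use supports algorithm version 2.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
```

The same value can be supplied for a single application with `spark-submit --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`. Since the documentation notes that version 1 may handle failures better in certain situations, leaving the default in place is the more conservative choice unless job commit time is a measured bottleneck.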