author     Noritaka Sekiyama <moomindani@gmail.com>     2016-11-14 21:07:59 +0900
committer  Kousuke Saruta <sarutak@oss.nttdata.co.jp>   2016-11-14 21:07:59 +0900
commit     9d07ceee7860921eafb55b47852f1b51089c98da (patch)
tree       67746b9f290891e9a128a7befa31ea5afb33779d /docs
parent     637a0bb88f74712001f32a53ff66fd0b8cb67e4a (diff)
[SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB
Changed HDFS default block size from 64MB to 128MB.

https://issues.apache.org/jira/browse/SPARK-18432

Author: Noritaka Sekiyama <moomindani@gmail.com>

Closes #15879 from moomindani/SPARK-18432.
Diffstat (limited to 'docs')
-rw-r--r--  docs/programming-guide.md  6
-rw-r--r--  docs/tuning.md             4
2 files changed, 5 insertions, 5 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index b9a2110b60..58bf17b4a8 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -343,7 +343,7 @@ Some notes on reading files with Spark:
* All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
Apart from text files, Spark's Scala API also supports several other data formats:
@@ -375,7 +375,7 @@ Some notes on reading files with Spark:
* All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
Apart from text files, Spark's Java API also supports several other data formats:
@@ -407,7 +407,7 @@ Some notes on reading files with Spark:
* All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
Apart from text files, Spark's Python API also supports several other data formats:
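
The partition note updated in the three hunks above describes the optional second argument to `textFile` (`minPartitions` in the Scala API). As a minimal Scala sketch, not part of this patch and using hypothetical paths and sizes, of how the default one-partition-per-128MB-block behavior relates to an explicitly requested partition count:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical app name, master, and HDFS path, for illustration only.
val sc = new SparkContext(
  new SparkConf().setAppName("textfile-partitions").setMaster("local[*]"))

// Default: one partition per HDFS block (128MB by default),
// so a ~1GB file yields roughly 8 partitions.
val byBlock = sc.textFile("hdfs:///my/directory/data.txt")

// Passing a larger second argument asks for more partitions;
// asking for fewer partitions than blocks has no effect.
val finer = sc.textFile("hdfs:///my/directory/data.txt", 64)

println(s"default partitions: ${byBlock.getNumPartitions}, requested: ${finer.getNumPartitions}")
```
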
diff --git a/docs/tuning.md b/docs/tuning.md
index 9c43b315bb..0de303a3bd 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -224,8 +224,8 @@ temporary objects created during task execution. Some steps which may be useful
* As an example, if your task is reading data from HDFS, the amount of memory used by the task can be estimated using
the size of the data block read from HDFS. Note that the size of a decompressed block is often 2 or 3 times the
- size of the block. So if we wish to have 3 or 4 tasks' worth of working space, and the HDFS block size is 64 MB,
- we can estimate size of Eden to be `4*3*64MB`.
+ size of the block. So if we wish to have 3 or 4 tasks' worth of working space, and the HDFS block size is 128 MB,
+ we can estimate size of Eden to be `4*3*128MB`.
* Monitor how the frequency and time taken by garbage collection changes with the new settings.
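
For the Eden estimate in the tuning.md hunk above, the arithmetic works out as follows. This is a rough sketch, with the task count and decompression factor taken from the surrounding text; note that a young-generation setting such as `-Xmn` would need some headroom beyond this figure, since Eden shares the young generation with the survivor spaces:

```scala
// Rough worked version of the estimate in the text above.
val hdfsBlockMB      = 128  // default HDFS block size after this change
val decompressFactor = 3    // a decompressed block is often 2-3x the on-disk size
val tasksOfHeadroom  = 4    // "3 or 4 tasks' worth of working space"

val edenEstimateMB = tasksOfHeadroom * decompressFactor * hdfsBlockMB
println(s"Estimated Eden size: $edenEstimateMB MB")  // 4 * 3 * 128 = 1536 MB
```
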