author     Noritaka Sekiyama <moomindani@gmail.com>     2016-11-14 21:07:59 +0900
committer  Kousuke Saruta <sarutak@oss.nttdata.co.jp>   2016-11-14 21:07:59 +0900
commit     9d07ceee7860921eafb55b47852f1b51089c98da (patch)
tree       67746b9f290891e9a128a7befa31ea5afb33779d /docs
parent     637a0bb88f74712001f32a53ff66fd0b8cb67e4a (diff)
[SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB
Changed HDFS default block size from 64MB to 128MB.

https://issues.apache.org/jira/browse/SPARK-18432

Author: Noritaka Sekiyama <moomindani@gmail.com>

Closes #15879 from moomindani/SPARK-18432.
Diffstat (limited to 'docs')
-rw-r--r--  docs/programming-guide.md  6
-rw-r--r--  docs/tuning.md             4
2 files changed, 5 insertions, 5 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index b9a2110b60..58bf17b4a8 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -343,7 +343,7 @@ Some notes on reading files with Spark:
* All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
Apart from text files, Spark's Scala API also supports several other data formats:
@@ -375,7 +375,7 @@ Some notes on reading files with Spark:
* All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
Apart from text files, Spark's Java API also supports several other data formats:
@@ -407,7 +407,7 @@ Some notes on reading files with Spark:
* All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
Apart from text files, Spark's Python API also supports several other data formats:
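
The partition note updated in the three hunks above describes the optional second argument to `textFile` (`minPartitions` in the Scala API). As a minimal Scala sketch, not part of this patch and using hypothetical paths and sizes, of how the default one-partition-per-128MB-block behavior relates to an explicitly requested partition count:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical app name, master, and HDFS path, for illustration only.
val sc = new SparkContext(
  new SparkConf().setAppName("textfile-partitions").setMaster("local[*]"))

// Default: one partition per HDFS block (128MB by default),
// so a ~1GB file yields roughly 8 partitions.
val byBlock = sc.textFile("hdfs:///my/directory/data.txt")

// Passing a larger second argument asks for more partitions;
// asking for fewer partitions than blocks has no effect.
val finer = sc.textFile("hdfs:///my/directory/data.txt", 64)

println(s"default partitions: ${byBlock.getNumPartitions}, requested: ${finer.getNumPartitions}")
```
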
diff --git a/docs/tuning.md b/docs/tuning.md
index 9c43b315bb..0de303a3bd 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -224,8 +224,8 @@ temporary objects created during task execution. Some steps which may be useful
* As an example, if your task is reading data from HDFS, the amount of memory used by the task can be estimated using
the size of the data block read from HDFS. Note that the size of a decompressed block is often 2 or 3 times the
- size of the block. So if we wish to have 3 or 4 tasks' worth of working space, and the HDFS block size is 64 MB,
- we can estimate size of Eden to be `4*3*64MB`.
+ size of the block. So if we wish to have 3 or 4 tasks' worth of working space, and the HDFS block size is 128 MB,
+ we can estimate size of Eden to be `4*3*128MB`.
* Monitor how the frequency and time taken by garbage collection changes with the new settings.
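
For the Eden estimate in the tuning.md hunk above, the arithmetic works out as follows. This is a rough sketch, with the task count and decompression factor taken from the surrounding text; note that a young-generation setting such as `-Xmn` would need some headroom beyond this figure, since Eden shares the young generation with the survivor spaces:

```scala
// Rough worked version of the estimate in the text above.
val hdfsBlockMB      = 128  // default HDFS block size after this change
val decompressFactor = 3    // a decompressed block is often 2-3x the on-disk size
val tasksOfHeadroom  = 4    // "3 or 4 tasks' worth of working space"

val edenEstimateMB = tasksOfHeadroom * decompressFactor * hdfsBlockMB
println(s"Estimated Eden size: $edenEstimateMB MB")  // 4 * 3 * 128 = 1536 MB
```
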