path: root/docs/configuration.md
author    Matei Zaharia <matei@eecs.berkeley.edu>  2012-10-07 11:30:53 -0700
committer Matei Zaharia <matei@eecs.berkeley.edu>  2012-10-07 11:30:53 -0700
commit    efc5423210d1aadeaea78273a4a8f10425753079 (patch)
tree      86e0ca94f41ef5a17b92be3d7be45b77f762e1f8 /docs/configuration.md
parent    039cc6228e92b3ee7a05ebbbe4b915be2c1db1f3 (diff)
download  spark-efc5423210d1aadeaea78273a4a8f10425753079.tar.gz
          spark-efc5423210d1aadeaea78273a4a8f10425753079.tar.bz2
          spark-efc5423210d1aadeaea78273a4a8f10425753079.zip
Made compression configurable separately for shuffle, broadcast and RDDs
Diffstat (limited to 'docs/configuration.md')
-rw-r--r--  docs/configuration.md  |  41
1 file changed, 27 insertions(+), 14 deletions(-)
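The patch below renames `spark.blockManager.compress` into three separate settings. As a rough sketch of how they would be used: Spark of this era read its configuration from JVM system properties, set before the `SparkContext` was created. Only the property names and defaults come from the patch itself; the wrapper class and the decision to enable RDD compression are illustrative.

```java
// Illustrative sketch: setting the compression properties introduced by this
// patch via JVM system properties (how Spark configuration worked at the time).
public class SparkCompressionConfig {
    public static void main(String[] args) {
        // Compression is now configurable separately for each data path:
        System.setProperty("spark.shuffle.compress", "true");    // map output files (default: true)
        System.setProperty("spark.broadcast.compress", "true");  // broadcast variables (default: true)
        System.setProperty("spark.rdd.compress", "true");        // serialized RDD partitions (default: false)

        // The memory-cache fraction keeps its default of 0.66:
        System.setProperty("spark.storage.memoryFraction", "0.66");

        // A SparkContext created after this point would pick these values up.
        System.out.println(System.getProperty("spark.rdd.compress")); // prints "true"
    }
}
```

Note that `spark.rdd.compress` defaults to `false`, unlike the other two: it only applies to serialized storage levels such as `MEMORY_ONLY_SER`, trading extra CPU time for space, so it is opt-in.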
diff --git a/docs/configuration.md b/docs/configuration.md
index 0987f7f7b1..db90b5bc16 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -113,29 +113,34 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
- <td>spark.blockManager.compress</td>
- <td>false</td>
+ <td>spark.storage.memoryFraction</td>
+ <td>0.66</td>
<td>
- Set to "true" to have Spark compress map output files, RDDs that get cached on disk,
- and RDDs that get cached in serialized form. Generally a good idea when dealing with
- large datasets, but might add some CPU overhead.
+ Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old"
+ generation of objects in the JVM, which by default is given 2/3 of the heap, but you can increase
+ it if you configure your own old generation size.
+ </td>
+</tr>
+<tr>
+ <td>spark.shuffle.compress</td>
+ <td>true</td>
+ <td>
+ Whether to compress map output files. Generally a good idea.
</td>
</tr>
<tr>
<td>spark.broadcast.compress</td>
- <td>false</td>
+ <td>true</td>
<td>
- Set to "true" to have Spark compress broadcast variables before sending them.
- Generally a good idea when broadcasting large values.
+ Whether to compress broadcast variables before sending them. Generally a good idea.
</td>
</tr>
<tr>
- <td>spark.storage.memoryFraction</td>
- <td>0.66</td>
+ <td>spark.rdd.compress</td>
+ <td>false</td>
<td>
- Fraction of Java heap to use for Spark's memory cache. This should not be larger than the "old"
- generation of objects in the JVM, which by default is given 2/3 of the heap, but you can increase
- it if you configure your own old generation size.
+ Whether to compress serialized RDD partitions (e.g. for <code>StorageLevel.MEMORY_ONLY_SER</code>).
+ Can save substantial space at the cost of some extra CPU time.
</td>
</tr>
<tr>
@@ -181,10 +186,18 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
+ <td>spark.akka.threads</td>
+ <td>4</td>
+ <td>
+ Number of actor threads to use for communication. Can be useful to increase on large clusters
+ when the master has a lot of CPU cores.
+ </td>
+</tr>
+<tr>
<td>spark.master.host</td>
<td>(local hostname)</td>
<td>
- Hostname for the master to listen on (it will bind to this hostname's IP address).
+ Hostname or IP address for the master to listen on.
</td>
</tr>
<tr>