Merge pull request #533 from andrewor14/master. Closes #533.

External spilling - generalize batching logic

The existing implementation consists of a hack for Kryo specifically and only works for LZF compression. Introducing an intermediate batch-level stream takes care of pre-fetching and other arbitrary behavior of higher level streams in a more general way.

Author: Andrew Or <andrewor14@gmail.com>

== Merge branch commits ==

commit 3ddeb7ef89a0af2b685fb5d071aa0f71c975cc82
Author: Andrew Or <andrewor14@gmail.com>
Date:   Wed Feb 5 12:09:32 2014 -0800

    Also privatize fields

commit 090544a87a0767effd0c835a53952f72fc8d24f0
Author: Andrew Or <andrewor14@gmail.com>
Date:   Wed Feb 5 10:58:23 2014 -0800

    Privatize methods

commit 13920c918efe22e66a1760b14beceb17a61fd8cc
Author: Andrew Or <andrewor14@gmail.com>
Date:   Tue Feb 4 16:34:15 2014 -0800

    Update docs

commit bd5a1d7350467ed3dc19c2de9b2c9f531f0e6aa3
Author: Andrew Or <andrewor14@gmail.com>
Date:   Tue Feb 4 13:44:24 2014 -0800

    Typo: phyiscal -> physical

commit 287ef44e593ad72f7434b759be3170d9ee2723d2
Author: Andrew Or <andrewor14@gmail.com>
Date:   Tue Feb 4 13:38:32 2014 -0800

    Avoid reading the entire batch into memory; also simplify streaming logic

    Additionally, address formatting comments.

commit 3df700509955f7074821e9aab1e74cb53c58b5a5
Merge: a531d2e 164489d
Author: Andrew Or <andrewor14@gmail.com>
Date:   Mon Feb 3 18:27:49 2014 -0800

    Merge branch 'master' of github.com:andrewor14/incubator-spark

commit a531d2e347acdcecf2d0ab72cd4f965ab5e145d8
Author: Andrew Or <andrewor14@gmail.com>
Date:   Mon Feb 3 18:18:04 2014 -0800

    Relax assumptions on compressors and serializers when batching

    This commit introduces an intermediate layer of an input stream on the batch level. This guards against interference from higher level streams (i.e. compression and deserialization streams), especially pre-fetching, without specifically targeting particular libraries (Kryo) and forcing shuffle spill compression to use LZF.

commit 164489d6f176bdecfa9dabec2dfce5504d1ee8af
Author: Andrew Or <andrewor14@gmail.com>
Date:   Mon Feb 3 18:18:04 2014 -0800

    Relax assumptions on compressors and serializers when batching

    This commit introduces an intermediate layer of an input stream on the batch level. This guards against interference from higher level streams (i.e. compression and deserialization streams), especially pre-fetching, without specifically targeting particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
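
The batch-level stream idea described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from the pull request: it uses plain Java with the JDK's GZIP streams standing in for the spill compression codec and `DataInputStream` standing in for the deserializer, and the names `BatchStreamSketch`, `BoundedInputStream`, `writeBatch`, and `readBatch` are hypothetical. The assumption is that the size in bytes of each spilled batch is recorded at write time, so that the reader can cap how far any higher-level stream is allowed to read ahead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BatchStreamSketch {

    /** Hypothetical helper: exposes at most `limit` bytes of the underlying stream. */
    static final class BoundedInputStream extends FilterInputStream {
        private long remaining;

        BoundedInputStream(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = in.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            // Never hand out more than what is left of the current batch.
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = in.skip(Math.min(n, remaining));
            remaining -= skipped;
            return skipped;
        }

        @Override
        public int available() throws IOException {
            return (int) Math.min(in.available(), remaining);
        }

        @Override
        public boolean markSupported() {
            return false;
        }

        @Override
        public void close() {
            // Intentionally a no-op: the shared underlying stream serves the next batch.
        }
    }

    /** Compresses one batch of values independently and returns its bytes. */
    static byte[] writeBatch(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            for (int v : values) out.writeInt(v);
        }
        return bytes.toByteArray();
    }

    /** Reads one batch; the bounded layer stops read-ahead at the batch boundary. */
    static List<Integer> readBatch(InputStream file, long batchSize) throws IOException {
        InputStream batch = new BoundedInputStream(file, batchSize);
        DataInputStream in = new DataInputStream(new GZIPInputStream(batch));
        List<Integer> values = new ArrayList<>();
        while (true) {
            try {
                values.add(in.readInt());
            } catch (EOFException e) {
                return values; // end of this batch
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] first = writeBatch(new int[] {1, 2, 3});
        byte[] second = writeBatch(new int[] {4, 5});

        // Simulate a spill file: batches written back to back, sizes kept on the side.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(first);
        file.write(second);

        InputStream in = new ByteArrayInputStream(file.toByteArray());
        System.out.println(readBatch(in, first.length));  // [1, 2, 3]
        System.out.println(readBatch(in, second.length)); // [4, 5]
    }
}
```

In this sketch the decompressor fills its internal buffer from whatever stream it is given, so handing it the raw file stream would let it consume bytes belonging to the next batch. Capping the stream at the recorded batch size keeps that behavior contained without relying on the details of any particular compressor or serializer, which is the generalization the commit message describes.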
author: Andrew Or <andrewor14@gmail.com> 2014-02-06 22:05:53 -0800
committer: Patrick Wendell <pwendell@gmail.com> 2014-02-06 22:05:53 -0800
commit: 1896c6e7c9f5c29284a045128b4aca0d5a6e7220 (patch)
tree: 4b1f9b2def4ce22032c956fb7bf86c08f84f551b /docs
parent: 0b448df6ac520a7977b1eb51e8c55e33f3fd2da8 (diff)
1 files changed, 1 insertions, 3 deletions
diff --git a/docs/configuration.md b/docs/configuration.md
index 1f9fa70566..8e4c48c81f 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -158,9 +158,7 @@ Apart from these, the following properties are also available, and may be useful
   <td>spark.shuffle.spill.compress</td>
   <td>true</td>
   <td>
-    Whether to compress data spilled during shuffles. If enabled, spill compression
-    always uses the `org.apache.spark.io.LZFCompressionCodec` codec, 
-    regardless of the value of `spark.io.compression.codec`.
+    Whether to compress data spilled during shuffles.
   </td>
 </tr>
 <tr>