author | Josh Rosen <joshrosen@databricks.com> | 2014-10-21 00:49:11 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2014-10-21 00:49:11 -0700 |
commit | 5a8f64f33632fbf89d16cade2e0e66c5ed60760b (patch) | |
tree | 639e5c45fb9bafb7ab9bcc52147dfa600909cf3e /docs/mllib-collaborative-filtering.md | |
parent | 342b57db66e379c475daf5399baf680ff42b87c2 (diff) | |
[SPARK-3958] TorrentBroadcast cleanup / debugging improvements.
This PR makes several changes to TorrentBroadcast in order to make
it easier to reason about, which should help when debugging SPARK-3958.
The key changes:
- Remove all state from the global TorrentBroadcast object. This state
consisted mainly of configuration options, like the block size and
compression codec, and was read by the blockify / unblockify methods.
Unfortunately, the use of `lazy val` for `BLOCK_SIZE` meant that the block
size was always determined by the first SparkConf that TorrentBroadcast was
initialized with; as a result, unit tests could not properly test
TorrentBroadcast with different block sizes.
Instead, blockifyObject and unBlockifyObject now accept compression codecs
and blockSizes as arguments. These arguments are supplied at the call sites
inside of TorrentBroadcast instances. Each TorrentBroadcast instance
determines these values from SparkEnv's SparkConf. I was careful to ensure
that we do not accidentally serialize CompressionCodec or SparkConf objects
as part of the TorrentBroadcast object.
- Remove special-case handling of local-mode in TorrentBroadcast. I don't
think that broadcast implementations should know about whether we're running
in local mode. If we want to optimize the performance of broadcast in local
mode, then we should detect this at a higher level and use a dummy
LocalBroadcastFactory implementation instead.
Removing this code fixes a subtle error condition: in the old local mode
code, a failure to find the broadcast in the local BlockManager would lead
to an attempt to deblockify zero blocks, which could lead to confusing
deserialization or decompression errors when we attempted to decompress
an empty byte array. This should never have happened, though: a failure to
find the block in local mode is evidence of some other error. The changes
here will make it easier to debug those errors if they ever happen.
- Add a check that throws an exception when attempting to deblockify an
empty array.
- Use ScalaCheck to add a test to check that TorrentBroadcast's
blockifyObject and unBlockifyObject methods are inverses.
- Misc. cleanup and logging improvements.
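The `lazy val` pitfall described in the first change can be reproduced in a few lines. This is a minimal sketch, not Spark's code. `GlobalBroadcastState` and `PerInstanceSettings` are hypothetical names, and a plain `Map` stands in for `SparkConf`. The key is assumed to mirror Spark's `spark.broadcast.blockSize` setting (in KB).

```scala
// Hypothetical stand-ins: a Map plays the role of SparkConf, and the key is
// assumed to mirror spark.broadcast.blockSize (interpreted as KB).
object GlobalBroadcastState {
  // Global configuration, set by whichever component initializes first.
  @volatile var conf: Map[String, String] = Map.empty

  // Evaluated once, on first access, and frozen for the rest of the JVM's life.
  lazy val BLOCK_SIZE: Int =
    conf.getOrElse("spark.broadcast.blockSize", "4096").toInt * 1024
}

// Per-instance alternative: each object reads the configuration it was given.
final class PerInstanceSettings(conf: Map[String, String]) {
  val blockSize: Int =
    conf.getOrElse("spark.broadcast.blockSize", "4096").toInt * 1024
}

GlobalBroadcastState.conf = Map("spark.broadcast.blockSize" -> "1")
val firstGlobal = GlobalBroadcastState.BLOCK_SIZE  // 1024
GlobalBroadcastState.conf = Map("spark.broadcast.blockSize" -> "2")
val secondGlobal = GlobalBroadcastState.BLOCK_SIZE // still 1024: already forced

val small = new PerInstanceSettings(Map("spark.broadcast.blockSize" -> "1")).blockSize // 1024
val large = new PerInstanceSettings(Map("spark.broadcast.blockSize" -> "2")).blockSize // 2048
```

Because a `lazy val` is evaluated only once, the second assignment to `conf` has no effect on `BLOCK_SIZE`. A test that wants a different block size has no way to get one. Reading the setting per instance avoids this.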
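The blockify / unblockify contract from the first change can be sketched as follows. The sketch also covers the empty-array check and the "inverses" property from the later bullets. The signatures are hypothetical simplifications, not Spark's:

- a Boolean `compress` flag stands in for a `CompressionCodec`;
- `java.util.zip` deflate streams stand in for Spark's codecs;
- plain Java serialization stands in for Spark's serializer.

The round-trip check is a plain randomized loop rather than a ScalaCheck property, so the example has no test-library dependency.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream,
  ObjectInputStream, ObjectOutputStream, OutputStream}
import java.nio.ByteBuffer
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

// Hypothetical simplification of blockifyObject / unBlockifyObject.
object BlockifySketch {

  // Serialize (and optionally compress) `obj`, then cut the bytes into blocks
  // of at most `blockSize` bytes. Both settings are explicit arguments, so
  // each caller supplies its own values instead of relying on global state.
  def blockifyObject[T](obj: T, blockSize: Int, compress: Boolean): Array[ByteBuffer] = {
    require(blockSize > 0, s"blockSize must be positive, got $blockSize")
    val bytesOut = new ByteArrayOutputStream()
    val sink: OutputStream = if (compress) new DeflaterOutputStream(bytesOut) else bytesOut
    val objOut = new ObjectOutputStream(sink)
    objOut.writeObject(obj)
    objOut.close() // flushes and, when compressing, finishes the deflate stream
    bytesOut.toByteArray.grouped(blockSize).map(chunk => ByteBuffer.wrap(chunk)).toArray
  }

  // Concatenate the blocks and deserialize. An empty array means the blocks
  // were never found, so fail with a clear message instead of handing zero
  // bytes to the decompressor or deserializer.
  def unBlockifyObject[T](blocks: Array[ByteBuffer], compress: Boolean): T = {
    require(blocks.nonEmpty, "Cannot unblockify an empty array of blocks")
    val bytesOut = new ByteArrayOutputStream()
    for (block <- blocks) {
      val view = block.duplicate() // leave the caller's buffer position untouched
      val chunk = new Array[Byte](view.remaining())
      view.get(chunk)
      bytesOut.write(chunk, 0, chunk.length)
    }
    val bytesIn = new ByteArrayInputStream(bytesOut.toByteArray)
    val source: InputStream = if (compress) new InflaterInputStream(bytesIn) else bytesIn
    val objIn = new ObjectInputStream(source)
    try objIn.readObject().asInstanceOf[T] finally objIn.close()
  }
}

// Randomized round-trip check in the spirit of the ScalaCheck test.
val rng = new scala.util.Random(42)
val allRoundTrip = (1 to 50).forall { _ =>
  val data = Array.fill(rng.nextInt(5000))(rng.nextInt())
  val blockSize = 1 + rng.nextInt(1024)
  val compress = rng.nextBoolean()
  val blocks = BlockifySketch.blockifyObject(data, blockSize, compress)
  val restored = BlockifySketch.unBlockifyObject[Array[Int]](blocks, compress)
  blocks.forall(_.remaining() <= blockSize) && java.util.Arrays.equals(restored, data)
}

// Deblockifying nothing now fails fast with an IllegalArgumentException.
val emptyRejected =
  try { BlockifySketch.unBlockifyObject[String](Array.empty[ByteBuffer], compress = false); false }
  catch { case _: IllegalArgumentException => true }
```

Passing the block size and compression choice as arguments is what lets the loop above vary them on every iteration. A single process could not do this under the global `lazy val` design.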
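The second change proposes deciding about local mode once, when a broadcast factory is chosen, rather than inside TorrentBroadcast. That alternative might look like the sketch below, under the following assumptions:

- `LocalBroadcastFactory` reuses the name proposed in the commit;
- `SimpleBroadcast`, `SimpleBroadcastFactory`, `chooseFactory`, and `ClusterFactoryStub` are hypothetical;
- Spark's actual `Broadcast` and `BroadcastFactory` types have different, richer signatures.

```scala
// Hypothetical, simplified interfaces (not Spark's real Broadcast API).
trait SimpleBroadcast[T] { def value: T }

trait SimpleBroadcastFactory {
  def newBroadcast[T](value: T, id: Long): SimpleBroadcast[T]
}

// Local mode: driver and tasks share one JVM, so the value is handed out
// directly, with no blocks, BlockManager lookups, or deserialization.
final class LocalBroadcastFactory extends SimpleBroadcastFactory {
  def newBroadcast[T](v: T, id: Long): SimpleBroadcast[T] =
    new SimpleBroadcast[T] { val value: T = v }
}

// Stand-in for a distributed implementation such as TorrentBroadcast.
object ClusterFactoryStub extends SimpleBroadcastFactory {
  def newBroadcast[T](v: T, id: Long): SimpleBroadcast[T] =
    throw new UnsupportedOperationException("not part of this sketch")
}

// The mode check happens once, at a higher level. "local-cluster[...]" runs
// separate executor JVMs, so it deliberately does not match.
def chooseFactory(master: String, clusterFactory: => SimpleBroadcastFactory): SimpleBroadcastFactory =
  if (master == "local" || master.startsWith("local[")) new LocalBroadcastFactory
  else clusterFactory
```

With this split, the distributed implementation never needs a local-mode branch. A block that cannot be found is then always reported as a real error.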
Author: Josh Rosen <joshrosen@databricks.com>
Closes #2844 from JoshRosen/torrentbroadcast-bugfix and squashes the following commits:
1e8268d [Josh Rosen] Address Reynold's review comments
2a9fdfd [Josh Rosen] Address Reynold's review comments.
c3b08f9 [Josh Rosen] Update TorrentBroadcast tests to reflect removal of special local-mode optimizations.
5c22782 [Josh Rosen] Store broadcast variable's value in the driver.
33fc754 [Josh Rosen] Change blockify/unblockifyObject to accept serializer as argument.
618a872 [Josh Rosen] [SPARK-3958] TorrentBroadcast cleanup / debugging improvements.
Diffstat (limited to 'docs/mllib-collaborative-filtering.md')
0 files changed, 0 insertions, 0 deletions