aboutsummaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorAndrew Or <andrew@databricks.com>2015-03-02 16:34:32 -0800
committerPatrick Wendell <patrick@databricks.com>2015-03-02 16:34:32 -0800
commit6776cb33ea691f7843b956b3e80979282967e826 (patch)
tree8d8faa5a2fc5cf7c882320ae997632cad833a243 /docs
parent1a49496b4a9df40c74739fc0fb8a21c88a477075 (diff)
downloadspark-6776cb33ea691f7843b956b3e80979282967e826.tar.gz
spark-6776cb33ea691f7843b956b3e80979282967e826.tar.bz2
spark-6776cb33ea691f7843b956b3e80979282967e826.zip
[SPARK-6066] Make event log format easier to parse
Some users have reported difficulty in parsing the new event log format. Since we embed the metadata in the beginning of the file, when we compress the event log we need to skip the metadata because we need that information to parse the log later. This means we'll end up with a partially compressed file if event logging compression is turned on. The old format looks like: ``` sparkVersion = 1.3.0 compressionCodec = org.apache.spark.io.LZFCompressionCodec === LOG_HEADER_END === // actual events, could be compressed bytes ``` The new format in this patch puts the compression codec in the log file name instead. It also removes the metadata header altogether along with the Spark version, which was not needed. The new file name looks something like: ``` app_without_compression app_123.lzf app_456.snappy ``` I tested this with and without compression, using different compression codecs and event logging directories. I verified that both the `Master` and the `HistoryServer` can render both compressed and uncompressed logs as before. Author: Andrew Or <andrew@databricks.com> Closes #4821 from andrewor14/event-log-format and squashes the following commits: 8511141 [Andrew Or] Fix test 654883d [Andrew Or] Add back metadata with Spark version 7f537cd [Andrew Or] Address review feedback 7d6aa61 [Andrew Or] Make codec an extension 59abee9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-format 27c9a6c [Andrew Or] Address review feedback 519e51a [Andrew Or] Address review feedback ef69276 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-format 88a091d [Andrew Or] Add tests for new format and file name f32d8d2 [Andrew Or] Fix tests 8db5a06 [Andrew Or] Embed metadata in the event log file name instead
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions