aboutsummaryrefslogtreecommitdiff
path: root/docs/tuning.md
diff options
context:
space:
mode:
authorMarcelo Vanzin <vanzin@cloudera.com>2014-12-19 18:21:15 -0800
committerAndrew Or <andrew@databricks.com>2014-12-19 18:23:42 -0800
commit456451911d11cc0b6738f31b1e17869b1fb51c87 (patch)
tree65df1cdc3593000b3be1560a973e447e245d9b2e /docs/tuning.md
parentc28083f468685d468c4daa5ce336ef04fbec3144 (diff)
downloadspark-456451911d11cc0b6738f31b1e17869b1fb51c87.tar.gz
spark-456451911d11cc0b6738f31b1e17869b1fb51c87.tar.bz2
spark-456451911d11cc0b6738f31b1e17869b1fb51c87.zip
[SPARK-2261] Make event logger use a single file.
Currently the event logger uses a directory and several files to describe an app's event log, all but one of which are empty. This is not very HDFS-friendly, since creating lots of nodes in HDFS (especially when they don't contain any data) is frowned upon due to the node metadata being kept in the NameNode's memory. Instead, add a header section to the event log file that contains metadata needed to read the events. This metadata includes things like the Spark version (for future code that may need it for backwards compatibility) and the compression codec used for the event data. With the new approach, aside from reducing the load on the NN, there's also a lot less remote calls needed when reading the log directory. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #1222 from vanzin/hist-server-single-log and squashes the following commits: cc8f5de [Marcelo Vanzin] Store header in plain text. c7e6123 [Marcelo Vanzin] Update comment. 59c561c [Marcelo Vanzin] Review feedback. 216c5a3 [Marcelo Vanzin] Review comments. dce28e9 [Marcelo Vanzin] Fix log overwrite test. f91c13e [Marcelo Vanzin] Handle "spark.eventLog.overwrite", and add unit test. 346f0b4 [Marcelo Vanzin] Review feedback. ed0023e [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 3f4500f [Marcelo Vanzin] Unit test for SPARK-3697. 45c7a1f [Marcelo Vanzin] Version of SPARK-3697 for this branch. b3ee30b [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log a6d5c50 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 16fd491 [Marcelo Vanzin] Use unique log directory for each codec. 0ef3f70 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log d93c44a [Marcelo Vanzin] Add a newline to make the header more readable. 9e928ba [Marcelo Vanzin] Add types. bd6ba8c [Marcelo Vanzin] Review feedback. a624a89 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 04364dc [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log bb7c2d3 [Marcelo Vanzin] Fix scalastyle warning. 16661a3 [Marcelo Vanzin] Simplify some internal code. cc6bce4 [Marcelo Vanzin] Some review feedback. a722184 [Marcelo Vanzin] Do not encode metadata in log file name. 3700586 [Marcelo Vanzin] Restore log flushing. f677930 [Marcelo Vanzin] Fix botched rebase. ae571fa [Marcelo Vanzin] Fix end-to-end event logger test. 9db0efd [Marcelo Vanzin] Show prettier name in UI. 8f42274 [Marcelo Vanzin] Make history server parse old-style log directories. 6251dd7 [Marcelo Vanzin] Make event logger use a single file.
Diffstat (limited to 'docs/tuning.md')
0 files changed, 0 insertions, 0 deletions