SPARK-2621. Update task InputMetrics incrementally - spark


author	Sandy Ryza <sandy@cloudera.com>	2014-10-27 10:04:24 -0700
committer	Patrick Wendell <pwendell@gmail.com>	2014-10-27 10:04:24 -0700
commit	dea302ddbd26b1f20fb8a3979bd1d8e1717479f8 (patch)
tree	dc144e0947d86c1547e86a2fb117fd65af5a2983 /LICENSE
parent	c9e05ca27c9c702b510d424e3befc87213f24e0f (diff)

SPARK-2621. Update task InputMetrics incrementally

The patch takes advantage of an API provided in Hadoop 2.5 that allows getting accurate data on Hadoop FileSystem bytes read. It eliminates the old method, which naively accepts the split size as the input bytes. One impact of this change is that input metrics go away when running against Hadoop versions earlier than 2.5. I can add this back in, but my opinion is that no metrics are better than inaccurate metrics.

This is difficult to write a test for because we don't usually build against a version of Hadoop that contains the function we need. I've tested it manually on a pseudo-distributed cluster.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #2087 from sryza/sandy-spark-2621 and squashes the following commits:

23010b8 [Sandy Ryza] Missing style fixes
74fc9bb [Sandy Ryza] Make getFSBytesReadOnThreadCallback private
1ab662d [Sandy Ryza] Clear things up a bit
984631f [Sandy Ryza] Switch from pull to push model and add test
7ef7b22 [Sandy Ryza] Add missing curly braces
219abc9 [Sandy Ryza] Fall back to split size
90dbc14 [Sandy Ryza] SPARK-2621. Update task InputMetrics incrementally
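The idea behind the change (measure real bytes read on the task's thread and push updates into the task's input metrics as the split is consumed) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Spark's implementation: `FakeThreadStatistics`, `make_bytes_read_callback`, `read_split` and `InputMetrics.set_bytes_read` are hypothetical names, and the fake counter merely stands in for the per-thread FileSystem statistics that Hadoop 2.5 exposes. The callback records a baseline when the task starts, so bytes read earlier on the same thread are excluded. When no callback is available, the sketch falls back to the split size, which is what the squashed commit "Fall back to split size" appears to describe.

```python
from typing import Callable, List, Optional


class FakeThreadStatistics:
    """Hypothetical stand-in for per-thread FileSystem read statistics."""

    def __init__(self, initial: int = 0) -> None:
        # Bytes already read on this thread before the task started.
        self.bytes_read = initial

    def get_bytes_read(self) -> int:
        return self.bytes_read


class InputMetrics:
    """Simplified task input metrics that remember every pushed update."""

    def __init__(self) -> None:
        self.bytes_read = 0
        self.updates: List[int] = []

    def set_bytes_read(self, value: int) -> None:
        self.bytes_read = value
        self.updates.append(value)


def make_bytes_read_callback(read_counter: Callable[[], int]) -> Callable[[], int]:
    """Capture a baseline now; the callback reports bytes read since then."""
    baseline = read_counter()
    return lambda: read_counter() - baseline


def read_split(
    chunks: List[bytes],
    stats: FakeThreadStatistics,
    metrics: InputMetrics,
    split_size: int,
    callback_available: bool = True,
    update_every: int = 2,
) -> List[str]:
    """Read a split chunk by chunk, pushing bytes-read updates into metrics."""
    callback: Optional[Callable[[], int]] = (
        make_bytes_read_callback(stats.get_bytes_read) if callback_available else None
    )
    records = []
    for i, chunk in enumerate(chunks, start=1):
        stats.bytes_read += len(chunk)  # simulate the filesystem counting the read
        records.append(chunk.decode())
        if callback is not None and i % update_every == 0:
            metrics.set_bytes_read(callback())  # incremental update mid-task
    # Final update: the measured count if available, otherwise the split size.
    metrics.set_bytes_read(callback() if callback is not None else split_size)
    return records
```

For example, reading the chunks `b"ab"`, `b"cde"`, `b"f"` on a thread whose counter already stood at 100 pushes an intermediate value of 5 after the second chunk and a final value of 6, ignoring the 100 bytes read before the task began. Pushing from the reader, rather than having the metrics system poll, mirrors the "Switch from pull to push model" step in the commit list.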

Diffstat (limited to 'LICENSE')

0 files changed, 0 insertions, 0 deletions

