author: Sandy Ryza <sandy@cloudera.com> 2014-10-27 10:04:24 -0700
committer: Patrick Wendell <pwendell@gmail.com> 2014-10-27 10:04:24 -0700
commit: dea302ddbd26b1f20fb8a3979bd1d8e1717479f8
tree: dc144e0947d86c1547e86a2fb117fd65af5a2983 /LICENSE
parent: c9e05ca27c9c702b510d424e3befc87213f24e0f
SPARK-2621. Update task InputMetrics incrementally
The patch takes advantage of an API provided in Hadoop 2.5 that allows getting accurate data on Hadoop FileSystem bytes read. It eliminates the old method, which naively accepts the split size as the input bytes. An impact of this change is that input metrics go away when running against Hadoop versions earlier than 2.5. I can add this back in, but my opinion is that no metrics are better than inaccurate metrics.

This is difficult to write a test for because we don't usually build against a version of Hadoop that contains the function we need. I've tested it manually on a pseudo-distributed cluster.

Author: Sandy Ryza <sandy@cloudera.com>

Closes #2087 from sryza/sandy-spark-2621 and squashes the following commits:

23010b8 [Sandy Ryza] Missing style fixes
74fc9bb [Sandy Ryza] Make getFSBytesReadOnThreadCallback private
1ab662d [Sandy Ryza] Clear things up a bit
984631f [Sandy Ryza] Switch from pull to push model and add test
7ef7b22 [Sandy Ryza] Add missing curly braces
219abc9 [Sandy Ryza] Fall back to split size
90dbc14 [Sandy Ryza] SPARK-2621. Update task InputMetrics incrementally
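
The idea described above can be illustrated with a small self-contained sketch. It is not the code from this patch and uses no Hadoop or Spark API: the thread-local counter, `record_read`, `bytes_read_callback`, `InputMetrics`, and `read_split` are all hypothetical names made up for illustration. It assumes a per-thread "bytes read" statistic that may or may not be available, pushes the delta into the metrics while records are read, and falls back to the split size when the statistic is missing (as the squashed commit "Fall back to split size" suggests).

```python
import threading
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for a per-thread "bytes read" statistic maintained by
# the file system layer. This is NOT a Hadoop or Spark API.
_thread_stats = threading.local()


def _current_bytes_read() -> int:
    return getattr(_thread_stats, "bytes_read", 0)


def record_read(num_bytes: int) -> None:
    """Simulate the file system layer counting bytes read on this thread."""
    _thread_stats.bytes_read = _current_bytes_read() + num_bytes


def bytes_read_callback(stats_available: bool) -> Optional[Callable[[], int]]:
    """Return a function reporting bytes read on this thread since this call,
    or None when the per-thread statistic is unavailable (older runtime)."""
    if not stats_available:
        return None
    baseline = _current_bytes_read()
    return lambda: _current_bytes_read() - baseline


class InputMetrics:
    """Minimal metrics holder (hypothetical)."""

    def __init__(self) -> None:
        self.bytes_read = 0


def read_split(records: Iterable[bytes], split_size: int,
               stats_available: bool, update_every: int = 2) -> InputMetrics:
    metrics = InputMetrics()
    callback = bytes_read_callback(stats_available)
    for i, record in enumerate(records, start=1):
        record_read(len(record))  # the simulated file system performs the read
        # Push an incremental update every few records instead of waiting
        # for the end of the task.
        if callback is not None and i % update_every == 0:
            metrics.bytes_read = callback()
    if callback is not None:
        metrics.bytes_read = callback()   # final, accurate value
    else:
        metrics.bytes_read = split_size   # fallback: assume the whole split was read
    return metrics
```

Taking a baseline when the callback is created means bytes read earlier on the same thread are not attributed to the current split.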
Diffstat (limited to 'LICENSE')
0 files changed, 0 insertions, 0 deletions