diff options
author | Andrew Ash <andrew@andrewash.com> | 2014-12-10 15:01:15 -0800 |
---|---|---|
committer | Patrick Wendell <pwendell@gmail.com> | 2014-12-10 15:01:15 -0800 |
commit | 652b781a9b543cb17d7da91f5c3bebe5a02e0478 (patch) | |
tree | 1840a2dacc0226611a8c328e79021267345c7cd7 /docs/tuning.md | |
parent | 36bdb5b748ff670a9bafd787e40c9e142c9a85b9 (diff) | |
download | spark-652b781a9b543cb17d7da91f5c3bebe5a02e0478.tar.gz spark-652b781a9b543cb17d7da91f5c3bebe5a02e0478.tar.bz2 spark-652b781a9b543cb17d7da91f5c3bebe5a02e0478.zip |
SPARK-3526 Add section about data locality to the tuning guide
cc kayousterhout
I have a few outstanding questions from compiling this documentation:
- What's the difference between NO_PREF and ANY? I understand the implications of the ordering but don't know what an example of each would be
- Why is NO_PREF ahead of RACK_LOCAL? I would think it'd be better to schedule rack-local tasks ahead of no preference if you could only do one or the other. Is the idea to wait longer and hope for the rack-local tasks to turn into node-local or better?
- Will there be a datacenter-local locality level in the future? Apache Cassandra for example has this level
Author: Andrew Ash <andrew@andrewash.com>
Closes #2519 from ash211/SPARK-3526 and squashes the following commits:
44cff28 [Andrew Ash] Link to spark.locality parameters rather than copying the list
6d5d966 [Andrew Ash] Stay focused on Spark, no astronaut architecture mumbo-jumbo
20e0e31 [Andrew Ash] SPARK-3526 Add section about data locality to the tuning guide
Diffstat (limited to 'docs/tuning.md')
-rw-r--r-- | docs/tuning.md | 33 |
1 files changed, 33 insertions, 0 deletions
diff --git a/docs/tuning.md b/docs/tuning.md index 0e2447dd46..c4ca766328 100644 --- a/docs/tuning.md +++ b/docs/tuning.md @@ -233,6 +233,39 @@ Spark prints the serialized size of each task on the master, so you can look at decide whether your tasks are too large; in general tasks larger than about 20 KB are probably worth optimizing. +## Data Locality + +Data locality can have a major impact on the performance of Spark jobs. If data and the code that +operates on it are together than computation tends to be fast. But if code and data are separated, +one must move to the other. Typically it is faster to ship serialized code from place to place than +a chunk of data because code size is much smaller than data. Spark builds its scheduling around +this general principle of data locality. + +Data locality is how close data is to the code processing it. There are several levels of +locality based on the data's current location. In order from closest to farthest: + +- `PROCESS_LOCAL` data is in the same JVM as the running code. This is the best locality + possible +- `NODE_LOCAL` data is on the same node. Examples might be in HDFS on the same node, or in + another executor on the same node. This is a little slower than `PROCESS_LOCAL` because the data + has to travel between processes +- `NO_PREF` data is accessed equally quickly from anywhere and has no locality preference +- `RACK_LOCAL` data is on the same rack of servers. Data is on a different server on the same rack + so needs to be sent over the network, typically through a single switch +- `ANY` data is elsewhere on the network and not in the same rack + +Spark prefers to schedule all tasks at the best locality level, but this is not always possible. In +situations where there is no unprocessed data on any idle executor, Spark switches to lower locality +levels. There are two options: a) wait until a busy CPU frees up to start a task on data on the same +server, or b) immediately start a new task in a farther away place that requires moving data there. + +What Spark typically does is wait a bit in the hopes that a busy CPU frees up. Once that timeout +expires, it starts moving the data from far away to the free CPU. The wait timeout for fallback +between each level can be configured individually or all together in one parameter; see the +`spark.locality` parameters on the [configuration page](configuration.html#scheduling) for details. +You should increase these settings if your tasks are long and see poor locality, but the default +usually works well. + # Summary This has been a short guide to point out the main concerns you should know about when tuning a |