author | Shixiong Zhu <shixiong@databricks.com> | 2017-01-27 15:07:57 -0800 |
---|---|---|
committer | Shixiong Zhu <shixiong@databricks.com> | 2017-01-27 15:07:57 -0800 |
commit | 21aa8c32ba7a29aafc000ecce2e6c802ced6a009 (patch) | |
tree | 27cbf9ae131d63e2632b255088f82509493efb96 /mllib/src | |
parent | a7ab6f9a8fdfb927f0bcefdc87a92cc82fac4223 (diff) | |
[SPARK-19365][CORE] Optimize RequestMessage serialization
## What changes were proposed in this pull request?
Right now Netty RPC serializes `RequestMessage` using Java serialization, and the size of a single message (e.g., `RequestMessage(..., "hello")`) is almost 1KB.
This PR optimizes it by serializing `RequestMessage` manually (eliminating unnecessary information from most messages, e.g., the class names of `RequestMessage`, `NettyRpcEndpointRef`, ...), which reduces the above message size to 100+ bytes.
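The size difference described above can be reproduced in isolation. The following is a minimal sketch, not the code in this patch: `Address`, `Request`, and `SerializationSketch` are hypothetical stand-ins, and the message content is assumed to be a plain `String` for simplicity. It compares Java serialization, which writes class descriptors alongside the data, with a hand-written encoding that writes only the field values in a fixed order.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{DataInputStream, DataOutputStream, ObjectOutputStream}

// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
case class Address(host: String, port: Int)
case class Request(sender: Address, receiver: Address, endpointName: String, content: String)

object SerializationSketch {
  // Java serialization: the stream carries class and field descriptors
  // in addition to the field values themselves.
  def javaSerialize(r: Request): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(r) finally out.close()
    bytes.toByteArray
  }

  private def writeAddress(out: DataOutputStream, a: Address): Unit = {
    out.writeUTF(a.host)
    out.writeInt(a.port)
  }

  private def readAddress(in: DataInputStream): Address = {
    val host = in.readUTF()
    val port = in.readInt()
    Address(host, port)
  }

  // Manual encoding: only field values, in an order both sides agree on.
  def manualSerialize(r: Request): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    try {
      writeAddress(out, r.sender)
      writeAddress(out, r.receiver)
      out.writeUTF(r.endpointName)
      out.writeUTF(r.content)
    } finally out.close()
    bytes.toByteArray
  }

  // Reads the fields back in exactly the order they were written.
  def manualDeserialize(data: Array[Byte]): Request = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    try {
      val sender = readAddress(in)
      val receiver = readAddress(in)
      val endpointName = in.readUTF()
      val content = in.readUTF()
      Request(sender, receiver, endpointName, content)
    } finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val r = Request(Address("localhost", 12345), Address("localhost", 54321), "test-endpoint", "hello")
    println(s"Java serialization: ${javaSerialize(r).length} bytes")
    println(s"Manual encoding:    ${manualSerialize(r).length} bytes")
  }
}
```

The trade-off of a manual format is that the reader must know the field order and types in advance, which is acceptable when both ends of the connection run the same code.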
## How was this patch tested?
Jenkins
I ran a simple test to measure the improvement. The loop prints the iteration number and the elapsed time in milliseconds.
Before:
```
$ bin/spark-shell --master local-cluster[1,4,1024]
...
scala> for (i <- 1 to 10) {
| val start = System.nanoTime
| val s = sc.parallelize(1 to 1000000, 10 * 1000).count()
| val end = System.nanoTime
| println(s"$i\t" + ((end - start)/1000/1000))
| }
1 6830
2 4353
3 3322
4 3107
5 3235
6 3139
7 3156
8 3166
9 3091
10 3029
```
After:
```
$ bin/spark-shell --master local-cluster[1,4,1024]
...
scala> for (i <- 1 to 10) {
| val start = System.nanoTime
| val s = sc.parallelize(1 to 1000000, 10 * 1000).count()
| val end = System.nanoTime
| println(s"$i\t" + ((end - start)/1000/1000))
| }
1 6431
2 3643
3 2913
4 2679
5 2760
6 2710
7 2747
8 2793
9 2679
10 2651
```
I also captured the TCP packets for this test. Before this patch, the total size of the TCP packets was ~1.5GB; after it, the total dropped to ~1.2GB.
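To put the figures above in relative terms, here is a small sketch that computes the percentage reductions from the reported numbers. The timings are copied from the two runs above; treating the first three iterations as warm-up and dropping them is an assumption made here, not something stated in the test description, and `ImprovementSketch` is a hypothetical name.

```scala
object ImprovementSketch {
  // Elapsed times in milliseconds, copied from the "Before" and "After" runs.
  val before: Seq[Int] = Seq(6830, 4353, 3322, 3107, 3235, 3139, 3156, 3166, 3091, 3029)
  val after: Seq[Int] = Seq(6431, 3643, 2913, 2679, 2760, 2710, 2747, 2793, 2679, 2651)

  def mean(xs: Seq[Int]): Double = xs.sum.toDouble / xs.size

  // Relative reduction from `old` to `updated`, as a percentage.
  def reductionPct(old: Double, updated: Double): Double = (old - updated) / old * 100

  // Assumption: iterations 1-3 are warm-up, so only iterations 4-10 are compared.
  def steadyStateReductionPct: Double =
    reductionPct(mean(before.drop(3)), mean(after.drop(3)))

  def main(args: Array[String]): Unit = {
    println(f"Steady-state time reduction: $steadyStateReductionPct%.1f%%")
    // TCP totals (~1.5GB and ~1.2GB) are the approximate values quoted above.
    println(f"TCP volume reduction: ${reductionPct(1.5, 1.2)}%.1f%%")
  }
}
```

With these inputs the steady-state time drops by roughly 13% and the captured TCP volume by roughly 20%. Both are single-machine measurements, so they are indicative rather than precise.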
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #16706 from zsxwing/rpc-opt.