author Shixiong Zhu <shixiong@databricks.com> 2017-01-27 15:07:57 -0800
committer Shixiong Zhu <shixiong@databricks.com> 2017-01-27 15:07:57 -0800
commit 21aa8c32ba7a29aafc000ecce2e6c802ced6a009
tree 27cbf9ae131d63e2632b255088f82509493efb96 /mllib/src
parent a7ab6f9a8fdfb927f0bcefdc87a92cc82fac4223
[SPARK-19365][CORE] Optimize RequestMessage serialization
## What changes were proposed in this pull request?

Right now Netty RPC serializes `RequestMessage` using Java serialization, and the size of a single message (e.g., `RequestMessage(..., "hello")`) is almost 1KB.

This PR optimizes it by serializing `RequestMessage` manually, which eliminates unnecessary information from most messages (e.g., the class names of `RequestMessage`, `NettyRpcEndpointRef`, ...) and reduces the above message size to 100+ bytes.

## How was this patch tested?

Jenkins

I did a simple test to measure the improvement. Each output line shows the iteration number followed by the elapsed time of the job in milliseconds.

Before:

```
$ bin/spark-shell --master local-cluster[1,4,1024]
...
scala> for (i <- 1 to 10) {
     |   val start = System.nanoTime
     |   val s = sc.parallelize(1 to 1000000, 10 * 1000).count()
     |   val end = System.nanoTime
     |   println(s"$i\t" + ((end - start)/1000/1000))
     | }
1	6830
2	4353
3	3322
4	3107
5	3235
6	3139
7	3156
8	3166
9	3091
10	3029
```

After:

```
$ bin/spark-shell --master local-cluster[1,4,1024]
...
scala> for (i <- 1 to 10) {
     |   val start = System.nanoTime
     |   val s = sc.parallelize(1 to 1000000, 10 * 1000).count()
     |   val end = System.nanoTime
     |   println(s"$i\t" + ((end - start)/1000/1000))
     | }
1	6431
2	3643
3	2913
4	2679
5	2760
6	2710
7	2747
8	2793
9	2679
10	2651
```

I also captured the TCP packets for this test. Before this patch, the total size of the TCP packets was ~1.5GB. After it, the total was reduced to ~1.2GB.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16706 from zsxwing/rpc-opt.
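The core idea, writing only the fields a message needs instead of letting Java serialization emit class descriptors, can be illustrated with a small standalone Scala sketch. `SimpleRequest`, `ManualCodec`, and `JavaCodec` are hypothetical names, and the field layout (sender host and port, receiver endpoint name, string payload) is an assumption chosen for illustration; this is not Spark's actual `RequestMessage` wire format or its Netty RPC code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, ObjectOutputStream}

// Hypothetical simplified message; NOT Spark's actual RequestMessage class.
case class SimpleRequest(senderHost: String, senderPort: Int, receiverName: String, content: String)

object ManualCodec {
  // Write each field directly: no class names or object-stream metadata are emitted.
  def encode(msg: SimpleRequest): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeUTF(msg.senderHost)   // 2-byte length prefix + modified UTF-8
    out.writeInt(msg.senderPort)   // 4 bytes
    out.writeUTF(msg.receiverName)
    out.writeUTF(msg.content)
    out.close()
    bytes.toByteArray
  }

  // Read the fields back in exactly the order they were written.
  def decode(data: Array[Byte]): SimpleRequest = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    try SimpleRequest(in.readUTF(), in.readInt(), in.readUTF(), in.readUTF())
    finally in.close()
  }
}

object JavaCodec {
  // Baseline: standard Java serialization, which also writes class descriptors.
  def encode(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.toByteArray
  }
}

// Compare the two encodings for the same message.
val msg = SimpleRequest("localhost", 7077, "endpoint", "hello")
val manualBytes = ManualCodec.encode(msg)
val javaBytes = JavaCodec.encode(msg)
println(s"manual: ${manualBytes.length} bytes, java: ${javaBytes.length} bytes")
```

The manual encoding trades flexibility for size: both sides must agree on the field order, and `writeUTF` rejects strings whose encoded form exceeds 65535 bytes, whereas Java serialization is self-describing at the cost of the extra class metadata.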
Diffstat (limited to 'mllib/src')
0 files changed, 0 insertions, 0 deletions