| field | value | date |
|---|---|---|
| author | Joseph Batchik <joseph.batchik@cloudera.com> | 2015-07-29 14:02:32 -0500 |
| committer | Imran Rashid <irashid@cloudera.com> | 2015-07-29 14:02:32 -0500 |
| commit | 069a4c414db4612d7bdb6f5615c1ba36998e5a49 | |
| tree | 7286954505ecaa1adf56b9337f9fe65746a04a33 /streaming | |
| parent | 97906944e133dec13068f16520b6abbcdc79e84f | |
[SPARK-746] [CORE] Added Avro Serialization to Kryo
Added a custom Kryo serializer for Avro generic records to reduce the network IO
involved in a shuffle. The serializer compresses the schema and lets users
register their schemas ahead of time to reduce traffic further.
Without it, Kryo falls back to its default serializer for generic records, which
includes a lot of unneeded data in each record.
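To illustrate why shipping a schema reference instead of the schema itself saves traffic, here is a minimal Python sketch under stated assumptions. It is not Spark's serializer (which is written in Scala and works on real Avro records); the class `SchemaAwareSerializer`, the byte layout, the SHA-256-based `fingerprint`, and the use of JSON in place of Avro binary encoding are all hypothetical choices made for this example. A registered schema is sent as an 8-byte fingerprint; an unregistered one is zlib-compressed once, cached, and embedded in each record.

```python
import hashlib
import json
import struct
import zlib

# Hypothetical wire format (illustration only, not Spark's actual layout):
#   byte 0     : 1 if the schema was registered ahead of time, else 0
#   registered : 8-byte schema fingerprint
#   otherwise  : 4-byte length + zlib-compressed schema JSON
#   then       : record payload (plain JSON standing in for Avro binary data)


def fingerprint(schema_json: str) -> int:
    """Hypothetical 64-bit fingerprint: first 8 bytes of SHA-256 (not Avro's)."""
    return struct.unpack(">Q", hashlib.sha256(schema_json.encode()).digest()[:8])[0]


class SchemaAwareSerializer:
    def __init__(self, registered_schemas=()):
        # fingerprint -> schema JSON; both sides must know these in advance
        self.registered = {fingerprint(s): s for s in registered_schemas}
        self._compress_cache = {}  # schema JSON -> compressed bytes

    def _compressed(self, schema_json):
        # Compress each distinct schema once and reuse the bytes afterwards.
        if schema_json not in self._compress_cache:
            self._compress_cache[schema_json] = zlib.compress(schema_json.encode())
        return self._compress_cache[schema_json]

    def serialize(self, schema_json, record):
        fp = fingerprint(schema_json)
        payload = json.dumps(record).encode()
        if fp in self.registered:
            header = b"\x01" + struct.pack(">Q", fp)
        else:
            blob = self._compressed(schema_json)
            header = b"\x00" + struct.pack(">I", len(blob)) + blob
        return header + payload

    def deserialize(self, data):
        if data[0] == 1:
            (fp,) = struct.unpack(">Q", data[1:9])
            schema_json = self.registered[fp]
            payload = data[9:]
        else:
            (n,) = struct.unpack(">I", data[1:5])
            schema_json = zlib.decompress(data[5:5 + n]).decode()
            payload = data[5 + n:]
        return schema_json, json.loads(payload)


schema = json.dumps({"type": "record", "name": "User",
                     "fields": [{"name": "name", "type": "string"},
                                {"name": "age", "type": "int"}]})
record = {"name": "alice", "age": 30}

# Rough stand-in for "full schema in every record"; not Kryo's real encoding.
naive = schema.encode() + json.dumps(record).encode()
unregistered = SchemaAwareSerializer().serialize(schema, record)
registered = SchemaAwareSerializer([schema]).serialize(schema, record)
print(len(naive), len(unregistered), len(registered))
```

The registered path costs a fixed 9 bytes of header per record regardless of schema size, while the unregistered path still works for any schema but pays for the compressed schema each time; caching the compressed bytes only avoids repeating the compression work, not the transfer.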
Author: Joseph Batchik <joseph.batchik@cloudera.com>
Author: Joseph Batchik <josephbatchik@gmail.com>
Closes #7004 from JDrit/Avro_serialization and squashes the following commits:
8158d51 [Joseph Batchik] updated per feedback
c0cf329 [Joseph Batchik] implemented @squito suggestion for SparkEnv
dd71efe [Joseph Batchik] fixed bug with serializing
1183a48 [Joseph Batchik] updated codec settings
fa9298b [Joseph Batchik] forgot a couple of fixes
c5fe794 [Joseph Batchik] implemented @squito suggestion
0f5471a [Joseph Batchik] implemented @squito suggestion to use a codec that is already in spark
6d1925c [Joseph Batchik] fixed to changes suggested by @squito
d421bf5 [Joseph Batchik] updated pom to removed versions
ab46d10 [Joseph Batchik] Changed Avro dependency to be similar to parent
f4ae251 [Joseph Batchik] fixed serialization error in that SparkConf cannot be serialized
2b545cc [Joseph Batchik] started working on fixes for pr
97fba62 [Joseph Batchik] Added a custom Kryo serializer for generic Avro records to reduce the network IO involved during a shuffle. This compresses the schema and allows for users to register their schemas ahead of time to further reduce traffic.