author | Davies Liu <davies@databricks.com> | 2015-10-07 15:58:07 -0700 |
---|---|---|
committer | Davies Liu <davies.liu@gmail.com> | 2015-10-07 15:58:07 -0700 |
commit | 075a0b658289608c8732e07e26e14d736e673ce9 (patch) | |
tree | 91ab61c1f6cf7d9284c00f4e35037da7721c812a /pom.xml | |
parent | dd36ec6bc5844aaa045a4bd9ba49113528e1740c (diff) | |
download | spark-075a0b658289608c8732e07e26e14d736e673ce9.tar.gz spark-075a0b658289608c8732e07e26e14d736e673ce9.tar.bz2 spark-075a0b658289608c8732e07e26e14d736e673ce9.zip |
[SPARK-10917] [SQL] improve performance of complex type in columnar cache
This PR improves the performance of complex types in the columnar cache by using UnsafeProjection instead of KryoSerializer.
A simple benchmark shows that this PR can speed up scanning a cached table with complex columns by 15x (compared to Spark 1.5).
Here is the code used to benchmark:
```
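from pyspark.sql import Row
# Build 2^23 rows with a nested struct (a), an array (d), and a map (e), then write them as Parquet.
df = sc.range(1<<23).map(lambda i: Row(a=Row(b=i, c=str(i)), d=range(10), e=dict(zip(range(10), [str(i) for i in range(10)])))).toDF()
df.write.parquet("table")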
```
```
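import time
df = sqlContext.read.parquet("table")
df.cache()
df.count()  # materialize the columnar cache before timing
t = time.time()
# Count rows on the JVM side via the internal RDD, so only the cached scan is timed.
print df.select("*")._jdf.queryExecution().toRdd().count()
print time.time() - t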
```
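The timing pattern in the benchmark can be wrapped in a small helper so the same measurement is easy to repeat before and after the change. This is only a sketch: `timed` is a hypothetical name that is not part of this PR or of Spark, and it assumes the action is a zero-argument callable.

```python
import time

def timed(action):
    """Run a zero-argument callable once and return (result, elapsed_seconds).

    Hypothetical helper for illustration; not part of this PR or of Spark.
    """
    start = time.time()
    result = action()
    return result, time.time() - start

# Assumed usage against the cached DataFrame from the benchmark (requires a live Spark session):
#   rows, seconds = timed(lambda: df.select("*")._jdf.queryExecution().toRdd().count())
```

Running the helper a few times and discarding the first run would reduce the influence of JVM warm-up on the reported number.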
Author: Davies Liu <davies@databricks.com>
Closes #8971 from davies/complex.
Diffstat (limited to 'pom.xml')
0 files changed, 0 insertions, 0 deletions