author: Davies Liu <davies@databricks.com> 2015-10-07 15:58:07 -0700
committer: Davies Liu <davies.liu@gmail.com> 2015-10-07 15:58:07 -0700
commit: 075a0b658289608c8732e07e26e14d736e673ce9
tree: 91ab61c1f6cf7d9284c00f4e35037da7721c812a /mllib
parent: dd36ec6bc5844aaa045a4bd9ba49113528e1740c
[SPARK-10917] [SQL] improve performance of complex type in columnar cache

This PR improves the performance of complex types in the columnar cache by using UnsafeProjection instead of KryoSerializer. A simple benchmark shows that this PR could improve the performance of scanning a cached table with complex columns by 15x (compared to Spark 1.5).

Here is the code used to benchmark:

```
from pyspark.sql import Row

# Build a table with a struct column (a), an array column (d) and a map column (e).
df = sc.range(1<<23).map(lambda i: Row(a=Row(b=i, c=str(i)), d=range(10), e=dict(zip(range(10), [str(i) for i in range(10)])))).toDF()
df.write.parquet("table")
```

```
import time

df = sqlContext.read.parquet("table")
df.cache()
df.count()  # materialize the columnar cache before timing
t = time.time()
# Time a full scan of the cached table.
print(df.select("*")._jdf.queryExecution().toRdd().count())
print(time.time() - t)
```

Author: Davies Liu <davies@databricks.com>

Closes #8971 from davies/complex.
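
The benchmark takes a single wall-clock measurement with `time.time()`. As a hedged sketch of how such a scan could be timed more repeatably, the helper below runs a callable several times and keeps the best time. The names `time_call` and `speedup` and the best-of-N approach are assumptions for illustration and are not part of this commit.

```python
import time


def time_call(fn, repeats=3):
    """Call fn() `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


def speedup(baseline_seconds, new_seconds):
    """Ratio of the baseline time to the new time (larger means faster)."""
    return baseline_seconds / new_seconds


# Hypothetical usage with the cached DataFrame from the benchmark:
#   best, rows = time_call(
#       lambda: df.select("*")._jdf.queryExecution().toRdd().count())
```

Running the same helper on Spark 1.5 and on a build containing this patch, then passing the two best times to `speedup`, would yield a ratio comparable to the figure quoted in the commit message.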
Diffstat (limited to 'mllib')
0 files changed, 0 insertions, 0 deletions