author: Wenchen Fan <wenchen@databricks.com> 2016-01-25 16:23:59 -0800
committer: Davies Liu <davies.liu@gmail.com> 2016-01-25 16:23:59 -0800
commit: be375fcbd200fb0e210b8edcfceb5a1bcdbba94b
tree: 060e087c33e27b44b30fb97f9861c97d8a5a06af /data
parent: 6f0f1d9e04a8db47e2f6f8fcfe9dea9de0f633da
[SPARK-12879] [SQL] improve the unsafe row writing framework
As we begin to use the unsafe row writing framework (`BufferHolder` and `UnsafeRowWriter`) in more and more places (`UnsafeProjection`, `UnsafeRowParquetRecordReader`, `GenerateColumnAccessor`, etc.), we should document it better and make it easier to use.

This PR abstracts the technique used in `UnsafeRowParquetRecordReader`: avoid unnecessary operations as much as possible. For example, we do not need to point the row to the buffer at the end every time; we only need to update the size of the row. If all fields are of primitive type, we can even skip updating the row size. This technique can then be applied to more places easily.

A local benchmark shows `UnsafeProjection` is up to 1.7x faster after this PR:

**old version**
```
Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz
unsafe projection:                 Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
single long                             2616.04           102.61         1.00 X
single nullable long                    3032.54            88.52         0.86 X
primitive types                         9121.05            29.43         0.29 X
nullable primitive types               12410.60            21.63         0.21 X
```

**new version**
```
Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz
unsafe projection:                 Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
single long                             1533.34           175.07         1.00 X
single nullable long                    2306.73           116.37         0.66 X
primitive types                         8403.93            31.94         0.18 X
nullable primitive types               12448.39            21.56         0.12 X
```

For a single non-nullable long (the best case), we get about a 1.7x speedup (2616.04 ms down to 1533.34 ms). Even when the long is nullable, we still get about a 1.3x speedup (3032.54 ms down to 2306.73 ms). For the other cases the gain is small, because the saved operations account for only a small proportion of the whole process.

The benchmark code is included in this PR.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10809 from cloud-fan/unsafe-projection.
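
The reuse pattern described above can be illustrated with a small, self-contained Java sketch. It is not Spark's code: `Row`, `Holder`, and `Writer` are hypothetical stand-ins that only mimic the roles of `UnsafeRow`, `BufferHolder`, and `UnsafeRowWriter`, and the 8-byte slot layout is an assumption made for the example. The row is pointed at the buffer once, re-pointed only when the buffer has to grow, and otherwise only its size is updated; when every field is fixed width, even the size update is skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy model of the buffer-reuse idea; hypothetical classes, not Spark's API. */
public final class RowWriterSketch {

    /** Hypothetical row view: a reference to a byte array plus a size. */
    static final class Row {
        byte[] bytes;
        int sizeInBytes;
        void pointTo(byte[] b, int size) { bytes = b; sizeInBytes = size; }
        long getLong(int ordinal) {
            long v = 0;
            for (int i = 0; i < 8; i++) v |= (bytes[ordinal * 8 + i] & 0xFFL) << (8 * i);
            return v;
        }
    }

    /** Hypothetical growable buffer that keeps the row pointed at its array. */
    static final class Holder {
        final Row row;
        final int fixedSize;   // one 8-byte slot per field (assumed layout)
        byte[] buffer;
        int cursor;            // next free byte in the variable-length region

        Holder(Row row, int numFields) {
            this.row = row;
            this.fixedSize = numFields * 8;
            this.buffer = new byte[fixedSize + 16];
            this.cursor = fixedSize;
            row.pointTo(buffer, fixedSize);   // done once, not once per row
        }

        void reset() { cursor = fixedSize; }

        void grow(int needed) {
            if (cursor + needed > buffer.length) {
                buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, cursor + needed));
                row.pointTo(buffer, row.sizeInBytes);   // re-point only when the array is replaced
            }
        }

        int totalSize() { return cursor; }
    }

    /** Hypothetical writer for fixed-width longs and variable-length strings. */
    static final class Writer {
        final Holder holder;
        Writer(Holder holder) { this.holder = holder; }

        void write(int ordinal, long v) {
            for (int i = 0; i < 8; i++) holder.buffer[ordinal * 8 + i] = (byte) (v >>> (8 * i));
        }

        void write(int ordinal, String s) {
            byte[] data = s.getBytes(StandardCharsets.UTF_8);
            holder.grow(data.length);
            System.arraycopy(data, 0, holder.buffer, holder.cursor, data.length);
            write(ordinal, ((long) holder.cursor << 32) | data.length);   // pack offset and length
            holder.cursor += data.length;
        }
    }

    /** All fields are fixed width: the row size never changes, so no per-row update is needed. */
    public static long[] projectLongs(long[] inputs) {
        Row row = new Row();
        Writer writer = new Writer(new Holder(row, 1));
        long[] out = new long[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            writer.write(0, inputs[i]);
            out[i] = row.getLong(0);
        }
        return out;
    }

    /** One variable-length field: reset the cursor, write, then update only the row size. */
    public static int[] projectStrings(String[] inputs) {
        Row row = new Row();
        Holder holder = new Holder(row, 1);
        Writer writer = new Writer(holder);
        int[] sizes = new int[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            holder.reset();
            writer.write(0, inputs[i]);
            row.sizeInBytes = holder.totalSize();   // no pointTo call here
            sizes[i] = row.sizeInBytes;
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(projectLongs(new long[] {1L, -5L, 42L})));
        System.out.println(Arrays.toString(projectStrings(new String[] {"ab", "a longer value"})));
    }
}
```

In this sketch the per-row work for the fixed-width case is a single slot write, which is in line with the commit's observation that the single non-nullable long case benefits most.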
Diffstat (limited to 'data')
0 files changed, 0 insertions, 0 deletions