author Nong Li <nong@databricks.com> 2016-01-12 18:21:04 -0800
committer Reynold Xin <rxin@databricks.com> 2016-01-12 18:21:04 -0800
commit 9247084962259ebbbac4c5a80a6ccb271776f019 (patch)
tree c36d488c2890ec74a9ab22bf6fa753a25e1f4e2f /core
parent 4f60651cbec1b4c9cc2e6d832ace77e89a233f3a (diff)
[SPARK-12785][SQL] Add ColumnarBatch, an in memory columnar format for execution.
There are many potential benefits of having an efficient in-memory columnar format as an alternative to UnsafeRow. This patch introduces ColumnarBatch/ColumnVector, which starts this effort. The remaining implementation can be done in follow-up patches.

As stated in the JIRA, there are useful external components that operate on memory in a simple columnar format. ColumnarBatch would serve that purpose and could serve as a zero-serialization/zero-copy exchange for this use case.

This patch supports storing the underlying data either on heap or off heap. On heap runs a bit faster, but we would need off heap for zero-copy exchanges. Currently, this mode is hidden behind one interface (ColumnVector).

This differs from Parquet or the existing columnar cache because this is *not* intended to be used as a storage format. The focus is entirely on CPU efficiency, as we expect to have only 1 of these batches in memory per task. The layout of the values is just dense arrays of the value type.

Author: Nong Li <nong@databricks.com>
Author: Nong <nongli@gmail.com>

Closes #10628 from nongli/spark-12635.
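
A minimal sketch of the idea described above: one interface hiding whether a column's dense array of values lives on heap or off heap. All names here (IntColumn, OnHeapIntColumn, OffHeapIntColumn, ColumnSketch) are hypothetical and chosen for illustration; this is not the ColumnVector API added by the patch, and it uses a direct ByteBuffer as an assumed stand-in for off-heap memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical interface: callers never see where the values are stored.
interface IntColumn {
    void putInt(int rowId, int value);
    int getInt(int rowId);
    int capacity();
}

// On-heap variant: the column is a plain dense int[].
final class OnHeapIntColumn implements IntColumn {
    private final int[] data;

    OnHeapIntColumn(int capacity) {
        this.data = new int[capacity];
    }

    public void putInt(int rowId, int value) { data[rowId] = value; }
    public int getInt(int rowId) { return data[rowId]; }
    public int capacity() { return data.length; }
}

// Off-heap variant: the same dense layout, 4 bytes per value, in a direct
// buffer outside the Java heap (an assumption made for this sketch).
final class OffHeapIntColumn implements IntColumn {
    private final ByteBuffer data;
    private final int capacity;

    OffHeapIntColumn(int capacity) {
        this.capacity = capacity;
        this.data = ByteBuffer.allocateDirect(capacity * Integer.BYTES)
                              .order(ByteOrder.nativeOrder());
    }

    public void putInt(int rowId, int value) { data.putInt(rowId * Integer.BYTES, value); }
    public int getInt(int rowId) { return data.getInt(rowId * Integer.BYTES); }
    public int capacity() { return capacity; }
}

final class ColumnSketch {
    // A tight loop over one column; the same code works for both variants.
    static long sum(IntColumn col, int numRows) {
        long total = 0;
        for (int i = 0; i < numRows; i++) {
            total += col.getInt(i);
        }
        return total;
    }
}
```

Code written against IntColumn runs unchanged on either variant, which is the property the message attributes to hiding the memory mode behind a single interface.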
Diffstat (limited to 'core')
0 files changed, 0 insertions, 0 deletions