author | Cheng Lian <lian@databricks.com> | 2016-05-29 23:19:12 -0700 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2016-05-29 23:19:12 -0700 |
commit | 1360a6d636dd812a27955fc85df8e0255db60dfa (patch) | |
tree | b43895b8da141122252ec7edcb0491e6a9622b79 /core/src | |
parent | ce1572d16f03d383071bcc1f30ede551e8ded49f (diff) | |
[SPARK-15112][SQL] Disables EmbedSerializerInFilter for plan fragments that change schema
## What changes were proposed in this pull request?
`EmbedSerializerInFilter` implicitly assumes that the plan fragment being optimized doesn't change the plan schema, which is reasonable because `Dataset.filter` should never change the schema.
However, due to another issue involving `DeserializeToObject` and `SerializeFromObject`, a typed filter *does* change the plan schema (see [SPARK-15632][1]). This breaks `EmbedSerializerInFilter` and causes data corruption.
This PR disables `EmbedSerializerInFilter` when there's a schema change to avoid data corruption. The schema change issue should be addressed in follow-up PRs.
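A minimal sketch of the guard described above, using a toy plan model in plain Scala rather than Spark's Catalyst classes. `Column`, `Relation`, `TypedFilterFragment`, `EmbeddedFilter`, and `embedSerializerInFilter` are hypothetical names introduced for illustration, and the nullability difference is only an assumed example of a schema change. The actual rule in this patch may express the condition differently.

```scala
// Toy model only: none of these types are Spark's real classes.
object SchemaGuardSketch {
  // A column is described by its name, a type name, and a nullability flag.
  final case class Column(name: String, dataType: String, nullable: Boolean)

  sealed trait Plan { def schema: Seq[Column] }

  // A leaf plan node with a fixed schema.
  final case class Relation(schema: Seq[Column]) extends Plan

  // Stands in for the deserialize -> filter -> serialize fragment.
  // `schema` is whatever the serializer at the top of the fragment emits.
  final case class TypedFilterFragment(child: Plan, schema: Seq[Column]) extends Plan

  // Stands in for the optimized form: the filter runs directly on the
  // child's rows, so the output schema is necessarily the child's schema.
  final case class EmbeddedFilter(child: Plan) extends Plan {
    def schema: Seq[Column] = child.schema
  }

  // Apply the rewrite only when the fragment leaves the schema unchanged.
  // Otherwise return the plan untouched, which is the "disabled" case.
  def embedSerializerInFilter(plan: Plan): Plan = plan match {
    case TypedFilterFragment(child, out) if out == child.schema =>
      EmbeddedFilter(child)
    case other =>
      other
  }

  def main(args: Array[String]): Unit = {
    val input = Relation(Seq(Column("value", "int", nullable = false)))

    // The fragment preserves the schema, so the rewrite is safe.
    val preserving = TypedFilterFragment(input, input.schema)
    // The fragment alters the schema (assumed example: nullability flips).
    val changing = TypedFilterFragment(
      input, Seq(Column("value", "int", nullable = true)))

    println(embedSerializerInFilter(preserving))
    println(embedSerializerInFilter(changing))
  }
}
```

In this sketch the rewrite fires only for the schema-preserving fragment. The schema-changing fragment is returned as-is, trading the optimization for correctness until the underlying schema change is fixed.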
## How was this patch tested?
New test case added in `DatasetSuite`.
[1]: https://issues.apache.org/jira/browse/SPARK-15632
Author: Cheng Lian <lian@databricks.com>
Closes #13362 from liancheng/spark-15112-corrupted-filter.