author | Sean Zhong <seanzhong@databricks.com> | 2016-06-06 22:40:21 -0700 |
---|---|---|
committer | Cheng Lian <lian@databricks.com> | 2016-06-06 22:40:21 -0700 |
commit | 0e0904a2fce3c4447c24f1752307b6d01ffbd0ad (patch) | |
tree | 75861951e2866cfb4788cce9b6ac0c4e3c7e4dbd /sql/core/src/test/java | |
parent | c409e23abd128dad33557025f1e824ef47e6222f (diff) | |
[SPARK-15632][SQL] Typed Filter should NOT change the Dataset schema
## What changes were proposed in this pull request?
This PR makes sure the typed Filter doesn't change the Dataset schema.
**Before the change:**
```
scala> val df = spark.range(0,9)
scala> df.schema
res12: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
scala> val afterFilter = df.filter(_=>true)
scala> afterFilter.schema // Schema has CHANGED: the column is renamed from id to value, and nullable flips from false to true.
res13: org.apache.spark.sql.types.StructType = StructType(StructField(value,LongType,true))
```
The typed Filter is wrapped in SerializeFromObject and DeserializeToObject operators, and these two operators can change the schema of the Dataset.
**After the change:**
```
scala> afterFilter.schema // schema is NOT changed.
res47: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
```
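As a rough illustration of the difference between the two transcripts above, here is a toy model in plain Java. It is a sketch only and not Spark's implementation: `Field`, `ToyDataset`, `filterReencoding`, `filterPreserving`, and the `REENCODED` constant are hypothetical names invented for this example, and the re-encoded schema is simply assumed to be the `(value, nullable = true)` field seen in the "before" output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: none of these types are Spark classes.
final class TypedFilterSchemaSketch {

    // Hypothetical stand-in for a single-column schema: name plus nullability.
    static final class Field {
        final String name;
        final boolean nullable;

        Field(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Field)) return false;
            Field other = (Field) o;
            return name.equals(other.name) && nullable == other.nullable;
        }

        @Override
        public int hashCode() {
            return 31 * name.hashCode() + (nullable ? 1 : 0);
        }

        @Override
        public String toString() {
            return "Field(" + name + ", nullable=" + nullable + ")";
        }
    }

    // Hypothetical stand-in for a Dataset of longs: a schema plus its rows.
    static final class ToyDataset {
        final Field schema;
        final List<Long> rows;

        ToyDataset(Field schema, List<Long> rows) {
            this.schema = schema;
            this.rows = rows;
        }
    }

    // Assumed schema produced when objects are re-serialized, mirroring the
    // (value, nullable = true) field in the "before" transcript.
    static final Field REENCODED = new Field("value", true);

    // Returns the rows that satisfy the predicate, in their original order.
    static List<Long> keep(List<Long> rows, Predicate<Long> predicate) {
        List<Long> out = new ArrayList<>();
        for (Long row : rows) {
            if (predicate.test(row)) out.add(row);
        }
        return out;
    }

    // "Before": the result schema comes from re-serializing the objects.
    static ToyDataset filterReencoding(ToyDataset in, Predicate<Long> predicate) {
        return new ToyDataset(REENCODED, keep(in.rows, predicate));
    }

    // "After": only rows are dropped; the input schema is passed through.
    static ToyDataset filterPreserving(ToyDataset in, Predicate<Long> predicate) {
        return new ToyDataset(in.schema, keep(in.rows, predicate));
    }

    public static void main(String[] args) {
        List<Long> rows = new ArrayList<>();
        for (long i = 0; i < 9; i++) rows.add(i);
        ToyDataset ds = new ToyDataset(new Field("id", false), rows);

        // Same always-true predicate as in the transcripts.
        System.out.println(filterReencoding(ds, v -> true).schema);
        System.out.println(filterPreserving(ds, v -> true).schema);
    }
}
```

The point of the sketch is that a filter only removes rows, so it never needs to derive a new schema; passing the input schema through unchanged is enough to keep the column name and nullability intact.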
## How was this patch tested?
Unit test.
Author: Sean Zhong <seanzhong@databricks.com>
Closes #13529 from clockfly/spark-15632.
Diffstat (limited to 'sql/core/src/test/java')
-rw-r--r-- | sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java | 13 |
1 files changed, 13 insertions, 0 deletions
diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
index 8354a5bdac..37577accfd 100644
--- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
+++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
@@ -92,6 +92,19 @@ public class JavaDatasetSuite implements Serializable {
     Assert.assertFalse(iter.hasNext());
   }
 
+  // SPARK-15632: typed filter should preserve the underlying logical schema
+  @Test
+  public void testTypedFilterPreservingSchema() {
+    Dataset<Long> ds = spark.range(10);
+    Dataset<Long> ds2 = ds.filter(new FilterFunction<Long>() {
+      @Override
+      public boolean call(Long value) throws Exception {
+        return value > 3;
+      }
+    });
+    Assert.assertEquals(ds.schema(), ds2.schema());
+  }
+
   @Test
   public void testCommonOperation() {
     List<String> data = Arrays.asList("hello", "world");