[SPARK-15632][SQL] Typed Filter should NOT change the Dataset schema

## What changes were proposed in this pull request? This PR makes sure the typed Filter doesn't change the Dataset schema. **Before the change:** ``` scala> val df = spark.range(0,9) scala> df.schema res12: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false)) scala> val afterFilter = df.filter(_=>true) scala> afterFilter.schema // !!! schema is CHANGED!!! Column name is changed from id to value, nullable is changed from false to true. res13: org.apache.spark.sql.types.StructType = StructType(StructField(value,LongType,true)) ``` SerializeFromObject and DeserializeToObject are inserted to wrap the Filter, and these two can possibly change the schema of Dataset. **After the change:** ``` scala> afterFilter.schema // schema is NOT changed. res47: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false)) ``` ## How was this patch tested? Unit test. Author: Sean Zhong <seanzhong@databricks.com> Closes #13529 from clockfly/spark-15632.
author: Sean Zhong <seanzhong@databricks.com> 2016-06-06 22:40:21 -0700
committer: Cheng Lian <lian@databricks.com> 2016-06-06 22:40:21 -0700
commit: 0e0904a2fce3c4447c24f1752307b6d01ffbd0ad (patch)
tree: 75861951e2866cfb4788cce9b6ac0c4e3c7e4dbd /sql/core/src/test/java
parent: c409e23abd128dad33557025f1e824ef47e6222f (diff)
download: spark-0e0904a2fce3c4447c24f1752307b6d01ffbd0ad.tar.gz
spark-0e0904a2fce3c4447c24f1752307b6d01ffbd0ad.tar.bz2
spark-0e0904a2fce3c4447c24f1752307b6d01ffbd0ad.zip
1 files changed, 13 insertions, 0 deletions
diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
index 8354a5bdac..37577accfd 100644
--- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
+++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
@@ -92,6 +92,19 @@ public class JavaDatasetSuite implements Serializable {
     Assert.assertFalse(iter.hasNext());
   }
 
+  // SPARK-15632: typed filter should preserve the underlying logical schema
+  @Test
+  public void testTypedFilterPreservingSchema() {
+    Dataset<Long> ds = spark.range(10);
+    Dataset<Long> ds2 = ds.filter(new FilterFunction<Long>() {
+      @Override
+      public boolean call(Long value) throws Exception {
+        return value > 3;
+      }
+    });
+    Assert.assertEquals(ds.schema(), ds2.schema());
+  }
+
   @Test
   public void testCommonOperation() {
     List<String> data = Arrays.asList("hello", "world");
author	Sean Zhong <seanzhong@databricks.com>	2016-06-06 22:40:21 -0700
committer	Cheng Lian <lian@databricks.com>	2016-06-06 22:40:21 -0700
commit	0e0904a2fce3c4447c24f1752307b6d01ffbd0ad (patch)
tree	75861951e2866cfb4788cce9b6ac0c4e3c7e4dbd /sql/core/src/test/java
parent	c409e23abd128dad33557025f1e824ef47e6222f (diff)
download	spark-0e0904a2fce3c4447c24f1752307b6d01ffbd0ad.tar.gz spark-0e0904a2fce3c4447c24f1752307b6d01ffbd0ad.tar.bz2 spark-0e0904a2fce3c4447c24f1752307b6d01ffbd0ad.zip