path: root/sql/core/src/test/java
author    Sean Zhong <seanzhong@databricks.com>    2016-06-06 22:40:21 -0700
committer Cheng Lian <lian@databricks.com>    2016-06-06 22:40:21 -0700
commit    0e0904a2fce3c4447c24f1752307b6d01ffbd0ad (patch)
tree      75861951e2866cfb4788cce9b6ac0c4e3c7e4dbd /sql/core/src/test/java
parent    c409e23abd128dad33557025f1e824ef47e6222f (diff)
[SPARK-15632][SQL] Typed Filter should NOT change the Dataset schema
## What changes were proposed in this pull request?

This PR makes sure that the typed Filter does not change the Dataset schema.

**Before the change:**

```
scala> val df = spark.range(0,9)

scala> df.schema
res12: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))

scala> val afterFilter = df.filter(_=>true)

scala> afterFilter.schema   // !!! schema is CHANGED !!! Column name is changed from id to value, nullable is changed from false to true.
res13: org.apache.spark.sql.types.StructType = StructType(StructField(value,LongType,true))
```

SerializeFromObject and DeserializeToObject are inserted to wrap the Filter, and these two nodes can change the schema of the Dataset.

**After the change:**

```
scala> afterFilter.schema   // schema is NOT changed.
res47: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
```

## How was this patch tested?

Unit test.

Author: Sean Zhong <seanzhong@databricks.com>

Closes #13529 from clockfly/spark-15632.
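For quick context, a minimal Scala sketch (not part of the original commit) of how the regression described above can be checked interactively; it assumes a live SparkSession bound to `spark`, and the `queryExecution.analyzed` call is only used to make any inserted SerializeFromObject/DeserializeToObject nodes visible:

```
// Minimal sketch, assuming a running SparkSession bound to `spark`.
// Before SPARK-15632 the typed filter was wrapped in object
// (de)serialization nodes, renaming the column from `id` to `value`
// and flipping nullable from false to true.
val before = spark.range(0, 9)
val after  = before.filter(_ => true)   // typed filter; should not touch the schema

// With the fix in place, the logical schema must be preserved.
assert(before.schema == after.schema,
  s"schema changed: ${before.schema} -> ${after.schema}")

// Inspect the analyzed plan to see whether SerializeFromObject /
// DeserializeToObject nodes wrap the Filter.
println(after.queryExecution.analyzed)
```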
Diffstat (limited to 'sql/core/src/test/java')
-rw-r--r--  sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java  13
1 file changed, 13 insertions(+), 0 deletions(-)
diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
index 8354a5bdac..37577accfd 100644
--- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
+++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
@@ -92,6 +92,19 @@ public class JavaDatasetSuite implements Serializable {
Assert.assertFalse(iter.hasNext());
}
+ // SPARK-15632: typed filter should preserve the underlying logical schema
+ @Test
+ public void testTypedFilterPreservingSchema() {
+ Dataset<Long> ds = spark.range(10);
+ Dataset<Long> ds2 = ds.filter(new FilterFunction<Long>() {
+ @Override
+ public boolean call(Long value) throws Exception {
+ return value > 3;
+ }
+ });
+ Assert.assertEquals(ds.schema(), ds2.schema());
+ }
+
@Test
public void testCommonOperation() {
List<String> data = Arrays.asList("hello", "world");