author: gatorsmile <gatorsmile@gmail.com>  2016-01-20 14:59:30 -0800
committer: Reynold Xin <rxin@databricks.com>  2016-01-20 14:59:30 -0800
commit: 8f90c151878571e20625e2a53561441ec0035dfc (patch)
tree: b9b4354468e5e2f220c14ac520a960c94e0274b5 /sql/core/src/test/java
parent: b7d74a602f622d8e105b349bd6d17ba42e7668dc (diff)
download: spark-8f90c151878571e20625e2a53561441ec0035dfc.tar.gz
          spark-8f90c151878571e20625e2a53561441ec0035dfc.tar.bz2
          spark-8f90c151878571e20625e2a53561441ec0035dfc.zip
[SPARK-12616][SQL] Making Logical Operator `Union` Support Arbitrary Number of Children
The existing `Union` logical operator only supports two children, so this patch adds a new logical operator, `Unions`, which can have an arbitrary number of children, to replace it.

`Union` is a binary logical plan node. However, a typical use case for union is to combine a very large number of input sources (DataFrames, RDDs, or files); it is not uncommon to union hundreds of thousands of files. In that case the optimizer can become very slow due to the large number of logical unions. We should change the `Union` logical plan to support an arbitrary number of children, and add a single rule in the optimizer to collapse all adjacent `Unions` into a single `Unions`. Note that this problem doesn't exist in the physical plan, because the physical `Unions` already supports an arbitrary number of children.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #10577 from gatorsmile/unionAllMultiChildren.
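To illustrate the collapsing idea, here is a minimal, self-contained Scala sketch of an optimizer rule that flattens nested binary unions into one n-ary union node. The `Plan`, `Relation`, `Union`, and `CollapseUnions` names are illustrative toy stand-ins, not Spark's actual Catalyst classes or the exact rule added by this patch.

// Simplified sketch (toy classes, not Spark internals): a logical plan tree
// with an n-ary Union node and a rule that collapses adjacent Unions.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Union(children: Seq[Plan]) extends Plan

object CollapseUnions {
  // Recursively flatten any child that is itself a Union into the parent's
  // child list, so nested binary unions become a single n-ary Union.
  def apply(plan: Plan): Plan = plan match {
    case Union(children) =>
      val flattened = children.map(apply).flatMap {
        case Union(grandChildren) => grandChildren
        case other                => Seq(other)
      }
      Union(flattened)
    case other => other
  }
}

object Demo extends App {
  // union(union(a, b), c) collapses to Union(a, b, c)
  val nested = Union(Seq(Union(Seq(Relation("a"), Relation("b"))), Relation("c")))
  println(CollapseUnions(nested)) // Union(List(Relation(a), Relation(b), Relation(c)))
}

With a single n-ary node, a chain of hundreds of thousands of unions becomes one operator instead of a deep binary tree, which is what keeps the optimizer's tree traversals cheap.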
Diffstat (limited to 'sql/core/src/test/java')
-rw-r--r--  sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java | 4
1 file changed, 2 insertions, 2 deletions
diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
index 1a3df1b117..3c0f25a5dc 100644
--- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
+++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
@@ -298,9 +298,9 @@ public class JavaDatasetSuite implements Serializable {
Dataset<String> intersected = ds.intersect(ds2);
Assert.assertEquals(Arrays.asList("xyz"), intersected.collectAsList());
- Dataset<String> unioned = ds.union(ds2);
+ Dataset<String> unioned = ds.union(ds2).union(ds);
Assert.assertEquals(
- Arrays.asList("abc", "abc", "xyz", "xyz", "foo", "foo"),
+ Arrays.asList("abc", "abc", "xyz", "xyz", "foo", "foo", "abc", "abc", "xyz"),
unioned.collectAsList());
Dataset<String> subtracted = ds.subtract(ds2);