author: gatorsmile <gatorsmile@gmail.com>  2016-01-20 14:59:30 -0800
committer: Reynold Xin <rxin@databricks.com>  2016-01-20 14:59:30 -0800
commit: 8f90c151878571e20625e2a53561441ec0035dfc (patch)
tree: b9b4354468e5e2f220c14ac520a960c94e0274b5 /sql/core/src/test/java
parent: b7d74a602f622d8e105b349bd6d17ba42e7668dc (diff)
download: spark-8f90c151878571e20625e2a53561441ec0035dfc.tar.gz
          spark-8f90c151878571e20625e2a53561441ec0035dfc.tar.bz2
          spark-8f90c151878571e20625e2a53561441ec0035dfc.zip
[SPARK-12616][SQL] Making Logical Operator `Union` Support Arbitrary Number of Children
The existing `Union` logical operator only supports two children, so this patch adds a new logical operator, `Unions`, which can have an arbitrary number of children, to replace it.

`Union` is a binary logical plan node. However, a typical use case for union is to combine a very large number of input sources (DataFrames, RDDs, or files); it is not uncommon to union hundreds of thousands of files. In that case the optimizer can become very slow due to the large number of logical unions. We should change the `Union` logical plan to support an arbitrary number of children, and add a single rule in the optimizer to collapse all adjacent `Unions` into a single `Unions`. Note that this problem doesn't exist in the physical plan, because the physical `Unions` already supports an arbitrary number of children.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #10577 from gatorsmile/unionAllMultiChildren.
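To illustrate the collapsing idea, here is a minimal, self-contained Scala sketch of an optimizer rule that flattens nested binary unions into one n-ary union node. The `Plan`, `Relation`, `Union`, and `CollapseUnions` names are illustrative toy stand-ins, not Spark's actual Catalyst classes or the exact rule added by this patch.

// Simplified sketch (toy classes, not Spark internals): a logical plan tree
// with an n-ary Union node and a rule that collapses adjacent Unions.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Union(children: Seq[Plan]) extends Plan

object CollapseUnions {
  // Recursively flatten any child that is itself a Union into the parent's
  // child list, so nested binary unions become a single n-ary Union.
  def apply(plan: Plan): Plan = plan match {
    case Union(children) =>
      val flattened = children.map(apply).flatMap {
        case Union(grandChildren) => grandChildren
        case other                => Seq(other)
      }
      Union(flattened)
    case other => other
  }
}

object Demo extends App {
  // union(union(a, b), c) collapses to Union(a, b, c)
  val nested = Union(Seq(Union(Seq(Relation("a"), Relation("b"))), Relation("c")))
  println(CollapseUnions(nested)) // Union(List(Relation(a), Relation(b), Relation(c)))
}

With a single n-ary node, a chain of hundreds of thousands of unions becomes one operator instead of a deep binary tree, which is what keeps the optimizer's tree traversals cheap.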
Diffstat (limited to 'sql/core/src/test/java')
-rw-r--r--  sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java | 4
1 file changed, 2 insertions, 2 deletions
diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
index 1a3df1b117..3c0f25a5dc 100644
--- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
+++ b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
@@ -298,9 +298,9 @@ public class JavaDatasetSuite implements Serializable {
Dataset<String> intersected = ds.intersect(ds2);
Assert.assertEquals(Arrays.asList("xyz"), intersected.collectAsList());
- Dataset<String> unioned = ds.union(ds2);
+ Dataset<String> unioned = ds.union(ds2).union(ds);
Assert.assertEquals(
- Arrays.asList("abc", "abc", "xyz", "xyz", "foo", "foo"),
+ Arrays.asList("abc", "abc", "xyz", "xyz", "foo", "foo", "abc", "abc", "xyz"),
unioned.collectAsList());
Dataset<String> subtracted = ds.subtract(ds2);