author     gatorsmile <gatorsmile@gmail.com>          2015-12-28 12:48:30 -0800
committer  Michael Armbrust <michael@databricks.com>  2015-12-28 12:48:30 -0800
commit     01ba95d8bfc16a2542c67b066b0a1d1e465f91da (patch)
tree       416aa86355d55123ff54c42a8869f3f9bca410ea /sql/hive
parent     a6a4812434c6f43cd4742437f957fecd86220255 (diff)
download   spark-01ba95d8bfc16a2542c67b066b0a1d1e465f91da.tar.gz
           spark-01ba95d8bfc16a2542c67b066b0a1d1e465f91da.tar.bz2
           spark-01ba95d8bfc16a2542c67b066b0a1d1e465f91da.zip
[SPARK-12441][SQL] Fixing missingInput in Generate/MapPartitions/AppendColumns/MapGroups/CoGroup
When explaining any plan that contains a Generate node, we see an exclamation mark in the plan. Normally, this mark means the plan has an error. This PR corrects the `missingInput` in `Generate`. For example,

```scala
val df = Seq((1, "a b c"), (2, "a b"), (3, "a")).toDF("number", "letters")
val df2 = df.explode('letters) {
  case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq
}
df2.explain(true)
```

Before the fix, the plan looks like

```
== Parsed Logical Plan ==
'Generate UserDefinedGenerator('letters), true, false, None
+- Project [_1#0 AS number#2,_2#1 AS letters#3]
   +- LocalRelation [_1#0,_2#1], [[1,a b c],[2,a b],[3,a]]

== Analyzed Logical Plan ==
number: int, letters: string, _1: string
Generate UserDefinedGenerator(letters#3), true, false, None, [_1#8]
+- Project [_1#0 AS number#2,_2#1 AS letters#3]
   +- LocalRelation [_1#0,_2#1], [[1,a b c],[2,a b],[3,a]]

== Optimized Logical Plan ==
Generate UserDefinedGenerator(letters#3), true, false, None, [_1#8]
+- LocalRelation [number#2,letters#3], [[1,a b c],[2,a b],[3,a]]

== Physical Plan ==
!Generate UserDefinedGenerator(letters#3), true, false, [number#2,letters#3,_1#8]
+- LocalTableScan [number#2,letters#3], [[1,a b c],[2,a b],[3,a]]
```

**Updates**: The same issue was also found in four other Dataset operators: `MapPartitions`/`AppendColumns`/`MapGroups`/`CoGroup`. All four are fixed as well.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #10393 from gatorsmile/generateExplain.
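For readers skimming the patch, the following is a minimal plain-Scala sketch of why declaring `producedAttributes` removes the exclamation mark. It is a simplification, not Catalyst code: plain `String` sets stand in for `AttributeSet`, and it assumes `missingInput` is computed roughly as references minus input minus produced attributes, which is what the overrides in this patch rely on.

```scala
// Minimal stand-in sketch of the bookkeeping behind the "!" marker.
// Assumption: missingInput ~= references -- inputSet -- producedAttributes.
object MissingInputSketch {
  val references         = Set("letters#3", "_1#8")     // attributes the Generate node reads or binds
  val inputSet           = Set("number#2", "letters#3") // attributes supplied by the child
  val producedAttributes = Set("_1#8")                  // attributes the node itself generates

  // Without producedAttributes, _1#8 looks unresolved and the plan prints "!Generate ...".
  val missingBefore = references -- inputSet
  // With producedAttributes declared, nothing is missing and the marker disappears.
  val missingAfter  = references -- inputSet -- producedAttributes

  def main(args: Array[String]): Unit =
    println(s"before: $missingBefore, after: $missingAfter")  // before: Set(_1#8), after: Set()
}
```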
Diffstat (limited to 'sql/hive')
-rw-r--r--  sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala         3
-rw-r--r--  sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala  2
2 files changed, 5 insertions, 0 deletions
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala
index 806d2b9b0b..8141136de5 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala
@@ -51,6 +51,9 @@ case class HiveTableScan(
   require(partitionPruningPred.isEmpty || relation.hiveQlTable.isPartitioned,
     "Partition pruning predicates only supported for partitioned tables.")
+  override def producedAttributes: AttributeSet = outputSet ++
+    AttributeSet(partitionPruningPred.flatMap(_.references))
+
   // Retrieve the original attributes based on expression ID so that capitalization matches.
   val attributes = requestedAttributes.map(relation.attributeMap)
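`HiveTableScan` is a leaf node, so its input set is empty; partition columns referenced only by pruning predicates are resolved by the scan itself and therefore need to be declared as produced, otherwise they would show up as missing input. A hedged repro sketch against a 1.6-era `HiveContext` follows; the table name `logs` and partition column `ds` are illustrative assumptions, not part of the patch.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical repro: explain a query whose only use of the partition column
// is in the pruning predicate. With this patch, the HiveTableScan node should
// no longer be printed with a leading "!".
object ExplainPartitionedScan {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("explain-scan").setMaster("local[2]"))
    val hiveContext = new HiveContext(sc)
    // "logs" is assumed to be a Hive table partitioned by ds.
    hiveContext.sql("SELECT value FROM logs WHERE ds = '2015-12-28'").explain(true)
    sc.stop()
  }
}
```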
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
index d9b9ba4bfd..a61e162f48 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
@@ -60,6 +60,8 @@ case class ScriptTransformation(
   override protected def otherCopyArgs: Seq[HiveContext] = sc :: Nil
+  override def producedAttributes: AttributeSet = outputSet -- inputSet
+
   private val serializedHiveConf = new SerializableConfiguration(sc.hiveconf)
   protected override def doExecute(): RDD[InternalRow] = {
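For `ScriptTransformation` the produced set is simply `outputSet -- inputSet`: the columns parsed from the external script's stdout are generated by the node itself rather than supplied by the child plan. A hedged usage sketch follows, reusing the `hiveContext` from the previous sketch; the `'cat'` script and the `src` table are illustrative assumptions.

```scala
// Hypothetical TRANSFORM query, which plans into ScriptTransformation. Its
// output columns (k, v) come from the script, not from the child plan, so
// they are exactly what the new producedAttributes override reports.
hiveContext.sql(
  "SELECT TRANSFORM (key, value) USING 'cat' AS (k, v) FROM src"
).explain(true)
```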