author     Drew Robb <drewrobb@gmail.com>      2016-10-22 01:59:36 -0700
committer  Yanbo Liang <ybliang8@gmail.com>    2016-10-22 01:59:36 -0700
commit     ab3363e9f6b1f7fc26682509fe7382c570f91778 (patch)
tree       051346a15d581c1fab21ef3e0e380ebbcb91b693 /mllib
parent     01b26a06436b4c8020f22be3e1da4995b44c9b03 (diff)
[SPARK-17986][ML] SQLTransformer should remove temporary tables
## What changes were proposed in this pull request?

A call to `SQLTransformer.transform` previously created a temporary table and never deleted it. This change adds a call to `dropTempView()` that deletes the temporary table before returning the result, so the table does not remain in Spark's table catalog. Because `tableName` is randomized and not exposed, there should be no expected use of this table outside of the `transform` method.

## How was this patch tested?

A single new assertion was added to the existing test of the `SQLTransformer.transform` method, checking that no temporary tables remain. Without the corresponding code change, this new assertion fails. I am not aware of any circumstances in which removing this temporary view would hurt performance or correctness in other ways, but some expertise here would be helpful.

Author: Drew Robb <drewrobb@gmail.com>

Closes #15526 from drewrobb/SPARK-17986.
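For context, the sketch below shows how the leak described above can be observed from user code; it is not part of the patch. The object name `TempViewLeakDemo` and the toy DataFrame are made up for illustration, while `SQLTransformer`, the `__THIS__` placeholder, and `spark.catalog.listTables()` are the public Spark APIs involved.

```scala
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.SparkSession

// Hypothetical demo object, not part of the Spark code base.
object TempViewLeakDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SQLTransformerTempViewDemo")
      .getOrCreate()
    import spark.implicits._

    // Toy input DataFrame for illustration only.
    val df = Seq((0, 1.0, 3.0), (2, 2.0, 5.0)).toDF("id", "v1", "v2")

    val sqlTrans = new SQLTransformer()
      .setStatement("SELECT *, (v1 + v2) AS v3 FROM __THIS__")

    sqlTrans.transform(df).show()

    // Prior to SPARK-17986, each transform() call left one randomly named
    // temporary view behind; with this fix the count stays at 0.
    println(s"temp tables after transform: ${spark.catalog.listTables().count()}")

    spark.stop()
  }
}
```

With the patched `transform`, the printed count stays at 0 after each call; before the fix it would grow by one per call, since each invocation registered a new randomly named temporary view.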
Diffstat (limited to 'mllib')
-rw-r--r--  mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala       4
-rw-r--r--  mllib/src/test/scala/org/apache/spark/ml/feature/SQLTransformerSuite.scala  1
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala
index 259be2679c..b25fff973c 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala
@@ -67,7 +67,9 @@ class SQLTransformer @Since("1.6.0") (@Since("1.6.0") override val uid: String)
val tableName = Identifiable.randomUID(uid)
dataset.createOrReplaceTempView(tableName)
val realStatement = $(statement).replace(tableIdentifier, tableName)
- dataset.sparkSession.sql(realStatement)
+ val result = dataset.sparkSession.sql(realStatement)
+ dataset.sparkSession.catalog.dropTempView(tableName)
+ result
}
@Since("1.6.0")
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/SQLTransformerSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/SQLTransformerSuite.scala
index 23464073e6..753f890c48 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/SQLTransformerSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/SQLTransformerSuite.scala
@@ -43,6 +43,7 @@ class SQLTransformerSuite
assert(result.schema.toString == resultSchema.toString)
assert(resultSchema == expected.schema)
assert(result.collect().toSeq == expected.collect().toSeq)
+ assert(original.sparkSession.catalog.listTables().count() == 0)
}
test("read/write") {