[SPARK-14429][SQL] Improve LIKE pattern in "SHOW TABLES / FUNCTIONS LIKE <pattern>" DDL

LIKE <pattern> is commonly used in SHOW TABLES / FUNCTIONS and similar DDL. In the pattern, the user can use `*` as a wildcard and `|` to separate alternatives.

1. Previously, `replaceAll()` was used to replace `*` with `.*`, but the replacement was scattered across several places. I have created a utility method and used it in all of those places.
2. Consistency with Hive: in Hive the pattern is case insensitive and surrounding whitespace is trimmed, but the current pattern matching does neither. For example, given tables (t1, t2, t3), `SHOW TABLES LIKE ' T* '` will list all the t-tables. Please use Hive to verify it.
3. When patterns are combined with `|`, the result is sorted. For a pattern like `' B*|a* '`, the result is listed in a-b order.

I've made some changes to the utility method to make sure we get the same result as Hive does. A new method was created in `StringUtils`, and test cases were added.

andrewor14

Author: bomeng <bmeng@us.ibm.com>

Closes #12206 from bomeng/SPARK-14429.
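
The matching rules described above (trim the pattern, split it on `|`, replace `*` with `.*`, match case-insensitively, return sorted results) can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the code added to `StringUtils`: the object name `LikePatternSketch` is hypothetical, and silently skipping a sub-pattern that is not a valid regex is an assumption of this sketch.

```scala
import java.util.regex.PatternSyntaxException
import scala.collection.mutable

// Hypothetical stand-in for the utility described in the commit message.
object LikePatternSketch {
  def filterPattern(names: Seq[String], pattern: String): Seq[String] = {
    // A sorted set removes duplicates and yields results in sorted order.
    val matched = mutable.SortedSet.empty[String]
    // Trim surrounding whitespace, then treat '|' as a separator of alternatives.
    pattern.trim.split("\\|").foreach { subPattern =>
      try {
        // "(?i)" makes the match case insensitive; '*' becomes the regex ".*".
        val regex = ("(?i)" + subPattern.replaceAll("\\*", ".*")).r
        matched ++= names.filter(name => regex.pattern.matcher(name).matches())
      } catch {
        // Assumption: an invalid sub-pattern simply matches nothing.
        case _: PatternSyntaxException =>
      }
    }
    matched.toSeq
  }
}
```

With this sketch, `LikePatternSketch.filterPattern(Seq("t1", "t2", "t3", "u1"), " T* ")` returns `t1, t2, t3`, and `LikePatternSketch.filterPattern(Seq("b1", "a1", "c1"), " B*|a* ")` returns `a1, b1`, matching the two examples in the commit message.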
author: bomeng <bmeng@us.ibm.com> 2016-04-06 11:05:52 -0700
committer: Andrew Or <andrew@databricks.com> 2016-04-06 11:06:14 -0700
commit: 5abd02c02b3fa3505defdc8ab0c5c5e23a16aa80 (patch)
tree: 88ffa4ba214811f3242254bd280bed48bf463ede /sql/core
parent: 10494feae0c2c1aca545c73ba61af6d8f743c5bb (diff)
download: spark-5abd02c02b3fa3505defdc8ab0c5c5e23a16aa80.tar.gz
spark-5abd02c02b3fa3505defdc8ab0c5c5e23a16aa80.tar.bz2
spark-5abd02c02b3fa3505defdc8ab0c5c5e23a16aa80.zip
1 files changed, 5 insertions, 7 deletions
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 5a851b47ca..2ab7c1581c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -24,6 +24,7 @@ import org.apache.spark.AccumulatorSuite
 import org.apache.spark.sql.catalyst.analysis.UnresolvedException
 import org.apache.spark.sql.catalyst.expressions.SortOrder
 import org.apache.spark.sql.catalyst.plans.logical.Aggregate
+import org.apache.spark.sql.catalyst.util.StringUtils
 import org.apache.spark.sql.execution.aggregate
 import org.apache.spark.sql.execution.joins.{BroadcastHashJoin, CartesianProduct, SortMergeJoin}
 import org.apache.spark.sql.functions._
@@ -56,17 +57,14 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   test("show functions") {
     def getFunctions(pattern: String): Seq[Row] = {
-      val regex = java.util.regex.Pattern.compile(pattern)
-      sqlContext.sessionState.functionRegistry.listFunction()
-        .filter(regex.matcher(_).matches()).map(Row(_))
+      StringUtils.filterPattern(sqlContext.sessionState.functionRegistry.listFunction(), pattern)
+        .map(Row(_))
     }
-    checkAnswer(sql("SHOW functions"), getFunctions(".*"))
+    checkAnswer(sql("SHOW functions"), getFunctions("*"))
     Seq("^c*", "*e$", "log*", "*date*").foreach { pattern =>
       // For the pattern part, only '*' and '|' are allowed as wildcards.
       // For '*', we need to replace it to '.*'.
-      checkAnswer(
-        sql(s"SHOW FUNCTIONS '$pattern'"),
-        getFunctions(pattern.replaceAll("\\*", ".*")))
+      checkAnswer(sql(s"SHOW FUNCTIONS '$pattern'"), getFunctions(pattern))
     }
   }