[SPARK-9486][SQL] Add data source aliasing for external packages

Users currently have to provide the full class name for external data sources, like:

`sqlContext.read.format("com.databricks.spark.avro").load(path)`

This allows external data source packages to register themselves using a Service Loader so that they can add custom alias like:

`sqlContext.read.format("avro").load(path)`

This makes it so that using external data source packages uses the same format as the internal data sources like parquet, json, etc.

Author: Joseph Batchik <joseph.batchik@cloudera.com>
Author: Joseph Batchik <josephbatchik@gmail.com>

Closes #7802 from JDrit/service_loader and squashes the following commits:

49a01ec [Joseph Batchik] fixed a couple of format / error bugs
e5e93b2 [Joseph Batchik] modified rat file to only excluded added services
72b349a [Joseph Batchik] fixed error with orc data source actually
9f93ea7 [Joseph Batchik] fixed error with orc data source
87b7f1c [Joseph Batchik] fixed typo
101cd22 [Joseph Batchik] removing unneeded changes
8f3cf43 [Joseph Batchik] merged in changes
b63d337 [Joseph Batchik] merged in master
95ae030 [Joseph Batchik] changed the new trait to be used as a mixin for data source to register themselves
74db85e [Joseph Batchik] reformatted class loader
ac2270d [Joseph Batchik] removing some added test
a6926db [Joseph Batchik] added test cases for data source loader
208a2a8 [Joseph Batchik] changes to do error catching if there are multiple data sources
946186e [Joseph Batchik] started working on service loader
author: Joseph Batchik <joseph.batchik@cloudera.com> 2015-08-08 11:03:01 -0700
committer: Reynold Xin <rxin@databricks.com> 2015-08-08 11:03:01 -0700
commit: a3aec918bed22f8e33cf91dc0d6e712e6653c7d2 (patch)
tree: 6c8bf644c083f7e7f0ede49873debb45d805cb5d /sql/hive/src/main
parent: 23695f1d2d7ef9f3ea92cebcd96b1cf0e8904eb4 (diff)
download: spark-a3aec918bed22f8e33cf91dc0d6e712e6653c7d2.tar.gz
spark-a3aec918bed22f8e33cf91dc0d6e712e6653c7d2.tar.bz2
spark-a3aec918bed22f8e33cf91dc0d6e712e6653c7d2.zip
2 files changed, 5 insertions, 1 deletions
diff --git a/sql/hive/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister b/sql/hive/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
new file mode 100644
index 0000000000..4a774fbf1f
--- /dev/null
+++ b/sql/hive/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
@@ -0,0 +1 @@
+org.apache.spark.sql.hive.orc.DefaultSource
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala
index 7c8704b47f..0c344c63fd 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala
@@ -47,7 +47,10 @@ import org.apache.spark.util.SerializableConfiguration
 /* Implicit conversions */
 import scala.collection.JavaConversions._
 
-private[sql] class DefaultSource extends HadoopFsRelationProvider {
+private[sql] class DefaultSource extends HadoopFsRelationProvider with DataSourceRegister {
+
+  def format(): String = "orc"
+
   def createRelation(
       sqlContext: SQLContext,
       paths: Array[String],
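
The alias resolution described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the code added by this commit: the `DataSourceRegister` trait below is a local stand-in so the sketch runs without Spark, `AvroSource` and `AliasLookup` are hypothetical names, the `registered` sequence stands in for whatever `java.util.ServiceLoader` would discover from `META-INF/services` entries, and case-insensitive matching is an assumption.

```scala
// Local stand-in for org.apache.spark.sql.sources.DataSourceRegister,
// declared here only so the sketch compiles without Spark on the classpath.
trait DataSourceRegister {
  def format(): String
}

// Hypothetical external data source that registers the short alias "avro".
class AvroSource extends DataSourceRegister {
  def format(): String = "avro"
}

object AliasLookup {
  // Resolve the string passed to read.format(...) to a class name.
  // `registered` plays the role of the providers a Service Loader would find.
  // Returns Left with an error message when the alias is ambiguous.
  def lookup(provider: String, registered: Seq[DataSourceRegister]): Either[String, String] =
    registered.filter(_.format().equalsIgnoreCase(provider)).toList match {
      // No alias matched: assume the caller passed a fully qualified class name.
      case Nil => Right(provider)
      // Exactly one source claims the alias: use its class.
      case single :: Nil => Right(single.getClass.getName)
      // Several sources claim the same alias: report the conflict.
      case many =>
        Left(s"Multiple sources found for $provider: " +
          many.map(_.getClass.getName).mkString(", "))
    }
}
```

With one registered `AvroSource`, `AliasLookup.lookup("avro", ...)` yields that class's name, while an unregistered string such as a full class name is passed through unchanged. In a real package, registration would come from a `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` file listing the implementing class, as the ORC entry in the diff above does.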