| author | Aaron Davidson <aaron@databricks.com> | 2015-07-28 10:12:09 -0700 |
|---|---|---|
| committer | Michael Armbrust <michael@databricks.com> | 2015-07-28 10:12:09 -0700 |
| commit | 35ef853b3f9d955949c464e4a0d445147e0e9a07 (patch) | |
| tree | 24350c93d1ece87827827d246c51a59b21200245 /sql/hive | |
| parent | 9bbe0171cb434edb160fad30ea2d4221f525c919 (diff) | |
[SPARK-9397] DataFrame should provide an API to find source data files if applicable
Certain applications would benefit from being able to inspect DataFrames that are produced directly from file-based data sources and to find out which files back them. For example, one might want to display to a user the size of the data underlying a table, or to copy or mutate it.
This PR exposes an `inputFiles` method on DataFrame that attempts to discover the source files in a best-effort manner by inspecting `HadoopFsRelation`s and `JSONRelation`s.
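A minimal usage sketch of the new method, assuming a local `SparkContext` and a placeholder JSON path (neither is part of the patch):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical setup for illustration; any existing SQLContext works the same way.
val sc = new SparkContext(new SparkConf().setAppName("inputFiles-demo").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// "/tmp/events.json" is a placeholder path, not part of the patch.
val df = sqlContext.read.json("/tmp/events.json")

// Best-effort: returns the backing files when the DataFrame is produced
// directly from a file-based relation, and an empty array otherwise.
val files: Array[String] = df.inputFiles
files.foreach(println)
```

Because discovery is best-effort, callers should treat an empty result as "unknown" rather than as proof that no source files exist.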
Author: Aaron Davidson <aaron@databricks.com>
Closes #7717 from aarondav/paths and squashes the following commits:
ff67430 [Aaron Davidson] inputFiles
0acd3ad [Aaron Davidson] [SPARK-9397] DataFrame should provide an API to find source data files if applicable
Diffstat (limited to 'sql/hive')
-rw-r--r-- | sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala | 6 |
1 file changed, 3 insertions(+), 3 deletions(-)
```diff
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
index 3180c05445..a8c9b4fa71 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala
@@ -274,9 +274,9 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
     val metastoreSchema = StructType.fromAttributes(metastoreRelation.output)
     val mergeSchema = hive.convertMetastoreParquetWithSchemaMerging
 
-    // NOTE: Instead of passing Metastore schema directly to `ParquetRelation2`, we have to
+    // NOTE: Instead of passing Metastore schema directly to `ParquetRelation`, we have to
     // serialize the Metastore schema to JSON and pass it as a data source option because of the
-    // evil case insensitivity issue, which is reconciled within `ParquetRelation2`.
+    // evil case insensitivity issue, which is reconciled within `ParquetRelation`.
     val parquetOptions = Map(
       ParquetRelation.METASTORE_SCHEMA -> metastoreSchema.json,
       ParquetRelation.MERGE_SCHEMA -> mergeSchema.toString)
@@ -290,7 +290,7 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
       partitionSpecInMetastore: Option[PartitionSpec]): Option[LogicalRelation] = {
     cachedDataSourceTables.getIfPresent(tableIdentifier) match {
       case null => None // Cache miss
-      case logical@LogicalRelation(parquetRelation: ParquetRelation) =>
+      case logical @ LogicalRelation(parquetRelation: ParquetRelation) =>
         // If we have the same paths, same schema, and same partition spec,
         // we will use the cached Parquet Relation.
         val useCached =
```
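Aside from the stale comment fixes, the only change in the second hunk is whitespace around the `@` pattern binder, bringing it in line with the Scala style guide. For readers unfamiliar with the syntax, a minimal self-contained sketch (the `Shape`, `Circle`, and `Square` types are hypothetical, invented here for illustration):

```scala
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Square(side: Double) extends Shape

def describe(shape: Shape): String = shape match {
  // `c @ Circle(r)` binds the whole matched value to `c` while also
  // destructuring it to extract `r`; this is the same binder the diff reformats.
  case c @ Circle(r) if r > 1.0 => s"large circle $c of radius $r"
  case c @ Circle(_)            => s"small circle $c"
  case Square(side)             => s"square of side $side"
}

// Example: describe(Circle(2.0)) returns "large circle Circle(2.0) of radius 2.0"
```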