author | Cheng Lian <lian@databricks.com> | 2015-05-26 20:48:56 -0700 |
---|---|---|
committer | Yin Huai <yhuai@databricks.com> | 2015-05-26 20:48:56 -0700 |
commit | b463e6d618e69c535297e51f41eca4f91bd33cc8 (patch) | |
tree | 211ed6394d8ee89e3944d2bb9076f07c5335f802 /sql/hive | |
parent | 0c33c7b4a66e47f6246f1b7f2b96f2c33126ec63 (diff) | |
[SPARK-7868] [SQL] Ignores _temporary directories in HadoopFsRelation
This ensures that partial or corrupted data files left behind by failed tasks or jobs do not affect normal data scans.
Author: Cheng Lian <lian@databricks.com>
Closes #6411 from liancheng/spark-7868 and squashes the following commits:
273ea36 [Cheng Lian] Ignores _temporary directories
Diffstat (limited to 'sql/hive')
-rw-r--r-- | sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala | 16 |
1 files changed, 16 insertions, 0 deletions
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala b/sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
index 7c02d563f8..cf5ae88dc4 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
@@ -548,4 +548,20 @@ class ParquetHadoopFsRelationSuite extends HadoopFsRelationTest {
       checkAnswer(table("t"), df.select('b, 'c, 'a).collect())
     }
   }
+
+  test("SPARK-7868: _temporary directories should be ignored") {
+    withTempPath { dir =>
+      val df = Seq("a", "b", "c").zipWithIndex.toDF()
+
+      df.write
+        .format("parquet")
+        .save(dir.getCanonicalPath)
+
+      df.write
+        .format("parquet")
+        .save(s"${dir.getCanonicalPath}/_temporary")
+
+      checkAnswer(read.format("parquet").load(dir.getCanonicalPath), df.collect())
+    }
+  }
 }
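The diff shown here is limited to `sql/hive`, so it contains only the regression test and not the change to the file listing itself. As a rough illustration of the behavior the test checks, the sketch below walks a directory tree and skips any directory named `_temporary`. It uses plain `java.io.File` rather than the Hadoop `FileSystem` API, and the `TemporaryDirFilter` object and `listDataFiles` helper are hypothetical names introduced for this example; it should not be read as the actual `HadoopFsRelation` implementation.

```scala
import java.io.File
import java.nio.file.Files

object TemporaryDirFilter {
  // Hypothetical helper (not part of Spark): recursively lists the files
  // under `root`, skipping every directory named "_temporary".
  def listDataFiles(root: File): Seq[File] = {
    // listFiles() returns null if `root` is not a readable directory.
    val children = Option(root.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    children.flatMap { f =>
      if (f.isDirectory) {
        if (f.getName == "_temporary") Seq.empty else listDataFiles(f)
      } else {
        Seq(f)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Mirror the layout used by the test: one committed file at the top
    // level and one leftover file inside a _temporary directory.
    val root = Files.createTempDirectory("spark-7868-sketch").toFile
    new File(root, "part-00000").createNewFile()
    val tmpDir = new File(root, "_temporary")
    tmpDir.mkdir()
    new File(tmpDir, "part-00001").createNewFile()

    // Only the committed file should be listed.
    listDataFiles(root).foreach(f => println(f.getName))
  }
}
```

Filtering by directory name during listing means leftover task output never reaches the scan, which is the same property the test asserts by writing a second copy of the data under `_temporary` and expecting only the original rows back.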