author     hyukjinkwon <gurwls223@gmail.com>  2017-01-10 13:22:35 +0000
committer  Sean Owen <sowen@cloudera.com>     2017-01-10 13:22:35 +0000
commit     2cfd41ac02193aaf121afcddcb6383f4d075ea1e
tree       d954e87718305ca2ac313ddbb397110228e9e83e /core
parent     4e27578faa67c7a71a9b938aafbaf79bdbf36831
[SPARK-19117][TESTS] Skip the tests using script transformation on Windows
## What changes were proposed in this pull request?
This PR proposes to skip the tests that use script transformation, which fail on Windows because the path to bash is hardcoded as `/bin/bash`.
```
SQLQuerySuite:
- script *** FAILED *** (553 milliseconds)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 56.0 failed 1 times, most recent failure: Lost task 0.0 in stage 56.0 (TID 54, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- Star Expansion - script transform *** FAILED *** (2 seconds, 375 milliseconds)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 389.0 failed 1 times, most recent failure: Lost task 0.0 in stage 389.0 (TID 725, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- test script transform for stdout *** FAILED *** (2 seconds, 813 milliseconds)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 391.0 failed 1 times, most recent failure: Lost task 0.0 in stage 391.0 (TID 726, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- test script transform for stderr *** FAILED *** (2 seconds, 407 milliseconds)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 393.0 failed 1 times, most recent failure: Lost task 0.0 in stage 393.0 (TID 727, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- test script transform data type *** FAILED *** (171 milliseconds)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 395.0 failed 1 times, most recent failure: Lost task 0.0 in stage 395.0 (TID 728, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
```
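The fix guards these tests with a probe that checks whether the external command can actually be launched. A self-contained sketch of that probe, mirroring the `testCommandAvailable` helper added to `TestUtils` in the diff below (the wrapping object name `CommandCheck` is illustrative):

```scala
import scala.sys.process.{Process, ProcessLogger}
import scala.util.Try

object CommandCheck {
  // Launch the command, discard its output, and inspect the exit status.
  // If the binary cannot be launched at all (e.g. /bin/bash on Windows),
  // starting the process throws an IOException, which Try captures as a
  // Failure, so the probe returns false instead of crashing the suite.
  def testCommandAvailable(command: String): Boolean = {
    val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue())
    attempt.isSuccess && attempt.get == 0
  }

  def main(args: Array[String]): Unit = {
    println(testCommandAvailable("ls"))
    println(testCommandAvailable("some_nonexistent_command"))
  }
}
```

A suite can then wrap its body in `assume(testCommandAvailable("/bin/bash"))` so the test is skipped, not failed, on hosts without bash.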
```
HiveQuerySuite:
- transform *** FAILED *** (359 milliseconds)
Failed to execute query using catalyst:
Error: Job aborted due to stage failure: Task 0 in stage 1347.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1347.0 (TID 2395, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- schema-less transform *** FAILED *** (344 milliseconds)
Failed to execute query using catalyst:
Error: Job aborted due to stage failure: Task 0 in stage 1348.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1348.0 (TID 2396, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- transform with custom field delimiter *** FAILED *** (296 milliseconds)
Failed to execute query using catalyst:
Error: Job aborted due to stage failure: Task 0 in stage 1349.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1349.0 (TID 2397, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- transform with custom field delimiter2 *** FAILED *** (297 milliseconds)
Failed to execute query using catalyst:
Error: Job aborted due to stage failure: Task 0 in stage 1350.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1350.0 (TID 2398, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- transform with custom field delimiter3 *** FAILED *** (312 milliseconds)
Failed to execute query using catalyst:
Error: Job aborted due to stage failure: Task 0 in stage 1351.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1351.0 (TID 2399, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- transform with SerDe2 *** FAILED *** (437 milliseconds)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1355.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1355.0 (TID 2403, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
```
```
LogicalPlanToSQLSuite:
- script transformation - schemaless *** FAILED *** (78 milliseconds)
...
Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1968.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1968.0 (TID 3932, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- script transformation - alias list *** FAILED *** (94 milliseconds)
...
Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1969.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1969.0 (TID 3933, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- script transformation - alias list with type *** FAILED *** (93 milliseconds)
...
Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1970.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1970.0 (TID 3934, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- script transformation - row format delimited clause with only one format property *** FAILED *** (78 milliseconds)
...
Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1971.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1971.0 (TID 3935, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- script transformation - row format delimited clause with multiple format properties *** FAILED *** (94 milliseconds)
...
Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1972.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1972.0 (TID 3936, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- script transformation - row format serde clauses with SERDEPROPERTIES *** FAILED *** (78 milliseconds)
...
Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1973.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1973.0 (TID 3937, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- script transformation - row format serde clauses without SERDEPROPERTIES *** FAILED *** (78 milliseconds)
...
Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1974.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1974.0 (TID 3938, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
```
```
ScriptTransformationSuite:
- cat without SerDe *** FAILED *** (156 milliseconds)
...
Caused by: java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- cat with LazySimpleSerDe *** FAILED *** (63 milliseconds)
...
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2383.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2383.0 (TID 4819, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- script transformation should not swallow errors from upstream operators (no serde) *** FAILED *** (78 milliseconds)
...
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2384.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2384.0 (TID 4820, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- script transformation should not swallow errors from upstream operators (with serde) *** FAILED *** (47 milliseconds)
...
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2385.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2385.0 (TID 4821, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
- SPARK-14400 script transformation should fail for bad script command *** FAILED *** (47 milliseconds)
"Job aborted due to stage failure: Task 0 in stage 2386.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2386.0 (TID 4822, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
```
## How was this patch tested?
AppVeyor as below:
```
SQLQuerySuite:
- script !!! CANCELED !!! (63 milliseconds)
- Star Expansion - script transform !!! CANCELED !!! (0 milliseconds)
- test script transform for stdout !!! CANCELED !!! (0 milliseconds)
- test script transform for stderr !!! CANCELED !!! (0 milliseconds)
- test script transform data type !!! CANCELED !!! (0 milliseconds)
```
```
HiveQuerySuite:
- transform !!! CANCELED !!! (31 milliseconds)
- schema-less transform !!! CANCELED !!! (0 milliseconds)
- transform with custom field delimiter !!! CANCELED !!! (0 milliseconds)
- transform with custom field delimiter2 !!! CANCELED !!! (0 milliseconds)
- transform with custom field delimiter3 !!! CANCELED !!! (0 milliseconds)
- transform with SerDe2 !!! CANCELED !!! (0 milliseconds)
```
```
LogicalPlanToSQLSuite:
- script transformation - schemaless !!! CANCELED !!! (78 milliseconds)
- script transformation - alias list !!! CANCELED !!! (0 milliseconds)
- script transformation - alias list with type !!! CANCELED !!! (0 milliseconds)
- script transformation - row format delimited clause with only one format property !!! CANCELED !!! (15 milliseconds)
- script transformation - row format delimited clause with multiple format properties !!! CANCELED !!! (0 milliseconds)
- script transformation - row format serde clauses with SERDEPROPERTIES !!! CANCELED !!! (0 milliseconds)
- script transformation - row format serde clauses without SERDEPROPERTIES !!! CANCELED !!! (0 milliseconds)
```
```
ScriptTransformationSuite:
- cat without SerDe !!! CANCELED !!! (62 milliseconds)
- cat with LazySimpleSerDe !!! CANCELED !!! (0 milliseconds)
- script transformation should not swallow errors from upstream operators (no serde) !!! CANCELED !!! (0 milliseconds)
- script transformation should not swallow errors from upstream operators (with serde) !!! CANCELED !!! (0 milliseconds)
- SPARK-14400 script transformation should fail for bad script command !!! CANCELED !!! (0 milliseconds)
```
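The `!!! CANCELED !!!` results come from ScalaTest's `assume`, which cancels a test rather than failing it when an environment precondition is unmet. A rough stand-alone imitation of that semantics, using hypothetical names (`assumeThat`, `runTest`) rather than ScalaTest's actual implementation:

```scala
object CancelSketch {
  // A failed assumption cancels the test instead of failing it, so unmet
  // environment preconditions (like a missing /bin/bash) don't break the build.
  final case class TestCanceledException(msg: String) extends Exception(msg)

  def assumeThat(cond: Boolean, msg: String): Unit =
    if (!cond) throw TestCanceledException(msg)

  // Run a test body and render a ScalaTest-like result line.
  def runTest(name: String)(body: => Unit): String =
    try { body; s"- $name PASSED" }
    catch {
      case TestCanceledException(msg) => s"- $name !!! CANCELED !!! ($msg)"
      case _: Throwable => s"- $name *** FAILED ***"
    }

  def main(args: Array[String]): Unit = {
    println(runTest("script") {
      assumeThat(cond = false, msg = "bash is unavailable on this platform")
    })
  }
}
```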
Jenkins tests
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #16501 from HyukjinKwon/windows-bash.
Diffstat (limited to 'core')
-rw-r--r--  core/src/main/scala/org/apache/spark/TestUtils.scala          | 11
-rw-r--r--  core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala  | 25
2 files changed, 19 insertions, 17 deletions
diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala
index b5b201409a..fd0477541e 100644
--- a/core/src/main/scala/org/apache/spark/TestUtils.scala
+++ b/core/src/main/scala/org/apache/spark/TestUtils.scala
@@ -20,7 +20,6 @@ package org.apache.spark
 import java.io.{ByteArrayInputStream, File, FileInputStream, FileOutputStream}
 import java.net.{URI, URL}
 import java.nio.charset.StandardCharsets
-import java.nio.file.Paths
 import java.util.Arrays
 import java.util.concurrent.{CountDownLatch, TimeUnit}
 import java.util.jar.{JarEntry, JarOutputStream}
@@ -28,6 +27,8 @@ import java.util.jar.{JarEntry, JarOutputStream}
 import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.collection.mutable.ArrayBuffer
+import scala.sys.process.{Process, ProcessLogger}
+import scala.util.Try
 
 import com.google.common.io.{ByteStreams, Files}
 import javax.tools.{JavaFileObject, SimpleJavaFileObject, ToolProvider}
@@ -185,6 +186,14 @@ private[spark] object TestUtils {
     assert(spillListener.numSpilledStages == 0, s"expected $identifier to not spill, but did")
   }
 
+  /**
+   * Test if a command is available.
+   */
+  def testCommandAvailable(command: String): Boolean = {
+    val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue())
+    attempt.isSuccess && attempt.get == 0
+  }
+
 }
diff --git a/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala b/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala
index 287ae6ff6e..1a0eb250e7 100644
--- a/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala
+++ b/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala
@@ -21,8 +21,6 @@ import java.io.File
 
 import scala.collection.Map
 import scala.io.Codec
-import scala.sys.process._
-import scala.util.Try
 
 import org.apache.hadoop.fs.Path
 import org.apache.hadoop.io.{LongWritable, Text}
@@ -39,7 +37,7 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
   }
 
   test("basic pipe") {
-    assume(testCommandAvailable("cat"))
+    assume(TestUtils.testCommandAvailable("cat"))
     val nums = sc.makeRDD(Array(1, 2, 3, 4), 2)
 
     val piped = nums.pipe(Seq("cat"))
@@ -53,7 +51,7 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
   }
 
   test("basic pipe with tokenization") {
-    assume(testCommandAvailable("wc"))
+    assume(TestUtils.testCommandAvailable("wc"))
     val nums = sc.makeRDD(Array(1, 2, 3, 4), 2)
 
     // verify that both RDD.pipe(command: String) and RDD.pipe(command: String, env) work good
@@ -66,7 +64,7 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
   }
 
   test("failure in iterating over pipe input") {
-    assume(testCommandAvailable("cat"))
+    assume(TestUtils.testCommandAvailable("cat"))
     val nums =
       sc.makeRDD(Array(1, 2, 3, 4), 2)
         .mapPartitionsWithIndex((index, iterator) => {
@@ -86,7 +84,7 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
   }
 
   test("advanced pipe") {
-    assume(testCommandAvailable("cat"))
+    assume(TestUtils.testCommandAvailable("cat"))
     val nums = sc.makeRDD(Array(1, 2, 3, 4), 2)
 
     val bl = sc.broadcast(List("0"))
@@ -147,7 +145,7 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
   }
 
   test("pipe with env variable") {
-    assume(testCommandAvailable(envCommand))
+    assume(TestUtils.testCommandAvailable(envCommand))
     val nums = sc.makeRDD(Array(1, 2, 3, 4), 2)
     val piped = nums.pipe(s"$envCommand MY_TEST_ENV", Map("MY_TEST_ENV" -> "LALALA"))
     val c = piped.collect()
@@ -159,7 +157,7 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
   }
 
   test("pipe with process which cannot be launched due to bad command") {
-    assume(!testCommandAvailable("some_nonexistent_command"))
+    assume(!TestUtils.testCommandAvailable("some_nonexistent_command"))
     val nums = sc.makeRDD(Array(1, 2, 3, 4), 2)
     val command = Seq("some_nonexistent_command")
     val piped = nums.pipe(command)
@@ -170,7 +168,7 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
   }
 
   test("pipe with process which is launched but fails with non-zero exit status") {
-    assume(testCommandAvailable("cat"))
+    assume(TestUtils.testCommandAvailable("cat"))
     val nums = sc.makeRDD(Array(1, 2, 3, 4), 2)
     val command = Seq("cat", "nonexistent_file")
    val piped = nums.pipe(command)
@@ -181,7 +179,7 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
  }

   test("basic pipe with separate working directory") {
-    assume(testCommandAvailable("cat"))
+    assume(TestUtils.testCommandAvailable("cat"))
     val nums = sc.makeRDD(Array(1, 2, 3, 4), 2)
     val piped = nums.pipe(Seq("cat"), separateWorkingDir = true)
     val c = piped.collect()
@@ -208,13 +206,8 @@ class PipedRDDSuite extends SparkFunSuite with SharedSparkContext {
     testExportInputFile("mapreduce_map_input_file")
   }
 
-  def testCommandAvailable(command: String): Boolean = {
-    val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue())
-    attempt.isSuccess && attempt.get == 0
-  }
-
   def testExportInputFile(varName: String) {
-    assume(testCommandAvailable(envCommand))
+    assume(TestUtils.testCommandAvailable(envCommand))
     val nums = new HadoopRDD(sc, new JobConf(), classOf[TextInputFormat],
       classOf[LongWritable], classOf[Text], 2) {
       override def getPartitions: Array[Partition] = Array(generateFakeHadoopPartition())
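The suites above fail on Windows in the first place because script transformation and `RDD.pipe` shell out to an external process. A minimal stand-alone analogue of that mechanism, using `scala.sys.process` (the object and method names are illustrative, and `cat` is assumed to be on the PATH):

```scala
import java.io.ByteArrayInputStream

import scala.sys.process._

object PipeSketch {
  // Minimal analogue of RDD.pipe: stream the elements of a partition to an
  // external command's stdin and capture its stdout as lines. This is why
  // the tests require the external binary (/bin/bash, cat, wc, ...) to
  // actually exist on the host running them.
  def pipeThrough(command: String, elems: Seq[String]): Seq[String] = {
    val stdin = new ByteArrayInputStream((elems.mkString("\n") + "\n").getBytes("UTF-8"))
    // `!!` throws an IOException if the binary cannot be launched, which is
    // exactly the failure mode seen in the logs above on Windows.
    val output = (Process(command) #< stdin).!!
    output.split("\n").toSeq
  }

  def main(args: Array[String]): Unit = {
    println(pipeThrough("cat", Seq("1", "2", "3", "4")))
  }
}
```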