author     Dongjoon Hyun <dongjoon@apache.org>       2016-03-07 12:06:46 -0800
committer  Josh Rosen <joshrosen@databricks.com>     2016-03-07 12:06:46 -0800
commit     e72914f37de85519fc2aa131bac69d7582de98c8 (patch)
tree       2368b61ef20841707bf22047ef094b942e729b6c
parent     ef77003178eb5cdcb4fe519fc540917656c5d577 (diff)
[SPARK-12243][BUILD][PYTHON] PySpark tests are slow in Jenkins.
## What changes were proposed in this pull request?

In the Jenkins pull request builder, PySpark tests take around [962 seconds](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52530/console) of end-to-end time to run, despite the fact that we run four Python test suites in parallel. According to the log, the basic reason is that the longest-running suites start last because the task queue is FIFO. This patch reduces the end-to-end time by starting the known long-running suites first, using a simple priority queue.

```
========================================================================
Running PySpark tests
========================================================================
...
Finished test(python3.4): pyspark.streaming.tests (213s)
Finished test(pypy): pyspark.sql.tests (92s)
Finished test(pypy): pyspark.streaming.tests (280s)
Tests passed in 962 seconds
```

## How was this patch tested?

Manual check: inspect the 'Running PySpark tests' section of the Jenkins log.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11551 from dongjoon-hyun/SPARK-12243.
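The fix relies on the ordering contract of Python's standard-library priority queue: `get()` always returns the entry with the smallest priority value. A minimal sketch, separate from the patch itself (the suite names are real PySpark test goals, but they are chosen here only for illustration):

```python
# A minimal sketch, not part of the patch: PriorityQueue.get() returns
# the entry with the smallest priority value first, so suites tagged 0
# reach workers before suites tagged 100, regardless of the order in
# which they were queued.
try:
    import Queue  # Python 2, as used by run-tests.py at the time
except ImportError:
    import queue as Queue  # Python 3

task_queue = Queue.PriorityQueue()
task_queue.put((100, ('python3.4', 'pyspark.accumulators')))
task_queue.put((0, ('python3.4', 'pyspark.streaming.tests')))
task_queue.put((100, ('pypy', 'pyspark.shuffle')))

while not task_queue.empty():
    priority, (python_exec, test_goal) = task_queue.get_nowait()
    print('%3d %s %s' % (priority, python_exec, test_goal))

# The long streaming suite comes out first:
#   0 python3.4 pyspark.streaming.tests
# 100 pypy pyspark.shuffle
# 100 python3.4 pyspark.accumulators
```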
-rwxr-xr-x  python/run-tests.py | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/python/run-tests.py b/python/run-tests.py
index ee73eb1506..a9f8854e6f 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -157,7 +157,7 @@ def main():
     LOGGER.info("Will test against the following Python executables: %s", python_execs)
     LOGGER.info("Will test the following Python modules: %s", [x.name for x in modules_to_test])
 
-    task_queue = Queue.Queue()
+    task_queue = Queue.PriorityQueue()
     for python_exec in python_execs:
         python_implementation = subprocess_check_output(
             [python_exec, "-c", "import platform; print(platform.python_implementation())"],
@@ -168,12 +168,17 @@
         for module in modules_to_test:
             if python_implementation not in module.blacklisted_python_implementations:
                 for test_goal in module.python_test_goals:
-                    task_queue.put((python_exec, test_goal))
+                    if test_goal in ('pyspark.streaming.tests', 'pyspark.mllib.tests',
+                                     'pyspark.tests', 'pyspark.sql.tests'):
+                        priority = 0
+                    else:
+                        priority = 100
+                    task_queue.put((priority, (python_exec, test_goal)))
 
     def process_queue(task_queue):
         while True:
             try:
-                (python_exec, test_goal) = task_queue.get_nowait()
+                (priority, (python_exec, test_goal)) = task_queue.get_nowait()
             except Queue.Empty:
                 break
             try:
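For context on why this ordering helps: with four workers draining a single queue, handing out the longest jobs first is essentially the longest-processing-time scheduling heuristic, which keeps a 280-second suite from becoming the tail of the whole run. A simplified sketch of the worker pattern that `run-tests.py` builds around the queue; the thread count matches the four parallel suites mentioned above, and the short test goals are illustrative:

```python
import threading
try:
    import Queue  # Python 2, as used by run-tests.py at the time
except ImportError:
    import queue as Queue  # Python 3

task_queue = Queue.PriorityQueue()
# Long suites get priority 0 so idle workers pick them up first; the
# rest get 100 (durations in comments are from the Jenkins log above).
for priority, suite in [(0, 'pyspark.streaming.tests'),  # ran ~280s
                        (0, 'pyspark.sql.tests'),        # ran ~92s
                        (100, 'pyspark.shuffle'),
                        (100, 'pyspark.accumulators')]:
    task_queue.put((priority, suite))

def process_queue(task_queue):
    # Each worker repeatedly takes the lowest-numbered (highest) priority task.
    while True:
        try:
            priority, suite = task_queue.get_nowait()
        except Queue.Empty:
            break
        print('%s running %s (priority %d)'
              % (threading.current_thread().name, suite, priority))
        task_queue.task_done()

workers = [threading.Thread(target=process_queue, args=(task_queue,))
           for _ in range(4)]
for worker in workers:
    worker.start()
task_queue.join()  # returns once task_done() has been called for every task
```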