[SPARK-18842][TESTS] De-duplicate paths in classpaths in processes for local-cluster mode in ReplSuite to work around the length limitation on Windows - spark

author	hyukjinkwon <gurwls223@gmail.com>	2016-12-27 18:50:54 +0000
committer	Sean Owen <sowen@cloudera.com>	2016-12-27 18:50:54 +0000
commit	d8e14db84f5ea752fbe92036209f67232b4dcc1f (patch)
tree	cc45d38bdbc590a7fefa8039cc95772614bb4356 /docs
parent	2404d8e54b6b2cfc78d892e7ebb31578457518a3 (diff)
download	spark-d8e14db84f5ea752fbe92036209f67232b4dcc1f.tar.gz spark-d8e14db84f5ea752fbe92036209f67232b4dcc1f.tar.bz2 spark-d8e14db84f5ea752fbe92036209f67232b4dcc1f.zip

[SPARK-18842][TESTS] De-duplicate paths in classpaths in processes for local-cluster mode in ReplSuite to work around the length limitation on Windows

## What changes were proposed in this pull request?

`ReplSuite` tests hang on Windows because of the command-line length limitation, with the exception below:

```
Spark context available as 'sc' (master = local-cluster[1,1,1024], app id = app-20161223114000-0000).
Spark session available as 'spark'.
Exception in thread "ExecutorRunner for app-20161223114000-0000/26995" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
    at java.lang.StringBuilder.append(StringBuilder.java:202)
    at java.lang.ProcessImpl.createCommandLine(ProcessImpl.java:194)
    at java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
    at java.lang.ProcessImpl.start(ProcessImpl.java:137)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:167)
    at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
```

The hang occurs because launching the executor keeps failing, and the test goes into an infinite loop.

The launch fails because the tests take paths (via `getFile`) from URLs, whereas some paths added afterward are normal local paths. (`url.getFile` gives `/C:/a/b/c`, while the paths added later have the form `C:\a\b\c`.)

As a result, many classpath entries are duplicated because normal local paths and paths from URLs are mixed. The resulting command line is up to 40K long, which exceeds the length limit (32K) on Windows.

The full command line built here is at https://gist.github.com/HyukjinKwon/46af7946c9a5fd4c6fc70a8a0aba1beb

## How was this patch tested?

Manually via AppVeyor.

**Before** https://ci.appveyor.com/project/spark-test/spark/build/395-find-path-issues

**After** https://ci.appveyor.com/project/spark-test/spark/build/398-find-path-issues

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16398 from HyukjinKwon/SPARK-18842-more.
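
The path mix-up described above can be illustrated with a small, self-contained sketch. This is not the code from the patch: `ClasspathDedup`, `normalize`, and `dedupe` are hypothetical names, and the normalization rule (strip the leading slash before a drive letter, unify separators) is an assumption chosen only to show why `/C:/a/b/c` and `C:\a\b\c` count as two different classpath entries until they are brought to one form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only; hypothetical helper, not part of Spark.
class ClasspathDedup {

    // Bring "/C:/a/b/c" (the URL.getFile style) and "C:\a\b\c"
    // (the local-path style) to one form, "C:/a/b/c".
    static String normalize(String path) {
        String p = path.replace('\\', '/');
        // Drop the leading slash that precedes a drive letter, e.g. "/C:/...".
        if (p.length() >= 3
                && p.charAt(0) == '/'
                && Character.isLetter(p.charAt(1))
                && p.charAt(2) == ':') {
            p = p.substring(1);
        }
        return p;
    }

    // Remove duplicate entries while keeping the first-seen order,
    // since classpath order matters for class resolution.
    static List<String> dedupe(List<String> entries) {
        Set<String> seen = new LinkedHashSet<>();
        List<String> result = new ArrayList<>();
        for (String entry : entries) {
            String normalized = normalize(entry);
            if (seen.add(normalized)) {
                result.add(normalized);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> mixed = Arrays.asList("/C:/a/b/c", "C:\\a\\b\\c", "C:\\x\\y");
        // Prints [C:/a/b/c, C:/x/y]
        System.out.println(dedupe(mixed));
    }
}
```

Without a step like `normalize`, a plain duplicate check would keep both spellings of the same directory, which is how the classpath in the command line can grow past the Windows limit.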

Diffstat (limited to 'docs')

0 files changed, 0 insertions, 0 deletions

