aboutsummaryrefslogtreecommitdiff
path: root/common/network-yarn/src/main
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-17433] YarnShuffleService doesn't handle moving credentials levelDbThomas Graves2016-09-091-17/+39
| | | | | | | | | | | The secrets leveldb isn't being moved if you run spark shuffle services without yarn nm recovery on and then turn it on. This fixes that. I unfortunately missed this when I ported the patch from our internal branch 2 to master branch due to the changes for the recovery path. Note this only applies to master since it is the only place the yarn nm recovery dir is used. Unit tests ran and tested on 8 node cluster. Fresh startup with NM recovery, fresh startup no nm recovery, switching between no nm recovery and recovery. Also tested running applications to make sure wasn't affected by rolling upgrade. Author: Thomas Graves <tgraves@prevailsail.corp.gq1.yahoo.com> Author: Tom Graves <tgraves@apache.org> Closes #14999 from tgravescs/SPARK-17433.
* [SPARK-16711] YarnShuffleService doesn't re-init properly on YARN rolling ↵Thomas Graves2016-09-021-8/+127
| | | | | | | | | | | | | | upgrade The Spark Yarn Shuffle Service doesn't re-initialize the application credentials early enough which causes any other spark executors trying to fetch from that node during a rolling upgrade to fail with "java.lang.NullPointerException: Password cannot be null if SASL is enabled". Right now the spark shuffle service relies on the Yarn nodemanager to re-register the applications, unfortunately this is after we open the port for other executors to connect. If other executors connected before the re-register they get a null pointer exception which isn't a re-tryable exception and cause them to fail pretty quickly. To solve this I added another leveldb file so that it can save and re-initialize all the applications before opening the port for other executors to connect to it. Adding another leveldb was simpler from the code structure point of view. Most of the code changes are moving things to common util class. Patch was tested manually on a Yarn cluster with rolling upgrade was happing while spark job was running. Without the patch I consistently get the NullPointerException, with the patch the job gets a few Connection refused exceptions but the retries kick in and the it succeeds. Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Closes #14718 from tgravescs/SPARK-16711.
* [SPARK-17332][CORE] Make Java Loggers static membersSean Owen2016-08-311-1/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Make all Java Loggers static members ## How was this patch tested? Jenkins Author: Sean Owen <sowen@cloudera.com> Closes #14896 from srowen/SPARK-17332.
* [MINOR][BUILD] Fix Java Linter `LineLength` errorsDongjoon Hyun2016-07-191-1/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes four java linter `LineLength` errors. Those are all `LineLength` errors, but we had better remove all java linter errors before release. ## How was this patch tested? After pass the Jenkins, `./dev/lint-java`. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #14255 from dongjoon-hyun/minor_java_linter.
* [SPARK-16505][YARN] Optionally propagate error during shuffle service startup.Marcelo Vanzin2016-07-141-32/+43
| | | | | | | | | | | This prevents the NM from starting when something is wrong, which would lead to later errors which are confusing and harder to debug. Added a unit test to verify startup fails if something is wrong. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #14162 from vanzin/SPARK-16505.
* [SPARK-14963][MINOR][YARN] Fix typo in YarnShuffleService recovery file namejerryshao2016-07-141-1/+1
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Due to the changes of [SPARK-14963](https://issues.apache.org/jira/browse/SPARK-14963), external shuffle recovery file name is changed mistakenly, so here change it back to the previous file name. This only affects the master branch, branch-2.0 is correct [here](https://github.com/apache/spark/blob/branch-2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java#L195). ## How was this patch tested? N/A Author: jerryshao <sshao@hortonworks.com> Closes #14197 from jerryshao/fix-typo-file-name.
* [SPARK-14963][YARN] Using recoveryPath if NM recovery is enabledjerryshao2016-05-101-11/+53
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? From Hadoop 2.5+, Yarn NM supports NM recovery which using recovery path for auxiliary services such as spark_shuffle, mapreduce_shuffle. So here change to use this path install of NM local dir if NM recovery is enabled. ## How was this patch tested? Unit test + local test. Author: jerryshao <sshao@hortonworks.com> Closes #12994 from jerryshao/SPARK-14963.
* [SPARK-13622][YARN] Issue creating level db for YARN shuffle servicenfraison2016-03-281-3/+4
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch will ensure that we trim all path set in yarn.nodemanager.local-dirs and that the the scheme is well removed so the level db can be created. ## How was this patch tested? manual tests. Author: nfraison <nfraison@yahoo.fr> Closes #11475 from ashangit/level_db_creation_issue.
* [SPARK-13529][BUILD] Move network/* modules into common/network-*Reynold Xin2016-02-282-0/+266
## What changes were proposed in this pull request? As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder. ## How was this patch tested? Compilation and existing tests. We should run both SBT and Maven. Author: Reynold Xin <rxin@databricks.com> Closes #11409 from rxin/SPARK-13529.