From 3537782168aa9278ac4add1a25afac0ec6e17085 Mon Sep 17 00:00:00 2001
From: jerryshao
Date: Fri, 18 Mar 2016 12:39:49 -0700
Subject: [SPARK-13885][YARN] Fix attempt id regression for Spark running on Yarn

## What changes were proposed in this pull request?

This regression was introduced in #9182. Previously the attempt id was simply a counter, "1" or "2"; with the change in #9182 it became the full name, like "appattempt-xxx-00001". This affects every part that uses the attempt id, such as the event log file name and the history server app url link. So this change reverts it back to the counter, to stay consistent with the previous code.

This also reverts patch #11518, which fixed the history log url link for the new attempt id format; since we are changing back to the previous format, that patch is no longer necessary.

It also cleans up "spark.yarn.app.id" and "spark.yarn.app.attemptId", since they are useless now.

## How was this patch tested?

Tested with unit tests and manually in the following scenarios:

1. Application running in yarn-client mode.
2. Application running in yarn-cluster mode.
3. Application running in yarn-cluster mode with multiple attempts.

Checked both the event log file name and the url link.

CC vanzin tgravescs, please help to review, thanks a lot.

Author: jerryshao

Closes #11721 from jerryshao/SPARK-13885.
---
 .../scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala    | 7 ++-----
 .../org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala | 3 ++-
 2 files changed, 4 insertions(+), 6 deletions(-)

(limited to 'yarn/src')

diff --git a/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala b/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
index 84445d60cd..e941089d1b 100644
--- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
+++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
@@ -137,12 +137,9 @@ private[spark] class ApplicationMaster(
       System.setProperty("spark.master", "yarn")
       System.setProperty("spark.submit.deployMode", "cluster")
 
-      // Propagate the application ID so that YarnClusterSchedulerBackend can pick it up.
+      // Set this internal configuration if it is running on cluster mode, this
+      // configuration will be checked in SparkContext to avoid misuse of yarn cluster mode.
       System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
-
-      // Propagate the attempt if, so that in case of event logging,
-      // different attempt's logs gets created in different directory
-      System.setProperty("spark.yarn.app.attemptId", appAttemptId.getAttemptId().toString())
     }
 
     logInfo("ApplicationAttemptId: " + appAttemptId)
diff --git a/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala b/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
index 0cc158b15a..a8781636f2 100644
--- a/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
+++ b/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
@@ -96,11 +96,12 @@ private[spark] abstract class YarnSchedulerBackend(
   /**
    * Get the attempt ID for this run, if the cluster manager supports multiple
    * attempts. Applications run in client mode will not have attempt IDs.
+   * This attempt ID only includes attempt counter, like "1", "2".
    *
    * @return The application attempt id, if available.
    */
   override def applicationAttemptId(): Option[String] = {
-    attemptId.map(_.toString)
+    attemptId.map(_.getAttemptId.toString)
   }
 
   /**
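
For reviewers who want to see the difference concretely, here is a minimal Scala sketch (not part of this patch) contrasting the full `ApplicationAttemptId` string with the bare attempt counter that `applicationAttemptId()` now returns. The `eventLogName` helper below is purely hypothetical, to illustrate how the attempt id would show up in an event log file name; the real name is built inside Spark's event logging code.

```scala
import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}

object AttemptIdSketch {
  def main(args: Array[String]): Unit = {
    // Build a sample attempt id the way YARN would report it to the AM.
    val appId = ApplicationId.newInstance(1458000000000L, 1)
    val attemptId = ApplicationAttemptId.newInstance(appId, 1)

    // Before this patch, applicationAttemptId() returned the full record string,
    // e.g. "appattempt_1458000000000_0001_000001".
    println(attemptId.toString)

    // After this patch, it returns only the attempt counter, as in the pre-#9182 code.
    println(attemptId.getAttemptId.toString)  // "1"

    // Hypothetical illustration only: how a short attempt id keeps the event log
    // file name (and the history server url derived from it) in the old format.
    def eventLogName(app: String, attempt: String): String = s"${app}_$attempt"
    println(eventLogName(appId.toString, attemptId.getAttemptId.toString))
  }
}
```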