aboutsummaryrefslogtreecommitdiff
path: root/yarn
diff options
context:
space:
mode:
authorjerryshao <sshao@hortonworks.com>2016-03-18 12:39:49 -0700
committerMarcelo Vanzin <vanzin@cloudera.com>2016-03-18 12:39:49 -0700
commit3537782168aa9278ac4add1a25afac0ec6e17085 (patch)
treecb5fcec81608a012b6027e2b3ad25313371f50fc /yarn
parent9c23c818ca0175c8f2a4a66eac261ec251d27c97 (diff)
downloadspark-3537782168aa9278ac4add1a25afac0ec6e17085.tar.gz
spark-3537782168aa9278ac4add1a25afac0ec6e17085.tar.bz2
spark-3537782168aa9278ac4add1a25afac0ec6e17085.zip
[SPARK-13885][YARN] Fix attempt id regression for Spark running on Yarn
## What changes were proposed in this pull request? This regression is introduced in #9182, previously attempt id is simply as counter "1" or "2". With the change of #9182, it is changed to full name as "appattemtp-xxx-00001", this will affect all the parts which uses this attempt id, like event log file name, history server app url link. So here change it back to the counter to keep consistent with previous code. Also revert back this patch #11518, this patch fix the url link of history log according to the new way of attempt id, since here we change back to the previous way, so this patch is not necessary, here to revert it. Also clean "spark.yarn.app.id" and "spark.yarn.app.attemptId", since it is useless now. ## How was this patch tested? Test it with unit test and manually test different scenario: 1. application running in yarn-client mode. 2. application running in yarn-cluster mode. 3. application running in yarn-cluster mode with multiple attempts. Checked both the event log file name and url link. CC vanzin tgravescs , please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #11721 from jerryshao/SPARK-13885.
Diffstat (limited to 'yarn')
-rw-r--r--yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala7
-rw-r--r--yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala3
2 files changed, 4 insertions, 6 deletions
diff --git a/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala b/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
index 84445d60cd..e941089d1b 100644
--- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
+++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
@@ -137,12 +137,9 @@ private[spark] class ApplicationMaster(
System.setProperty("spark.master", "yarn")
System.setProperty("spark.submit.deployMode", "cluster")
- // Propagate the application ID so that YarnClusterSchedulerBackend can pick it up.
+ // Set this internal configuration if it is running on cluster mode, this
+ // configuration will be checked in SparkContext to avoid misuse of yarn cluster mode.
System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
-
- // Propagate the attempt if, so that in case of event logging,
- // different attempt's logs gets created in different directory
- System.setProperty("spark.yarn.app.attemptId", appAttemptId.getAttemptId().toString())
}
logInfo("ApplicationAttemptId: " + appAttemptId)
diff --git a/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala b/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
index 0cc158b15a..a8781636f2 100644
--- a/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
+++ b/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
@@ -96,11 +96,12 @@ private[spark] abstract class YarnSchedulerBackend(
/**
* Get the attempt ID for this run, if the cluster manager supports multiple
* attempts. Applications run in client mode will not have attempt IDs.
+ * This attempt ID only includes attempt counter, like "1", "2".
*
* @return The application attempt id, if available.
*/
override def applicationAttemptId(): Option[String] = {
- attemptId.map(_.toString)
+ attemptId.map(_.getAttemptId.toString)
}
/**