Maximum memory in YARN is controlled by two params; have both in error msg
When we exceed the max memory, tell users to increase both params instead of just one.
Author: Holden Karau <holden@us.ibm.com>
Closes #9758 from holdenk/SPARK-11771-maximum-memory-in-yarn-is-controlled-by-two-params-have-both-in-error-msg.
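A minimal sketch of the improved check, assuming the two cluster settings in question are `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb`; names and wording here are illustrative, not the actual Spark code:
```
// Hypothetical sketch: fail fast and name BOTH cluster settings that cap
// container memory, instead of only one of them.
def verifyClusterResources(executorMem: Long, overhead: Long, maxMem: Long): Unit = {
  require(executorMem + overhead <= maxMem,
    s"Required executor memory ($executorMem MB) plus overhead ($overhead MB) " +
    s"is above the max threshold ($maxMem MB) of this cluster! Please check " +
    "the values of 'yarn.scheduler.maximum-allocation-mb' and/or " +
    "'yarn.nodemanager.resource.memory-mb'.")
}
```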
Currently, if dynamic allocation is enabled, explicitly killing an executor gets no response, so the executor metadata on the driver side is wrong, which makes dynamic allocation on YARN fail to work.
The problem is that `disableExecutor` returns false for executors pending kill when `onDisconnect` is detected, so no further handling is done.
One solution is to bypass these explicitly killed executors and use `super.onDisconnect` to remove them. This is simple.
Another solution is to still query the loss reason for these explicitly killed executors. Since an executor may be killed and reported in the same AM-RM communication, the current approach of adding a pending loss-reason request does not work (the container-complete event has already been processed); instead, we should store the loss reason for a later query.
This PR chooses solution 2.
vanzin, I think this part was previously changed by you; would you please help review? Thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #9684 from jerryshao/SPARK-11718.
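A self-contained sketch of solution 2 as described above: cache the loss reason recorded at container-complete time so a later query can still be answered. All names are hypothetical stand-ins for the real YarnAllocator logic:
```
import scala.collection.mutable

class LossReasonStore {
  private val pendingReasons = mutable.Map.empty[String, String]

  // Called when the container-complete event arrives (possibly before the query).
  def recordReason(executorId: String, reason: String): Unit =
    pendingReasons(executorId) = reason

  // Called when the driver later asks why the executor died.
  def queryReason(executorId: String): Option[String] =
    pendingReasons.remove(executorId)
}
```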
I tested the various ways of specifying the number of executors with both spark-submit and spark-class, in both client and cluster mode where applicable:
--num-workers, --num-executors, spark.executor.instances, SPARK_EXECUTOR_INSTANCES, and the default with nothing supplied.
Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
Closes #9523 from tgravescs/SPARK-11555.
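A hedged sketch of the precedence being exercised above — an explicit CLI flag wins over the Spark conf, which wins over the environment variable. The exact resolution order and the default of 2 are my reading of Spark on YARN at the time, so verify against your version:
```
// Illustrative resolution order for the executor count on YARN.
def numExecutors(cliFlag: Option[Int], sparkConf: Option[Int], envVar: Option[Int]): Int =
  cliFlag.orElse(sparkConf).orElse(envVar).getOrElse(2) // 2 was YARN's default
```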
In YARN mode, when preemption is enabled, we may leave executors in a
zombie state while we wait to retrieve the reason for which the executor
exited. This is so that we don't account for failed tasks that were
running on a preempted executor.
The issue is that while we wait for this information, the scheduler
might decide to schedule tasks on the executor, which will never be
able to run them. Other side effects include the block manager still
considering the executor available to cache blocks, for example.
So, when we know that an executor went down but we don't know why,
stop everything related to the executor, except its running tasks.
Only when we know the reason for the exit (or give up waiting for
it) do we update the running tasks.
This is achieved by a new `disableExecutor()` method in the
`Schedulable` interface. For managers that do not behave like this
(i.e. every one but YARN), the existing `executorLost()` method
will behave the same way it did before.
On top of that change, a few minor changes that made debugging easier,
and fixed some other minor issues:
- The cluster-mode AM was printing a misleading log message every
time an executor disconnected from the driver (because the akka
actor system was shared between driver and AM).
- Avoid sending unnecessary requests for an executor's exit reason
when we already know it was explicitly disabled / killed. This
avoids both multiple requests, and unnecessary requests that would
just cause warning messages on the AM (in the explicit kill case).
- Tone down a log message about the executor being lost when it
exited normally (e.g. preemption).
- Wake up the AM monitor thread when requests for executor loss
reasons arrive too, so that we can more quickly remove executors
from this zombie state.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8887 from vanzin/SPARK-10622.
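A sketch of the split this change introduces; the method names follow the commit text, while the trait itself is an illustrative stand-in for Spark's scheduler interfaces:
```
// "Disable" is the new half-way state: stop scheduling new tasks and stop
// caching blocks on the executor, but keep its running tasks alive until
// the exit reason is known.
trait ExecutorLifecycle {
  def disableExecutor(executorId: String): Boolean
  // Called once the loss reason arrives (or we give up waiting); for
  // cluster managers other than YARN this remains the only entry point.
  def executorLost(executorId: String, reason: String): Unit
}
```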
"Client mode" means the RPC env will not listen for incoming connections.
This allows certain processes in the Spark stack (such as Executors or
the YARN client-mode AM) to act as pure clients when using the netty-based
RPC backend, reducing the number of sockets needed by the app and also the
number of open ports.
Client connections are also preferred when endpoints that actually have
a listening socket are involved; so, for example, if a Worker connects
to a Master and the Master needs to send a message to a Worker endpoint,
that client connection will be used, even though the Worker is also
listening for incoming connections.
With this change, the workaround for SPARK-10987 isn't necessary anymore, and
is removed. The AM connects to the driver in "client mode", and that connection
is used for all driver <-> AM communication, and so the AM is properly notified
when the connection goes down.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #9210 from vanzin/SPARK-10997.
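A self-contained model of the "client mode" flag described above — in client mode no server socket is created, so no listening port is opened. This is an illustration, not Spark's actual RpcEnv factory:
```
final case class RpcEnvConfig(port: Int, clientMode: Boolean)

class MiniRpcEnv(config: RpcEnvConfig) {
  // In client mode we never bind a server socket; all traffic flows over
  // outbound connections, which the peer reuses for replies.
  private val server: Option[java.net.ServerSocket] =
    if (config.clientMode) None else Some(new java.net.ServerSocket(config.port))

  def listeningPort: Option[Int] = server.map(_.getLocalPort)
}
```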
pending container requests into consideration
This is a follow-up PR to further improve the locality calculation by considering pending container requests. Since the locality preferences of tasks may shift over time, the localities of pending container requests may no longer fully match the new preferences; this PR improves the calculation by removing outdated, unmatched container requests and replacing them with new ones.
sryza please help to review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #8100 from jerryshao/SPARK-9817.
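A simplified sketch of the idea: partition pending requests by whether their locality still matches the latest preferences, then cancel and re-issue the outdated ones. `PendingRequest` and the matching rule are illustrative simplifications of the AM-RM bookkeeping:
```
case class PendingRequest(nodes: Set[String])

def refreshRequests(pending: Seq[PendingRequest],
                    preferredNodes: Set[String]): Seq[PendingRequest] = {
  val (matched, outdated) = pending.partition(_.nodes.exists(preferredNodes))
  // The outdated requests would be cancelled with the AM-RM client and
  // replaced by requests carrying the current locality preferences.
  matched ++ outdated.map(_ => PendingRequest(preferredNodes))
}
```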
Use standard JDK APIs for that (with a little help from Guava). Most of the
changes here are in test code, since there were no tests specific to that
part of the code.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #9257 from vanzin/SPARK-11073.
secure cluster
This is a fix for SPARK-11265: the introspection code used to get Hive delegation tokens fails on Spark 1.5.1+ due to changes in the Hive codebase.
Author: Steve Loughran <stevel@hortonworks.com>
Closes #9232 from steveloughran/stevel/patches/SPARK-11265-hive-tokens.
Commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 introduced new
functionality so that if an executor dies for a reason that's not
caused by one of the tasks running on the executor (e.g., due to
pre-emption), Spark doesn't count the failure towards the maximum
number of failures for the task. That commit introduced some vague
naming that this commit attempts to fix; in particular:
(1) The variable "isNormalExit", which was used to refer to cases where
the executor died for a reason unrelated to the tasks running on the
machine, has been renamed (and reversed) to "exitCausedByApp". The problem
with the existing name is that it's not clear (at least to me!) what it
means for an exit to be "normal"; the new name is intended to make the
purpose of this variable more clear.
(2) The variable "shouldEventuallyFailJob" has been renamed to
"countTowardsTaskFailures". This variable is used to determine whether
a task's failure should be counted towards the maximum number of failures
allowed for a task before the associated Stage is aborted. The problem
with the existing name is that it can be confused with implying that
the task's failure should immediately cause the stage to fail because it
is somehow fatal (this is the case for a fetch failure, for example: if
a task fails because of a fetch failure, there's no point in retrying,
and the whole stage should be failed).
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #9164 from kayousterhout/SPARK-11178.
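The two renames, shown as a self-contained sketch (the real types live in Spark's scheduler; the fields shown here are illustrative):
```
case class ExecutorExited(
    exitCode: Int,
    exitCausedByApp: Boolean, // formerly `isNormalExit`, meaning reversed
    reason: String)

case class TaskFailureInfo(
    countTowardsTaskFailures: Boolean) // formerly `shouldEventuallyFailJob`
```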
Currently the log4j.properties file is not uploaded to the executors, which leads them to use the default values. This fix makes sure the file is always uploaded to the distributed cache so that executors use the latest settings.
If the user specifies log configurations through --files, executors will pick up the configs from --files instead of $SPARK_CONF_DIR/log4j.properties.
Author: vundela <vsr@cloudera.com>
Author: Srinivasa Reddy Vundela <vsr@cloudera.com>
Closes #9118 from vundela/master.
Upgrade to Py4J 0.9.
Author: Holden Karau <holden@pigscanfly.ca>
Author: Holden Karau <holden@us.ibm.com>
Closes #8615 from holdenk/SPARK-10447-upgrade-pyspark-to-py4j0.9.
dynamically allocating in YARN
I also added some information to container-failure error messages about which host they failed on, which would have helped me identify the problem that led me to this JIRA and PR sooner.
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes #9147 from ryan-williams/dyn-exec-failures.
…redNodeLocationData
Author: Jacek Laskowski <jacek.laskowski@deepsense.io>
Closes #8976 from jaceklaskowski/SPARK-10921.
`hive.metastore.uris` was set to avoid booting the database twice.
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #9026 from SaintBacchus/SPARK-11000.
'spark-submit --jars hdfs://user/foo.jar'
When spark.yarn.user.classpath.first=true and 'spark-submit --jars hdfs://user/foo.jar' is used, foo.jar cannot be put on the system classpath, so we need to put YARN's link names for the jars on the system classpath. vanzin tgravescs
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Closes #9045 from lianhuiwang/spark-11026.
Add an application attempt window for Spark on YARN that ignores old, out-of-window failures; this is useful for long-running applications recovering from failures.
Author: jerryshao <sshao@hortonworks.com>
Closes #8857 from jerryshao/SPARK-10739 and squashes the following commits:
36eabdc [jerryshao] change the doc
7f9b77d [jerryshao] Style change
1c9afd0 [jerryshao] Address the comments
caca695 [jerryshao] Add application attempt window for Spark on Yarn
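A hedged usage example; `spark.yarn.am.attemptFailuresValidityInterval` is my assumption for the key name this change introduces, so check your version's docs:
```
import org.apache.spark.SparkConf

// Only attempt failures within the window count against the max-attempts
// limit, so a long-running app can survive old, already-recovered failures.
val conf = new SparkConf()
  .set("spark.yarn.am.attemptFailuresValidityInterval", "1h")
```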
The issue is that local paths on Windows, when provided with drive
letters or backslashes, are not valid URIs.
Instead of trying to figure out whether paths are URIs or not, use
Utils.resolveURI() which does that for us.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #9049 from vanzin/SPARK-11023 and squashes the following commits:
77021f2 [Marcelo Vanzin] [SPARK-11023] [yarn] Avoid creating URIs from local paths directly.
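A simplified sketch of what a helper like Utils.resolveURI() has to handle — `C:\tmp\file` is not a valid URI, and `C:/tmp/file` parses with a bogus one-letter scheme. This is illustrative, not Spark's implementation:
```
import java.io.File
import java.net.URI

def resolveURI(path: String): URI = {
  try {
    val uri = new URI(path)
    // A one-letter scheme is almost certainly a Windows drive letter
    // ("C:/tmp/file"), not a real URI scheme.
    if (uri.getScheme != null && uri.getScheme.length > 1) return uri
  } catch {
    case _: java.net.URISyntaxException => // backslashes are invalid in URIs
  }
  new File(path).getAbsoluteFile.toURI // canonical file: URI for local paths
}
```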
apps.
This change adds an API that encapsulates information about an app
launched using the library. It also creates a socket-based communication
layer for apps that are launched as child processes; the launching
application listens for connections from launched apps, and once
communication is established, the channel can be used to send updates
to the launching app, or to send commands to the child app.
The change also includes hooks for local, standalone/client and yarn
masters.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7052 from vanzin/SPARK-8673.
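A hedged usage example of the handle API this change introduces; the launcher method names below match the library as I understand it, but check them against your Spark version:
```
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

val handle: SparkAppHandle = new SparkLauncher()
  .setAppResource("/path/to/app.jar")   // hypothetical path and class
  .setMainClass("com.example.Main")
  .setMaster("yarn-cluster")
  .startApplication(new SparkAppHandle.Listener {
    // Delivered over the socket-based channel from the child process.
    override def stateChanged(h: SparkAppHandle): Unit =
      println(s"state: ${h.getState}")
    override def infoChanged(h: SparkAppHandle): Unit = ()
  })
```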
In YARN client mode, when the AM connects to the driver, it may be the case
that the driver never needs to send a message back to the AM (i.e., no
dynamic allocation or preemption). This triggers an issue in the netty rpc
backend where no disconnection event is sent to endpoints, and the AM never
exits after the driver shuts down.
The real fix is too complicated, so this is a quick hack to unblock YARN
client mode until we can work on the real fix. It forces the driver to
send a message to the AM when the AM registers, thus establishing that
connection and enabling the disconnection event when the driver goes
away.
Also, a minor side issue: when the executor is shutting down, it needs
to send an "ack" back to the driver when using the netty rpc backend; but
that "ack" wasn't being sent because the handler was shutting down the rpc
env before returning. So added a change to delay the shutdown a little bit,
allowing the ack to be sent back.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #9021 from vanzin/SPARK-10987.
The `self` method returns null when called from the constructor;
instead, registration should happen in the `onStart` method, at
which point the `self` reference has already been initialized.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #9005 from vanzin/SPARK-10964.
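A self-contained model of the bug and the fix; `self` mimics Spark's endpoint reference, which is only initialized after construction:
```
abstract class MiniEndpoint {
  private[this] var ref: MiniEndpoint = _
  final def self: MiniEndpoint = ref // null until the endpoint is started
  final def start(): Unit = { ref = this; onStart() }
  def onStart(): Unit = ()
}

class AMEndpoint(registerWithDriver: MiniEndpoint => Unit) extends MiniEndpoint {
  // Registering from the constructor would pass a null `self`; doing it in
  // onStart is safe because the reference is initialized by then.
  override def onStart(): Unit = registerWithDriver(self)
}
```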
A recent change to fix the referenced bug caused this exception in
the `SparkContext.stop()` path:
```
org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode!
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:167)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:182)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:440)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1579)
    at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1730)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1729)
```
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8996 from vanzin/SPARK-10812.
This should go into 1.5.2 also.
The issue is we were no longer adding the __app__.jar to the system classpath.
Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
Author: Tom Graves <tgraves@yahoo-inc.com>
Closes #8959 from tgravescs/SPARK-10901.
This makes YARN containers behave like all other processes launched by
Spark, which launch with a default perm gen size of 256m unless
overridden by the user (or not needed by the vm).
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8970 from vanzin/SPARK-10916.
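A sketch of the rule described above, under the assumption that the default only applies on JVMs that still have a permanent generation and that user-supplied options win:
```
// Illustrative: add the default perm gen setting for YARN containers only
// when the user didn't supply one and the JVM still has a perm gen.
def permGenOpts(userJavaOpts: Seq[String], javaMajorVersion: Int): Seq[String] = {
  if (javaMajorVersion >= 8) Seq.empty // no perm gen on Java 8+
  else if (userJavaOpts.exists(_.contains("MaxPermSize"))) Seq.empty
  else Seq("-XX:MaxPermSize=256m")
}
```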
This PR just reverted https://github.com/apache/spark/commit/02144d6745ec0a6d8877d969feb82139bd22437f to remerge #6457 and also included the commits in #8905.
Author: zsxwing <zsxwing@gmail.com>
Closes #8944 from zsxwing/SPARK-6028.
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes #8939 from ryan-williams/errmsg.
consolidate the code
This bug was introduced in [SPARK-9092](https://issues.apache.org/jira/browse/SPARK-9092): `targetExecutorNumber` should use `minExecutors` if `initialExecutors` is not set. Using 0 instead triggers the problem described in [SPARK-10790](https://issues.apache.org/jira/browse/SPARK-10790).
Also consolidate and simplify some similar code snippets to keep the semantics consistent.
Author: jerryshao <sshao@hortonworks.com>
Closes #8910 from jerryshao/SPARK-10790.
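The fixed fallback chain, sketched with the standard dynamic-allocation keys (the surrounding plumbing is illustrative):
```
// Start from initialExecutors when set, otherwise minExecutors — never a bare 0.
def initialTargetExecutors(get: String => Option[Int]): Int = {
  val minExecutors = get("spark.dynamicAllocation.minExecutors").getOrElse(0)
  get("spark.dynamicAllocation.initialExecutors").getOrElse(minExecutors)
}
```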
While this is likely not a huge issue for real production systems, it can be a problem for test systems that set up a SparkContext, tear it down, and stand up a SparkContext with a different master (e.g., some local-mode and some YARN-mode tests). Discovered during work on spark-testing-base on Spark 1.4.1, but the logic that triggers it seems to be present in master (see the SparkHadoopUtil object). A valid workaround for users encountering this issue is to fork a different JVM, but this can be heavyweight.
```
[info] SampleMiniClusterTest:
[info] Exception encountered when attempting to run a suite with class name: com.holdenkarau.spark.testing.SampleMiniClusterTest *** ABORTED ***
[info] java.lang.ClassCastException: org.apache.spark.deploy.SparkHadoopUtil cannot be cast to org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
[info] at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:163)
[info] at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:257)
[info] at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:561)
[info] at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:115)
[info] at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
[info] at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
[info] at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
[info] at com.holdenkarau.spark.testing.SharedMiniCluster$class.setup(SharedMiniCluster.scala:186)
[info] at com.holdenkarau.spark.testing.SampleMiniClusterTest.setup(SampleMiniClusterTest.scala:26)
[info] at com.holdenkarau.spark.testing.SharedMiniCluster$class.beforeAll(SharedMiniCluster.scala:103)
```
Author: Holden Karau <holden@pigscanfly.ca>
Closes #8911 from holdenk/SPARK-10812-spark-hadoop-util-support-switching-to-yarn.
This reverts commit 084e4e126211d74a79e8dbd2d0e604dd3c650822.
Design doc: https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing
Author: zsxwing <zsxwing@gmail.com>
Closes #6457 from zsxwing/new-rpc.
`ApplicationMaster` no longer has the `--num-executors` flag, and its `--properties-file` configuration option was undocumented.
cc srowen
Author: Erick Tryzelaar <erick.tryzelaar@gmail.com>
Closes #8754 from erickt/master.
The architecture is that, in YARN mode, if the driver detects that an executor has disconnected, it asks the ApplicationMaster why the executor died. If the ApplicationMaster is aware that the executor died because of preemption, all tasks associated with that executor are not marked as failed. The executor
is still removed from the driver's list of available executors, however.
There are a few open questions:
1. Should standalone mode have a similar "get executor loss reason" as well? I localized this change as much as possible to affect only YARN, but there could be a valid case to differentiate executor losses in standalone mode as well.
2. I make a pretty strong assumption in YarnAllocator that getExecutorLossReason(executorId) will only be called once per executor id; I do this so that I can remove the metadata from the in-memory map to avoid object accumulation. It's not clear if I'm being overly zealous to save space, however.
cc vanzin specifically for review because it collided with some earlier YARN scheduling work.
cc JoshRosen because it's similar to output commit coordination we did in the past
cc andrewor14 for our discussion on how to get executor exit codes and loss reasons
Author: mcheah <mcheah@palantir.com>
Closes #8007 from mccheah/feature/preemption-handling.
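An illustrative model of the message flow described above; these case classes are stand-ins, not Spark's actual RPC messages:
```
case class GetExecutorLossReason(executorId: String) // driver -> AM query

sealed trait LossReason
case object Preempted extends LossReason // don't count the tasks as failed
case class ApplicationError(exitCode: Int) extends LossReason // count failures
```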
n…
Throw a more readable exception. Please help review. Thanks
Author: Jeff Zhang <zjffdu@apache.org>
Closes #8649 from zjffdu/SPARK-10481.
From the JIRA:
Running spark-submit on YARN with the number of executors equal to 0 when not using dynamic allocation should error out.
In Spark 1.5.0 it continues and ends up hanging.
yarn.ClientArguments still has the check, so something else must have changed.
```
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --num-executors 0 ....
```
Spark 1.4.1 errors with:
```
java.lang.IllegalArgumentException:
Number of executors was 0, but must be at least 1
(or 0 if dynamic executor allocation is enabled).
```
Author: Holden Karau <holden@pigscanfly.ca>
Closes #8580 from holdenk/SPARK-10332-spark-submit-to-yarn-executors-0-message.
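A sketch of the restored fail-fast check; the message text follows the Spark 1.4.1 error quoted above, while the function itself is illustrative:
```
def validateNumExecutors(numExecutors: Int, dynamicAllocation: Boolean): Unit =
  require(numExecutors > 0 || (dynamicAllocation && numExecutors == 0),
    s"Number of executors was $numExecutors, but must be at least 1 " +
    "(or 0 if dynamic executor allocation is enabled).")
```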
to JavaConverters
Replace `JavaConversions` implicits with `JavaConverters`
Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet.
Author: Sean Owen <sowen@cloudera.com>
Closes #8033 from srowen/SPARK-9613.
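An example of the style this change moves to: `JavaConverters` makes every conversion a visible `.asScala`/`.asJava` call instead of an invisible implicit:
```
import java.util.{List => JList}
import scala.collection.JavaConverters._

// Explicit, visible conversions at the Java/Scala boundary.
def firstOrNone(javaList: JList[String]): Option[String] =
  javaList.asScala.headOption

def toJava(names: Seq[String]): JList[String] =
  names.asJava
```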
This allows skipping the code that tries to talk to Hive and HBase to
fetch delegation tokens, in case that somehow conflicts with the application
being run.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8134 from vanzin/SPARK-9833.
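A hedged example of opting out; the key names below are my assumption about the flags this change adds, so verify them against the docs for your Spark version:
```
import org.apache.spark.SparkConf

// Assumed key names; both presumably default to true, so tokens are
// fetched unless the application explicitly opts out.
val conf = new SparkConf()
  .set("spark.yarn.security.tokens.hive.enabled", "false")
  .set("spark.yarn.security.tokens.hbase.enabled", "false")
```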
This is my retry at suggesting a fix for using Spark on YARN on Windows. The former request lacked proper coding style, which I hope I have since learned to do better, and wasn't a true solution as I didn't really understand where the problem came from. While it is still a bit obscure, I can now name the "players" and have come up with a better explanation of why I am suggesting this fix.
I also used vanzin's and srowen's input to *try* to give a more elegant solution. I am not so sure that worked out, though.
I still hope that this PR is a lot more useful than the last. I also hope that this is a _solution_ to the problem that Spark doesn't work on YARN on Windows. With these changes it works (and I can also explain why!).
I still believe that a Unit Test should be included, kind of like the one I committed the last time. But that was premature, as I want to get the principal 'Go' from vanzin and srowen.
Thanks for your time both of you.
Author: Carsten Blank <blank@cncengine.com>
Author: cbvoxel <blank@cncengine.com>
Closes #8053 from cbvoxel/master.
Here I propose to remove the old MRJobConfig#DEFAULT_APPLICATION_CLASSPATH support, since we have now moved to the YARN stable API.
vanzin and sryza, any opinion on this? If we still want to support the old API, I can close it. But as far as I know, major Hadoop releases have now moved to the stable API.
Author: jerryshao <sshao@hortonworks.com>
Closes #8192 from jerryshao/SPARK-9969.
Add a new test case in yarn/ClientSuite which checks how the various SparkConf
and ClientArguments propagate into the ApplicationSubmissionContext.
Author: Dennis Huo <dhuo@google.com>
Closes #8072 from dennishuo/dhuo-yarn-application-tags.
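A hedged usage example; `spark.yarn.tags` is my assumption for the key name this change wires into the ApplicationSubmissionContext, so check the test in yarn/ClientSuite for the real one:
```
import org.apache.spark.SparkConf

// Comma-separated tags, propagated to YARN's application submission context.
val conf = new SparkConf().set("spark.yarn.tags", "nightly,etl")
```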
The YARN backend doesn't like when user code calls `System.exit`,
since it cannot know the exit status and thus cannot set an
appropriate final status for the application.
So, for pyspark, avoid that call and instead throw an exception with
the exit code. SparkSubmit handles that exception and exits with
the given exit code, while YARN uses the exit code as the failure
code for the Spark app.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7751 from vanzin/SPARK-9416.
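A sketch of the mechanism described above: user code signals failure with an exception carrying the exit code instead of calling System.exit directly, so the YARN AM can record a proper final status. The class below is a stand-in, not Spark's actual exception type:
```
case class UserAppExitException(exitCode: Int)
    extends RuntimeException(s"User application exited with $exitCode")

// SparkSubmit-style handler: catch the exception and exit with its code,
// letting YARN use the same code as the app's failure status.
def runUserApp(app: () => Unit): Int =
  try { app(); 0 } catch {
    case UserAppExitException(code) => code
  }
```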
Refactor Utils class and create ShutdownHookManager.
NOTE: I wasn't able to run dev/run-tests on a Windows machine.
Manual tests were conducted locally using a custom log4j.properties file with a Redis appender and Logstash formatter (bundled in the fat jar submitted to Spark), e.g.:
```
log4j.rootCategory=WARN,console,redis
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.graphx.Pregel=INFO
log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender
log4j.appender.redis.endpoints=hostname:port
log4j.appender.redis.key=mykey
log4j.appender.redis.alwaysBatch=false
log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
```
Author: michellemay <mlemay@gmail.com>
Closes #8109 from michellemay/SPARK-9826.
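A minimal model of what the new ShutdownHookManager provides — hooks run in priority order on JVM shutdown, so appenders like the Redis one above can flush before the JVM exits. Illustrative only; Spark's class has a richer API:
```
object MiniShutdownHookManager {
  private val hooks = scala.collection.mutable.ArrayBuffer.empty[(Int, () => Unit)]

  sys.addShutdownHook {
    // Run the highest-priority hooks first, mirroring the priority ordering.
    hooks.synchronized(hooks.sortBy { case (p, _) => -p }).foreach {
      case (_, hook) => hook()
    }
  }

  def addShutdownHook(priority: Int)(hook: () => Unit): Unit =
    hooks.synchronized { hooks += priority -> hook }
}
```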
… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, the executorAllocationManager is not initialized in the SparkContext.
Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>
Closes #7657 from neurons/SPARK-9092.
memory is above the max threshold of this cluster on YARN mode
Author: Yadong Qi <qiyadong2010@gmail.com>
Closes #8028 from watermen/SPARK-9737 and squashes the following commits:
48bdf3d [Yadong Qi] Add suggested configuration.
Currently, when we kill an application on YARN, sc.stop() is called from the YARN application state monitor thread; YarnClientSchedulerBackend.stop() then calls interrupt(), which prevents the SparkContext from stopping fully because we are still waiting for executors to exit.
Author: linweizhong <linweizhong@huawei.com>
Closes #7846 from Sephiroth-Lin/SPARK-9519 and squashes the following commits:
1ae736d [linweizhong] Update comments
2e8e365 [linweizhong] Add comment explaining the code
ad0e23b [linweizhong] Update
243d2c7 [linweizhong] Confirm stop sc successfully when application was killed
Look at HBase's configuration to make sure it's configured for
Kerberos. If the HBase configuration is missing, or if HBase is
configured for non-kerberos authentication, then skip getting
tokens.
Reference: http://hbase.apache.org/book.html#security.prerequisites
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7810 from vanzin/SPARK-9491 and squashes the following commits:
a57c776 [Marcelo Vanzin] [SPARK-9491] Avoid fetching HBase tokens when not needed.
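A sketch of the guard described above, following the referenced HBase security docs — only attempt to fetch HBase tokens when HBase is actually set up for Kerberos:
```
import org.apache.hadoop.conf.Configuration

// If the HBase config is missing or non-Kerberos, skip token fetching.
def shouldFetchHBaseTokens(hbaseConf: Configuration): Boolean =
  hbaseConf.get("hbase.security.authentication") == "kerberos"
```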
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7706 from vanzin/SPARK-9388 and squashes the following commits:
028b990 [Marcelo Vanzin] Single log statement.
3c5fb6a [Marcelo Vanzin] YARN not Yarn.
5bcd7a0 [Marcelo Vanzin] [SPARK-9388] [yarn] Make executor info log messages easier to read.
This change adds code to notify the scheduler backend when a container dies in YARN.
Author: Mridul Muralidharan <mridulm@yahoo-inc.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7431 from vanzin/SPARK-8297 and squashes the following commits:
471e4a0 [Marcelo Vanzin] Fix unit test after merge.
d4adf4e [Marcelo Vanzin] Merge branch 'master' into SPARK-8297
3b262e8 [Marcelo Vanzin] Merge branch 'master' into SPARK-8297
537da6f [Marcelo Vanzin] Make an expected log less scary.
04dc112 [Marcelo Vanzin] Use driver <-> AM communication to send "remove executor" request.
8855b97 [Marcelo Vanzin] Merge remote-tracking branch 'mridul/fix_yarn_scheduler_bug' into SPARK-8297
687790f [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug
e1b0067 [Mridul Muralidharan] Fix failing testcase, fix merge issue from our 1.3 -> master
9218fcc [Mridul Muralidharan] Fix failing testcase
362d64a [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug
62ad0cc [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug
bbf8811 [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug
9ee1307 [Mridul Muralidharan] Fix SPARK-8297
a3a0f01 [Mridul Muralidharan] Fix SPARK-8297
allocation requests
Currently there is no locality preference for container requests in YARN mode; this hurts performance when data must be fetched remotely, so this proposes adding locality preferences in YARN dynamic allocation mode.
Ping sryza, please help to review, thanks a lot.
Author: jerryshao <saisai.shao@intel.com>
Closes #6394 from jerryshao/SPARK-4352 and squashes the following commits:
d45fecb [jerryshao] Add documents
6c3fe5c [jerryshao] Fix bug
8db6c0e [jerryshao] Further address the comments
2e2b2cb [jerryshao] Fix rebase compiling problem
ce5f096 [jerryshao] Fix style issue
7f7df95 [jerryshao] Fix rebase issue
9ca9e07 [jerryshao] Code refactor according to comments
d3e4236 [jerryshao] Further address the comments
5e7a593 [jerryshao] Fix bug introduced code rebase
9ca7783 [jerryshao] Style changes
08317f9 [jerryshao] code and comment refines
65b2423 [jerryshao] Further address the comments
a27c587 [jerryshao] address the comment
27faabc [jerryshao] redundant code remove
9ce06a1 [jerryshao] refactor the code
f5ba27b [jerryshao] Style fix
2c6cc8a [jerryshao] Fix bug and add unit tests
0757335 [jerryshao] Consider the distribution of existed containers to recalculate the new container requests
0ad66ff [jerryshao] Fix compile bugs
1c20381 [jerryshao] Minor fix
5ef2dc8 [jerryshao] Add docs and improve the code
3359814 [jerryshao] Fix rebase and test bugs
0398539 [jerryshao] reinitialize the new implementation
67596d6 [jerryshao] Still fix the code
654e1d2 [jerryshao] Fix some bugs
45b1c89 [jerryshao] Further polish the algorithm
dea0152 [jerryshao] Enable node locality information in YarnAllocator
74bbcc6 [jerryshao] Support node locality for dynamic allocation initial commit
…r mode.
The NodeReports API currently used does not work in secure mode since we do not get RM tokens. Instead this patch just uses environment vars exported by YARN to create the log links.
Author: Hari Shreedharan <hshreedharan@apache.org>
Closes #7624 from harishreedharan/driver-logs-env and squashes the following commits:
7368c7e [Hari Shreedharan] [SPARK-8988][YARN] Make sure driver log links appear in secure cluster mode.
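A hedged sketch of building container log links from YARN-exported environment variables instead of the NodeReports API; the exact variable names are my assumption and may vary across YARN versions:
```
// Build the NodeManager log URL for a container from YARN-provided env vars.
def driverLogUrl(containerId: String, user: String): Option[String] =
  for {
    host <- sys.env.get("NM_HOST")
    port <- sys.env.get("NM_HTTP_PORT")
  } yield s"http://$host:$port/node/containerlogs/$containerId/$user"
```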
tokens
On the client side, the flow is SparkSubmit -> SparkContext -> yarn/Client. Since the YARN client only gets a cloned config and the staging dir is set there, it is not really possible to do re-logins in the SparkContext. So, do the initial logins in SparkSubmit and do re-logins as we do now in the AM; the Client behaves like an executor in this specific context and reads the credentials file to update the tokens. This way, even if the streaming context is started up from a checkpoint, it is fine since we have logged in from SparkSubmit itself.
Author: Hari Shreedharan <hshreedharan@apache.org>
Closes #7394 from harishreedharan/yarn-client-login and squashes the following commits:
9a2166f [Hari Shreedharan] make it possible to use command line args and config parameters together.
de08f57 [Hari Shreedharan] Fix import order.
5c4fa63 [Hari Shreedharan] Add a comment explaining what is being done in YarnClientSchedulerBackend.
c872caa [Hari Shreedharan] Fix typo in log message.
2c80540 [Hari Shreedharan] Move token renewal to YarnClientSchedulerBackend.
0c48ac2 [Hari Shreedharan] Remove direct use of ExecutorDelegationTokenUpdater in Client.
26f8bfa [Hari Shreedharan] [SPARK-8851][YARN] In Client mode, make sure the client logs in and updates tokens.
58b1969 [Hari Shreedharan] Simple attempt 1.
andrewor14 davies vanzin can you take a look at this? thanks
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Closes #7438 from lianhuiwang/SPARK-8646 and squashes the following commits:
cb3f12d [Lianhui Wang] add whitespace
6d874a6 [Lianhui Wang] support pyspark for yarn-client