| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
be integer literal
I think we need to keep the priority of the shutdown hook for ApplicationMaster higher than the priority of the shutdown hook for o.a.h.FileSystem, so that the ordering still holds if the priority for FileSystem changes.
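The ordering requirement can be illustrated with a small, self-contained sketch. It assumes a hook manager that runs higher-priority hooks first; `HookRegistry`, the priority values, and the hook bodies are hypothetical and are not taken from Spark or Hadoop.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a priority-aware shutdown hook manager.
// Assumption: hooks with a higher priority value run first.
class HookRegistry {
  private val hooks = ArrayBuffer.empty[(Int, () => Unit)]

  def add(priority: Int)(body: => Unit): Unit =
    hooks += ((priority, () => body))

  // Run every registered hook, highest priority first.
  def runAll(): Unit =
    hooks.sortBy { case (priority, _) => -priority }.foreach { case (_, hook) => hook() }
}

object ShutdownOrderDemo {
  // Illustrative values only: the AM priority is defined relative to the
  // file system priority, so the ordering survives a change to the latter.
  val FileSystemPriority = 10
  val AmPriority = FileSystemPriority + 20

  def run(): Seq[String] = {
    val log = ArrayBuffer.empty[String]
    val registry = new HookRegistry
    registry.add(FileSystemPriority) { log += "close file systems" }
    registry.add(AmPriority) { log += "clean up staging directory" }
    registry.runAll()
    log.toSeq
  }
}
```

Defining one priority in terms of the other is what keeps the AM cleanup ahead of the file system shutdown, whatever the absolute numbers are.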
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #2283 from sarutak/SPARK-3410 and squashes the following commits:
1d44fef [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3410
bd6cc53 [Kousuke Saruta] Modified style
ee6f1aa [Kousuke Saruta] Added constant "SHUTDOWN_HOOK_PRIORITY" to ApplicationMaster
54eb68f [Kousuke Saruta] Changed Shutdown hook priority to 20
2f0aee3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3410
4c5cb93 [Kousuke Saruta] Modified the priority for AM's shutdown hook
217d1a4 [Kousuke Saruta] Removed unused import statements
717aba2 [Kousuke Saruta] Modified ApplicationMaster to make to keep the priority of shutdown hook for ApplicationMaster higher than the priority of shutdown hook for HDFS
|
|
|
|
|
|
|
|
| |
Author: Thomas Graves <tgraves@apache.org>
Closes #2373 from tgravescs/SPARK-3456 and squashes the following commits:
77e9532 [Thomas Graves] [SPARK-3456] YarnAllocator on alpha can lose container requests to RM
|
|
|
|
|
|
|
|
|
|
| |
...s
Author: Sandy Ryza <sandy@cloudera.com>
Closes #1934 from sryza/sandy-spark-3014 and squashes the following commits:
ae19cc1 [Sandy Ryza] SPARK-3014. Log a more informative messages in a couple failure scenarios
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently open many ephemeral ports during the tests, and as a result we occasionally can't bind to new ones. This has caused the `DriverSuite` and the `SparkSubmitSuite` to fail intermittently.
By disabling the `SparkUI` when it's not needed, we already cut down on the number of ports opened significantly, on the order of the number of `SparkContexts` ever created. We must keep it enabled for a few tests for the UI itself, however.
Author: Andrew Or <andrewor14@gmail.com>
Closes #2363 from andrewor14/disable-ui-for-tests and squashes the following commits:
332a7d5 [Andrew Or] No need to set spark.ui.port to 0 anymore
30c93a2 [Andrew Or] Simplify streaming UISuite
a431b84 [Andrew Or] Fix streaming test failures
8f5ae53 [Andrew Or] Fix no new line at the end
29c9b5b [Andrew Or] Disable SparkUI for tests
|
|
|
|
|
|
|
|
|
|
|
| |
Updated pull request, reflecting YARN stable and alpha states. I am getting intermittent test failures on my own test infrastructure. Is that tracked anywhere yet?
Author: Chris Cope <ccope@resilientscience.com>
Closes #2253 from copester/master and squashes the following commits:
5ad89da [Chris Cope] [SPARK-2140] Removing calculateAMMemory functions since they are no longer needed.
52b4e45 [Chris Cope] [SPARK-2140] Updating heap memory calculation for YARN stable and alpha.
|
|
|
|
|
|
|
|
|
|
| |
This patch copies the approach used in the MapReduce application master for launching containers.
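As a rough illustration of launching containers from a thread pool, the sketch below submits one task per container to a bounded pool. It is not the patch's code: `LauncherPoolDemo`, `launch`, the pool size, and the timeout are assumptions made for the example.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object LauncherPoolDemo {
  // Hypothetical stand-in for the work of starting one executor container.
  private def launch(containerId: String, launched: AtomicInteger): Unit = {
    launched.incrementAndGet()
  }

  // Submit one launch task per container to a bounded pool and wait for all of them.
  def launchAll(containerIds: Seq[String], maxThreads: Int): Int = {
    val launched = new AtomicInteger(0)
    val pool = Executors.newFixedThreadPool(maxThreads)
    try {
      containerIds.foreach { id =>
        pool.execute(new Runnable {
          override def run(): Unit = launch(id, launched)
        })
      }
    } finally {
      pool.shutdown() // stop accepting tasks; already-submitted ones still run
    }
    pool.awaitTermination(30, TimeUnit.SECONDS)
    launched.get()
  }
}
```

A bounded pool caps the number of concurrent launch requests instead of creating one thread per container.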
Author: Sandy Ryza <sandy@cloudera.com>
Closes #663 from sryza/sandy-spark-1713 and squashes the following commits:
036550d [Sandy Ryza] SPARK-1713. [YARN] Use a threadpool for launching executor containers
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...s https
Author: Benoy Antony <benoy@apache.org>
Closes #2276 from benoyantony/SPARK-3286 and squashes the following commits:
c3d51ee [Benoy Antony] Use address with scheme, but Allpha version removes the scheme
e82f94e [Benoy Antony] Use address with scheme, but Allpha version removes the scheme
92127c9 [Benoy Antony] rebasing from master
450c536 [Benoy Antony] [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https
f060c02 [Benoy Antony] [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https
|
|
|
|
|
|
|
|
|
|
|
| |
Pass along the acl settings when we launch a container so that they can be applied to viewing the logs on a running NodeManager.
Author: Thomas Graves <tgraves@apache.org>
Closes #2185 from tgravescs/SPARK-3260 and squashes the following commits:
6f94b5a [Thomas Graves] make unit test more robust
28b9dd3 [Thomas Graves] yarn - pass acls along with executor launch
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
history server.
This change exposes the application ID generated by the Spark Master, Mesos or Yarn
via the SparkListenerApplicationStart event. It then uses that information to expose the
application via its ID in the history server, instead of using the internal directory name
generated by the event logger as an application id. This allows someone who knows
the application ID to easily figure out the URL for the application's entry in the HS, aside
from looking better.
In Yarn mode, this is used to generate a direct link from the RM application list to the
Spark history server entry (thus providing a fix for SPARK-2150).
Note this sort of assumes that the different managers will generate app ids that are
sufficiently different from each other that clashes will not occur.
Author: Marcelo Vanzin <vanzin@cloudera.com>
This patch had conflicts when merged, resolved by
Committer: Andrew Or <andrewor14@gmail.com>
Closes #1218 from vanzin/yarn-hs-link-2 and squashes the following commits:
2d19f3c [Marcelo Vanzin] Review feedback.
6706d3a [Marcelo Vanzin] Implement applicationId() in base classes.
56fe42e [Marcelo Vanzin] Fix cluster mode history address, plus a cleanup.
44112a8 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
8278316 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
a86bbcf [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
a0056e6 [Marcelo Vanzin] Unbreak test.
4b10cfd [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
cb0cab2 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
25f2826 [Marcelo Vanzin] Add MIMA excludes.
f0ba90f [Marcelo Vanzin] Use BufferedIterator.
c90a08d [Marcelo Vanzin] Remove unused code.
3f8ec66 [Marcelo Vanzin] Review feedback.
21aa71b [Marcelo Vanzin] Fix JSON test.
b022bae [Marcelo Vanzin] Undo SparkContext cleanup.
c6d7478 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
4e3483f [Marcelo Vanzin] Fix test.
57517b8 [Marcelo Vanzin] Review feedback. Mostly, more consistent use of Scala's Option.
311e49d [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
d35d86f [Marcelo Vanzin] Fix yarn backend after rebase.
36dc362 [Marcelo Vanzin] Don't use Iterator::takeWhile().
0afd696 [Marcelo Vanzin] Wait until master responds before returning from start().
abc4697 [Marcelo Vanzin] Make FsHistoryProvider keep a map of applications by id.
26b266e [Marcelo Vanzin] Use Mesos framework ID as Spark application ID.
b3f3664 [Marcelo Vanzin] [yarn] Make the RM link point to the app direcly in the HS.
2fb7de4 [Marcelo Vanzin] Expose the application ID in the ApplicationStart event.
ed10348 [Marcelo Vanzin] Expose application id to spark context.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move all shared logic to the base YarnAllocator class, and leave
the version-specific logic in the version-specific module.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #2169 from vanzin/SPARK-3187 and squashes the following commits:
46c2826 [Marcelo Vanzin] Hide the privates.
4dc9c83 [Marcelo Vanzin] Actually release containers.
8b1a077 [Marcelo Vanzin] Changes to the Yarn alpha allocator.
f3f5f1d [Marcelo Vanzin] [SPARK-3187] [yarn] Cleanup allocator code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-3010
This PR fixes redundant conditionals in Spark, such as:
1.
private[spark] def codegenEnabled: Boolean =
if (getConf(CODEGEN_ENABLED, "false") == "true") true else false
2.
x => if (x == 2) true else false
...
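The simplification applied to such expressions can be sketched as follows, using the second example; the object and method names are made up for illustration.

```scala
object RedundantConditional {
  // Before: the if/else only restates the Boolean it tests.
  def isTwoVerbose(x: Int): Boolean = if (x == 2) true else false

  // After: return the comparison directly.
  def isTwo(x: Int): Boolean = x == 2
}
```

Both forms return the same value for every input, so the rewrite changes readability only.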
Author: scwf <wangfei1@huawei.com>
Author: wangfei <wangfei_hello@126.com>
Closes #1992 from scwf/condition and squashes the following commits:
b2a044a [scwf] merge SecurityManager
e16239c [scwf] fix confilct
6811401 [scwf] fix merge confilct
0824df4 [scwf] Merge branch 'master' of https://github.com/apache/spark into patch-4
e274515 [scwf] fix redundant conditions
d032bf9 [wangfei] [SQL]Excess judgment
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Different places in the code were instantiating Configuration / YarnConfiguration objects in different ways. This could lead to confusion for people who actually expected "spark.hadoop.*" options to end up in the configs used by Spark code, since that would only happen for the SparkContext's config.
This change modifies most places to use SparkHadoopUtil to initialize configs, and makes that method do the translation that previously was only done inside SparkContext.
The places that were not changed fall in one of the following categories:
- Test code where this doesn't really matter
- Places deep in the code where plumbing SparkConf would be too difficult for very little gain
- Default values for arguments - since the caller can provide their own config in that case
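The translation of "spark.hadoop.*" options can be pictured with the sketch below. Plain Maps stand in for SparkConf and Hadoop's Configuration, `HadoopConfTranslation` is a hypothetical name, and stripping the prefix is an assumption about how the options are mapped, not a quote from the patch.

```scala
object HadoopConfTranslation {
  private val Prefix = "spark.hadoop."

  // Copy every "spark.hadoop.foo" entry into the Hadoop-side settings as "foo".
  // Entries without the prefix are Spark-only and are left out.
  def translate(sparkSettings: Map[String, String]): Map[String, String] =
    sparkSettings.collect {
      case (key, value) if key.startsWith(Prefix) => key.substring(Prefix.length) -> value
    }
}
```

Doing this in one shared helper means every config created through it sees the same user overrides.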
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #1843 from vanzin/SPARK-2889 and squashes the following commits:
52daf35 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
f179013 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
51e71cf [Marcelo Vanzin] Add test to ensure that overriding Yarn configs works.
53f9506 [Marcelo Vanzin] Add DeveloperApi annotation.
3d345cb [Marcelo Vanzin] Restore old method for backwards compat.
fc45067 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
0ac3fdf [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
3f26760 [Marcelo Vanzin] Compilation fix.
f16cadd [Marcelo Vanzin] Initialize config in SparkHadoopUtil.
b8ab173 [Marcelo Vanzin] Update Utils API to take a Configuration argument.
1e7003f [Marcelo Vanzin] Replace explicit Configuration instantiation with SparkHadoopUtil.
|
|
|
|
|
|
|
|
| |
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #2177 from sarutak/SPARK-3279 and squashes the following commits:
2955edc [Kousuke Saruta] Removed useless field variable from ApplicationMaster
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change modifies the Yarn module so that all the logic related
to running the ApplicationMaster is localized. Instead of, previously,
4 different classes with mostly identical code, now we have:
- A single, shared ApplicationMaster class, which can operate both in
client and cluster mode, and substitutes the old ApplicationMaster
(for cluster mode) and ExecutorLauncher (for client mode).
The benefit here is that all different execution modes for all supported
yarn versions use the same shared code for monitoring executor allocation,
setting up configuration, and monitoring the process's lifecycle.
- A new YarnRMClient interface, which defines basic RM functionality needed
by the ApplicationMaster. This interface has concrete implementations for
each supported Yarn version.
- A new YarnAllocator interface, which just abstracts the existing interface
of the YarnAllocationHandler class. This is to avoid having to touch the
allocator code too much in this change, although it might benefit from a
similar effort in the future.
The end result is much easier to understand code, with much less duplication,
making it much easier to fix bugs, add features, and test everything knowing
that all supported versions will behave the same.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #2020 from vanzin/SPARK-2933 and squashes the following commits:
3bbf3e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
ff389ed [Marcelo Vanzin] Do not interrupt reporter thread from within itself.
3a8ed37 [Marcelo Vanzin] Remote stale comment.
0f5142c [Marcelo Vanzin] Review feedback.
41f8c8a [Marcelo Vanzin] Fix app status reporting.
c0794be [Marcelo Vanzin] Correctly clean up staging directory.
92770cc [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
ecaf332 [Marcelo Vanzin] Small fix to shutdown code.
f02d3f8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2933
f581122 [Marcelo Vanzin] Review feedback.
557fdeb [Marcelo Vanzin] Cleanup a couple more constants.
be6068d [Marcelo Vanzin] Restore shutdown hook to clean up staging dir.
5150993 [Marcelo Vanzin] Some more cleanup.
b6289ab [Marcelo Vanzin] Move cluster/client code to separate methods.
ecb23cd [Marcelo Vanzin] More trivial cleanup.
34f1e63 [Marcelo Vanzin] Fix some questionable error handling.
5657c7d [Marcelo Vanzin] Finish app if SparkContext initialization times out.
0e4be3d [Marcelo Vanzin] Keep "ExecutorLauncher" as the main class for client-mode AM.
91beabb [Marcelo Vanzin] Fix UI filter registration.
8c72239 [Marcelo Vanzin] Trivial cleanups.
99a52d5 [Marcelo Vanzin] Changes to the yarn-alpha project to use common AM code.
848ca6d [Marcelo Vanzin] [SPARK-2933] [yarn] Refactor and cleanup Yarn AM code.
|
|
|
|
|
|
|
|
| |
Author: XuTingjun <1039320815@qq.com>
Closes #1614 from XuTingjun/yarn-bug and squashes the following commits:
f07096e [XuTingjun] Update ClientArguments.scala
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to the way Yarn runs things through bash, normal quoting doesn't
work as expected. This change applies the necessary voodoo to the user
args to avoid issues with bash and special characters.
The change also uncovered an issue with the event logger app name
sanitizing code; it wasn't cleaning up all "bad" characters, so
sometimes it would fail to create the log dirs. I just added some
more bad character replacements.
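One common way to protect an argument from bash is to single-quote it, as in the sketch below. This shows the general technique only; the escaping the patch actually applies may differ, and `ShellQuoting.quoteForBash` is a hypothetical helper.

```scala
object ShellQuoting {
  // Wrap the argument in single quotes so bash treats it literally.
  // A single quote cannot appear inside a single-quoted string, so each one
  // is emitted as: close quote, escaped quote, reopen quote ('\'').
  def quoteForBash(arg: String): String =
    "'" + arg.replace("'", "'\\''") + "'"
}
```

For example, the argument `it's $HOME` becomes `'it'\''s $HOME'`, which bash reads back as the original text without expanding `$HOME`.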
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #1724 from vanzin/SPARK-2718 and squashes the following commits:
cc84b89 [Marcelo Vanzin] Review feedback.
c1a257a [Marcelo Vanzin] Add test for backslashes.
55571d4 [Marcelo Vanzin] Unbreak yarn-client.
515613d [Marcelo Vanzin] [SPARK-2718] [yarn] Handle quotes and other characters in user args.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In SPARK-1946 (PR #900), the configuration `spark.scheduler.minRegisteredExecutorsRatio` was introduced. However, in standalone mode, there is a race condition where isReady() can return true because totalExpectedExecutors has not been correctly set.
Because the expected number of executors is uncertain in standalone mode, this PR uses CPU cores (`--total-executor-cores`) as the expected resources to judge whether SchedulerBackend is ready.
Author: li-zhihui <zhihui.li@intel.com>
Author: Li Zhihui <zhihui.li@intel.com>
Closes #1525 from li-zhihui/fixre4s and squashes the following commits:
e9a630b [Li Zhihui] Rename variable totalExecutors and clean codes
abf4860 [Li Zhihui] Push down variable totalExpectedResources to children classes
ca54bd9 [li-zhihui] Format log with String interpolation
88c7dc6 [li-zhihui] Few codes and docs refactor
41cf47e [li-zhihui] Fix race condition at SchedulerBackend.isReady in standalone mode
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note that this also documents spark.executorEnv.*, which to me means it's public. If we don't want that, please speak up.
Author: Thomas Graves <tgraves@apache.org>
Closes #1512 from tgravescs/SPARK-1680 and squashes the following commits:
11525df [Thomas Graves] more doc changes
553bad0 [Thomas Graves] fix documentation
152bf7c [Thomas Graves] fix docs
5382326 [Thomas Graves] try fix docs
32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was easier to combine these two JIRAs since they touch many of the same places. This PR adds the following:
- adds modify acls
- adds admin acls (list of admins/users that get added to both view and modify acls)
- modify Kill button on UI to take modify acls into account
- changes the config name spark.ui.acls.enable to spark.acls.enable, since I chose the original name poorly. We keep backwards compatibility so people can still use spark.ui.acls.enable. The acls should apply to any web UI as well as any CLI interfaces.
- sends view and modify acls information on to YARN so that YARN interfaces can use it (the yarn CLI for killing applications, for example).
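The relationship between the view, modify, and admin lists can be sketched as below. `AclCheck`, `Acls`, and the method names are hypothetical, a plain Map stands in for the Spark configuration, and the precedence of the new config name over the old one is an assumption.

```scala
object AclCheck {
  // Assumption: the new key wins, and the old key is only a fallback.
  def aclsEnabled(conf: Map[String, String]): Boolean =
    conf.get("spark.acls.enable")
      .orElse(conf.get("spark.ui.acls.enable"))
      .exists(_.toBoolean)

  // Hypothetical holder for the three user lists described above.
  final case class Acls(
      enabled: Boolean,
      viewUsers: Set[String],
      modifyUsers: Set[String],
      adminUsers: Set[String]) {

    // Admins are implicitly part of both the view and the modify lists.
    def canView(user: String): Boolean =
      !enabled || viewUsers.contains(user) || adminUsers.contains(user)

    def canModify(user: String): Boolean =
      !enabled || modifyUsers.contains(user) || adminUsers.contains(user)
  }
}
```

An action such as the Kill button would be guarded by `canModify`, while read-only pages would be guarded by `canView`.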
Author: Thomas Graves <tgraves@apache.org>
Closes #1196 from tgravescs/SPARK-1890 and squashes the following commits:
8292eb1 [Thomas Graves] review comments
b92ec89 [Thomas Graves] remove unneeded variable from applistener
4c765f4 [Thomas Graves] Add in admin acls
72eb0ac [Thomas Graves] Add modify acls
|
|
|
|
|
|
|
|
|
|
|
| |
Add a config (spark.yarn.access.namenodes) to allow applications running on YARN to access other secure HDFS clusters. The user just specifies the namenodes of the other clusters, and we get tokens for those and ship them with the Spark application.
Author: Thomas Graves <tgraves@apache.org>
Closes #1159 from tgravescs/spark-1528 and squashes the following commits:
ddbcd16 [Thomas Graves] review comments
0ac8501 [Thomas Graves] SPARK-1528 - add support for accessing remote HDFS
|
|
|
|
|
|
|
|
|
|
| |
"ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster" appears if this code is not changed. Obviously, 1024 is less than 1048, so this change corrects it.
Author: derek ma <maji3@asiainfo-linkage.com>
Closes #1494 from maji2014/master and squashes the following commits:
b0f6640 [derek ma] Required AM memory is "amMem", not "args.amMemory"
|
|
|
|
|
|
|
|
|
|
| |
Author: GuoQiang Li <witgo@qq.com>
Closes #1180 from witgo/SPARK-2037 and squashes the following commits:
3d52411 [GuoQiang Li] review commit
7058f4d [GuoQiang Li] Correctly stop SparkContext
6d0561f [GuoQiang Li] Fix: yarn client mode doesn't support spark.yarn.max.executor.failures
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...rce manager UI
Use the event logger directory to provide a direct link to finished
application UI in yarn resourcemanager UI.
Author: Rahul Singhal <rahul.singhal@guavus.com>
Closes #1094 from rahulsinghaliitd/SPARK-2150 and squashes the following commits:
95f230c [Rahul Singhal] SPARK-2150: Provide direct link to finished application UI in yarn resource manager UI
|
|
|
|
|
|
|
|
|
|
| |
Opting for option 2 defined in SPARK-2577, i.e., retrieve and pass the correct file system object to addResource.
Author: Gera Shegalov <gera@twitter.com>
Closes #1483 from gerashegalov/master and squashes the following commits:
90c9087 [Gera Shegalov] [YARN] SPARK-2577: File upload to viewfs is broken due to mount point resolution
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Sandy Ryza <sandy@cloudera.com>
Closes #634 from sryza/sandy-spark-1707 and squashes the following commits:
2f6e358 [Sandy Ryza] Default min registered executors ratio to .8 for YARN
354c630 [Sandy Ryza] Remove outdated comments
c744ef3 [Sandy Ryza] Take out waitForInitialAllocations
2a4329b [Sandy Ryza] SPARK-1707. Remove unnecessary 3 second sleep in YarnClusterScheduler
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: witgo <witgo@qq.com>
Closes #1112 from witgo/SPARK-1291 and squashes the following commits:
6022bcd [witgo] review commit
1fbb925 [witgo] add addAmIpFilter to yarn alpha
210299c [witgo] review commit
1b92a07 [witgo] review commit
6896586 [witgo] Add comments to addWebUIFilter
3e9630b [witgo] review commit
142ee29 [witgo] review commit
1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
registered
Because submitting tasks and registering executors are asynchronous, in most situations the tasks of early stages run without preferred locality.
A simple solution is to sleep for a few seconds in the application, so that executors have enough time to register.
This PR adds 2 configuration properties that make TaskScheduler submit tasks only after enough executors have been registered.
# Submit tasks only after (registered executors / total executors) reaches the ratio; default value is 0
spark.scheduler.minRegisteredExecutorsRatio = 0.8
# Regardless of whether minRegisteredExecutorsRatio has been reached, submit tasks after maxRegisteredWaitingTime (milliseconds); default value is 30000
spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000
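How the two properties interact can be sketched as a single predicate. This is an illustration under the description above, not the scheduler's code; `RegistrationGate` and its parameter names are hypothetical.

```scala
object RegistrationGate {
  // Submit tasks once enough executors have registered,
  // or once the maximum waiting time has elapsed, whichever comes first.
  def shouldSubmit(
      registered: Int,
      total: Int,
      minRatio: Double,
      waitedMs: Long,
      maxWaitMs: Long): Boolean =
    registered >= total * minRatio || waitedMs >= maxWaitMs
}
```

With a ratio of 0 the first condition is always true, so tasks are submitted immediately, which matches the stated default.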
Author: li-zhihui <zhihui.li@intel.com>
Closes #900 from li-zhihui/master and squashes the following commits:
b9f8326 [li-zhihui] Add logs & edit docs
1ac08b1 [li-zhihui] Add new configs to user docs
22ead12 [li-zhihui] Move waitBackendReady to postStartHook
c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS
4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor
0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks
4261454 [li-zhihui] Add docs for new configs & code style
ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime
6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha
812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode
e7b6272 [li-zhihui] support yarn-cluster
37f7dc2 [li-zhihui] support yarn mode(percentage style)
3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spark.worker.instances was added as part of this commit: https://github.com/apache/spark/commit/1617816090e7b20124a512a43860a21232ebf511
My understanding is that SPARK_WORKER_INSTANCES is supported for backwards compatibility,
but spark.worker.instances is never used (SparkSubmit.scala sets spark.executor.instances) so should
not have been added.
@sryza @pwendell @tgravescs LMK if I'm understanding this correctly
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #1214 from kayousterhout/yarn_config and squashes the following commits:
3d7c491 [Kay Ousterhout] Remove use of spark.worker.instances
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recent changes ignored the fact that paths may be defined with "local:"
URIs, which means they need to be explicitly added to the classpath
everywhere a remote process is started. This change fixes that by:
- Using the correct methods to add paths to the classpath
- Creating SparkConf settings for the Spark jar itself and for the
user's jar
- Propagating those two settings to the remote processes where needed
This ensures that both in client and in cluster mode, the driver has
the necessary info to build the executor's classpath and have things
still work when they contain "local:" references.
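The special treatment of "local:" references can be sketched as follows. The helper name, the `$PWD/` convention for distributed files, and the example paths are assumptions made for illustration, not details confirmed by this change.

```scala
import java.net.URI

object LocalUriDemo {
  // Paths with the "local:" scheme already exist on every node, so the path
  // itself goes on the classpath; anything else is referenced by the name
  // under which it would be distributed to the container.
  def classpathEntry(location: String, distributedName: String): String = {
    val uri = new URI(location)
    if (uri.getScheme == "local") uri.getPath else "$PWD/" + distributedName
  }
}
```

Because a "local:" file is never uploaded, every remote process that needs it has to receive its path explicitly, which is why the settings are propagated.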
The change also fixes some confusion in ClientBase about whether
to use SparkConf or system properties to propagate config options to
the driver and executors, by standardizing on using data held by
SparkConf.
On the cleanup front, I removed the hacky way that log4j configuration
was being propagated to handle the "local:" case. It's much more cleanly
(and generically) handled by using spark-submit arguments (--files to
upload a config file, or setting spark.executor.extraJavaOptions to pass
JVM arguments and use a local file).
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #560 from vanzin/yarn-local-2 and squashes the following commits:
4e7f066 [Marcelo Vanzin] Correctly propagate SPARK_JAVA_OPTS to driver/executor.
6a454ea [Marcelo Vanzin] Use constants for PWD in test.
6dd5943 [Marcelo Vanzin] Fix propagation of config options to driver / executor.
b2e377f [Marcelo Vanzin] Review feedback.
93c3f85 [Marcelo Vanzin] Fix ClassCastException in test.
e5c682d [Marcelo Vanzin] Fix cluster mode, restore SPARK_LOG4J_CONF.
1dfbb40 [Marcelo Vanzin] Add documentation for spark.yarn.jar.
bbdce05 [Marcelo Vanzin] [SPARK-1395] Fix "local:" URI support in Yarn mode (again).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: witgo <witgo@qq.com>
Closes #969 from witgo/yarn_ClientBase and squashes the following commits:
8117765 [witgo] review commit
3bdbc52 [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase
5261b6c [witgo] fix sys.props.get("SPARK_YARN_DIST_FILES")
e3c1107 [witgo] update docs
b6a9aa1 [witgo] merge master
c8b4554 [witgo] review commit
2f48789 [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase
8d7b82f [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase
1048549 [witgo] remove Utils.resolveURIs
871f1db [witgo] add spark.yarn.dist.* documentation
41bce59 [witgo] review commit
35d6fa0 [witgo] move to ClientArguments
55d72fc [witgo] Merge branch 'master' of https://github.com/apache/spark into yarn_ClientBase
9cdff16 [witgo] review commit
8bc2f4b [witgo] review commit
20e667c [witgo] Merge branch 'master' into yarn_ClientBase
0961151 [witgo] merge master
ce609fc [witgo] Merge branch 'master' into yarn_ClientBase
8362489 [witgo] yarn.ClientBase spark.yarn.dist.* do not work
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to be killed
Author: witgo <witgo@qq.com>
Closes #894 from witgo/SPARK-1930 and squashes the following commits:
564307e [witgo] Update the running-on-yarn.md
3747515 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930
172647b [witgo] add memoryOverhead docs
a0ff545 [witgo] leaving only two configs
a17bda2 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930
478ca15 [witgo] Merge branch 'master' into SPARK-1930
d1244a1 [witgo] Merge branch 'master' into SPARK-1930
8b967ae [witgo] Merge branch 'master' into SPARK-1930
655a820 [witgo] review commit
71859a7 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1930
e3c531d [witgo] review commit
e16f190 [witgo] different memoryOverhead
ffa7569 [witgo] review commit
5c9581f [witgo] Merge branch 'master' into SPARK-1930
9a6bcf2 [witgo] review commit
8fae45a [witgo] fix NullPointerException
e0dcc16 [witgo] Adding configuration items
b6a989c [witgo] Fix container memory beyond limit, were killed
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All the changes are in the package "org.apache.spark.deploy.yarn":
1) Throw an exception in ClientArguments and ClientBase instead of exiting directly.
2) In Client's main method, if an exception is caught, it will exit with code 1; otherwise it exits with code 0.
After the fix, if users integrate the Spark YARN client into their applications, the application won't be terminated when the arguments are wrong or the run has finished.
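The pattern can be sketched as below: library code reports failures by throwing, and only the command-line entry point maps the outcome to an exit code. `ClientExitDemo`, the argument format, and the exception type are hypothetical choices for the example.

```scala
object ClientExitDemo {
  // Hypothetical argument parser: reports bad input by throwing,
  // so an embedding application is not terminated.
  def parseNumExecutors(args: Array[String]): Int = args match {
    case Array("--num-executors", n) if n.nonEmpty && n.forall(_.isDigit) => n.toInt
    case _ => throw new IllegalArgumentException("usage: --num-executors <n>")
  }

  // Only the command-line entry point turns the outcome into an exit code.
  def exitCodeFor(args: Array[String]): Int =
    try { parseNumExecutors(args); 0 }
    catch { case _: IllegalArgumentException => 1 }
}
```

A `main` method would then call `System.exit(exitCodeFor(args))`, while embedding applications call the parser directly and handle the exception themselves.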
Author: John Zhao <jzhao@alpinenow.com>
Closes #490 from codeboyyong/jira_1516_systemexit_inyarnclient and squashes the following commits:
138cb48 [John Zhao] [SPARK-1516]Throw exception in yarn clinet instead of run system.exit directly. All the changes is in the package of "org.apache.spark.deploy.yarn": 1) Add a ClientException with an exitCode 2) Throws exception in ClinetArguments and ClientBase instead of exit directly 3) in Client's main method, catch exception and exit with the exitCode.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This contains a bunch of small tidyings of the Spark on YARN code.
I focused on the yarn stable code. @tgravescs, let me know if you'd like me to make these for the alpha code as well.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #561 from sryza/sandy-spark-1639 and squashes the following commits:
72b6a02 [Sandy Ryza] Fix comment and set name on driver thread
c2190b2 [Sandy Ryza] SPARK-1639. Tidy up some Spark on YARN code
|
|
|
|
|
|
|
|
|
|
|
| |
from conf
Author: DB Tsai <dbtsai@dbtsai.com>
Closes #1027 from dbtsai/dbtsai-classloader and squashes the following commits:
9ac6be3 [DB Tsai] Fixed line too long
c9c7ad7 [DB Tsai] Make sure that empty string is filtered out when we get the secondary jars from conf.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation of ClientBase.getDefaultYarnApplicationClasspath inspects
the MRJobConfig class for the field DEFAULT_YARN_APPLICATION_CLASSPATH when it should
really be looking into YarnConfiguration. If the Application Configuration has no
yarn.application.classpath defined, an NPE will be thrown.
Additional Changes include:
* Test Suite for ClientBase added
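A null-safe lookup of the application classpath might look like the sketch below. The default entries and the names are made up for illustration; the real default comes from YARN's configuration classes.

```scala
object YarnClasspathDemo {
  // Hypothetical default; not the value YARN actually ships with.
  val DefaultClasspath: Seq[String] =
    Seq("$HADOOP_CONF_DIR", "$HADOOP_COMMON_HOME/share/hadoop/common/*")

  // `configured` models a lookup that yields null when the key is absent.
  def applicationClasspath(configured: Array[String]): Seq[String] =
    Option(configured).map(_.toSeq).getOrElse(DefaultClasspath)
}
```

Wrapping the possibly-null result in `Option` makes the missing-key case explicit instead of failing later with an NPE.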
[ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522
Author : bernardo.gomezpalacio@gmail.com
Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test
Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com>
Closes #433 from berngp/feature/SPARK-1522 and squashes the following commits:
2c2e118 [Bernardo Gomez Palacio] [SPARK-1522]: YARN ClientBase throws a NPE if there is no YARN Application specific CP
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-1898
Author: Colin Patrick McCabe <cmccabe@cloudera.com>
Closes #850 from cmccabe/master and squashes the following commits:
d66eddc [Colin Patrick McCabe] SPARK-1898: In deploy.yarn.Client, use YarnClient rather than YarnClientImpl
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Send secondary jars to the distributed cache of all containers and add the cached jars to the classpath before executors start. Tested on a YARN cluster (CDH-5.0).
`spark-submit --jars` also works in standalone server and `yarn-client`. Thanks to @andrewor14 for testing!
I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet.
CC: @dbtsai @sryza
Author: Xiangrui Meng <meng@databricks.com>
Closes #848 from mengxr/yarn-classpath and squashes the following commits:
23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid confliction apped $CWD/ and $CWD/* to the classpath remove unused methods
a40f6ed [Xiangrui Meng] standalone -> cluster
65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
11e5354 [Xiangrui Meng] minor changes
3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
|
|
|
|
|
|
|
|
| |
Author: Andrew Or <andrewor14@gmail.com>
Closes #847 from andrewor14/yarn-typo and squashes the following commits:
c1906af [Andrew Or] Stoped -> Stopped
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SparkSubmit ignores `--jars` for YARN client. This is a bug.
This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you must specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to explicitly specify it through either.
Tested on a YARN cluster.
Author: Andrew Or <andrewor14@gmail.com>
Closes #710 from andrewor14/yarn-jars and squashes the following commits:
35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar
c92c5bf [Andrew Or] Minor cleanups
269f9f3 [Andrew Or] Fix format
013d840 [Andrew Or] Fix tests
1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
|
|
|
|
|
|
|
|
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #539 from vanzin/yarn-app-name and squashes the following commits:
7d1ca4f [Marcelo Vanzin] [SPARK-1631] Correctly set the Yarn app name when launching the AM.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass the configs as java options since the executor needs to know before it registers whether to create the connection using authentication or not. We could see about passing only the authentication configs but for now I just had it pass them all.
I also updated it to use a list to construct the command, to make it the same as ClientBase and avoid any issues with spaces.
Author: Thomas Graves <tgraves@apache.org>
Closes #649 from tgravescs/SPARK-1569 and squashes the following commits:
0178ab8 [Thomas Graves] add akka settings
22a8735 [Thomas Graves] Change to only path spark.auth* configs
8ccc1d4 [Thomas Graves] SPARK-1569 Spark on Yarn, authentication broken
|
|
|
|
|
|
|
|
|
| |
Author: Sandy Ryza <sandy@cloudera.com>
Closes #586 from sryza/sandy-spark-1588 and squashes the following commits:
35eb38e [Sandy Ryza] Scalify
b361684 [Sandy Ryza] SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
|
|
|
|
|
|
|
|
|
|
| |
Sorry folks. This should make the change for SPARK-1607 compile again. Verified this time with the yarn build enabled.
Author: Sean Owen <sowen@cloudera.com>
Closes #556 from srowen/SPARK-1607.2 and squashes the following commits:
e3fe7a3 [Sean Owen] Fix syntax adapting Int result to Short
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Octal literals like "0700" are deprecated in Scala 2.10, generating a warning. They have been removed entirely in 2.11. See https://issues.scala-lang.org/browse/SI-7618
This change simply replaces two uses of octals with hex literals, which seemed the next-best representation since they express a bit mask (a file permission in particular).
Author: Sean Owen <sowen@cloudera.com>
Closes #529 from srowen/SPARK-1607 and squashes the following commits:
1ee0e67 [Sean Owen] Use Integer.parseInt(...,8) for octal literal instead of hex equivalent
0102f3d [Sean Owen] Replace octal literals, removed in Scala 2.11, with hex literals
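
For illustration, a small sketch of the approach named in commit 1ee0e67: parse the permission digits as base 8 instead of writing an octal literal. It is written in Java rather than Scala, and the class and method names are hypothetical; the cast to `short` is an assumption about how the value would be consumed.

```java
final class OctalPermissionSketch {
    // Parse a permission string such as "700" as base 8 and narrow it to a short.
    static short parseOctalMode(String octalDigits) {
        return (short) Integer.parseInt(octalDigits, 8);
    }

    public static void main(String[] args) {
        short mode = parseOctalMode("700");
        // Prints the decimal and hex forms of the same bit mask.
        System.out.println(mode + " == 0x" + Integer.toHexString(mode));
    }
}
```

The parsed value keeps the familiar `700` notation visible in the source, while the hex form of the same mask is `0x1C0`.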
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately, this is not exhaustive: in particular, Hive tests still fail due to path issues.
Author: Mridul Muralidharan <mridulm80@apache.org>
This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>
Closes #505 from mridulm/windows_fixes and squashes the following commits:
ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently
cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
3267f4b [Mridul Muralidharan] Fix build failures
35b277a [Mridul Muralidharan] Fix Scalastyle failures
bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
1337abd [Mridul Muralidharan] fix classpath while running in windows
|
|
|
|
|
|
|
|
|
|
|
|
| |
In particular when the HADOOP_CONF_DIR is not specified.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #488 from pwendell/hadoop-cleanup and squashes the following commits:
fe95f13 [Patrick Wendell] Changes based on Andrew's feeback
18d09c1 [Patrick Wendell] Review comments from Andrew
17929cc [Patrick Wendell] Assorted clean-up for Spark-on-YARN.
|
|
|
|
|
|
|
|
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #483 from vanzin/yarn-2.4 and squashes the following commits:
0fc57d8 [Marcelo Vanzin] Fix compilation on Hadoop 2.4.x.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Over time, as we've added more deployment modes, the user-facing configuration options in Spark have gotten a bit unwieldy. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:
1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
3. Adds ability to set these same variables for the driver using `spark-submit`.
4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #299 from pwendell/config-cleanup and squashes the following commits:
127f301 [Patrick Wendell] Improvements to testing
a006464 [Patrick Wendell] Moving properties file template.
b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
0086939 [Patrick Wendell] Minor style fixes
af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
af0adf7 [Patrick Wendell] Automatically add user jar
a56b125 [Patrick Wendell] Responses to Tom's review
d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
a762901 [Patrick Wendell] Fixing test failures
ffa00fe [Patrick Wendell] Review feedback
fda0301 [Patrick Wendell] Note
308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
c2a2909 [Patrick Wendell] Test compile fixes
4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
b08893b [Patrick Wendell] Additional improvements.
ace4ead [Patrick Wendell] Responses to review feedback.
b72d183 [Patrick Wendell] Review feedback for spark env file
46555c1 [Patrick Wendell] Review feedback and import clean-ups
437aed1 [Patrick Wendell] Small fix
761ebcd [Patrick Wendell] Library path and classpath for drivers
7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
5b0ba8e [Patrick Wendell] Don't ship executor envs
84cc5e5 [Patrick Wendell] Small clean-up
1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
6eaf7d0 [Patrick Wendell] executorJavaOpts
0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
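
As a rough sketch of item 4 above, a defaults file can be read into key/value properties before launching an application. This assumes the file follows `java.util.Properties` syntax (key and value separated by whitespace or `=`); the class name and the sample keys and values are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class DefaultsFileSketch {
    // Parse the text of a defaults file into a Properties object.
    static Properties parse(String contents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(contents));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String sample = "# illustrative defaults\n"
            + "spark.master yarn-client\n"
            + "spark.executor.memory=2g\n";
        Properties props = parse(sample);
        System.out.println(props.getProperty("spark.master"));
    }
}
```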
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This only works for the three paths defined in the environment
(SPARK_JAR, SPARK_YARN_APP_JAR and SPARK_LOG4J_CONF).
Tested by running SparkPi with local: and file: URIs against Yarn cluster (no "upload" shows up in logs in the local case).
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #303 from vanzin/yarn-local and squashes the following commits:
82219c1 [Marcelo Vanzin] [SPARK-1395] Allow "local:" URIs to work on Yarn.
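
A hedged sketch of the distinction this commit relies on: a path with the `local:` scheme is assumed to already exist on every node and is not uploaded, while anything else is. The class, the method names, and the example paths are hypothetical and only illustrate scheme-based dispatch.

```java
import java.net.URI;

final class LocalUriSketch {
    // True when the resource would have to be uploaded to the cluster,
    // i.e. when its scheme is anything other than "local" (including none).
    static boolean needsUpload(String path) {
        return !"local".equals(URI.create(path).getScheme());
    }

    // The node-local file system path carried by the URI.
    static String pathOnNode(String path) {
        return URI.create(path).getPath();
    }

    public static void main(String[] args) {
        System.out.println(needsUpload("local:/opt/spark/app.jar"));
        System.out.println(needsUpload("file:/tmp/app.jar"));
    }
}
```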
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which breaks the Spark build when it is compiled against Hadoop 2.4.0. To fix this, Spark now provides its own functions for this functionality, so that compilation does not break against 2.3 or other 2.x versions.
Author: xuan <xuan@MacBook-Pro.local>
Author: xuan <xuan@macbook-pro.home>
Closes #396 from xgong/master and squashes the following commits:
42b5984 [xuan] Remove two extra imports
bc0926f [xuan] Remove usage of org.apache.hadoop.util.Shell
be89fa7 [xuan] fix Spark compilation is broken with the latest hadoop-2.4.0 release
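
To illustrate what a self-contained replacement for such a helper might look like, here is a minimal sketch that appends a value to an environment variable using the platform path separator. It is an assumption about the general shape of the function, not the code from the patch; the class name is hypothetical and the method name is borrowed from the commit message only for readability.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

final class EnvSketch {
    // Set key to value, or append value to the existing entry,
    // separated by File.pathSeparator (":" on Unix, ";" on Windows).
    static void addToEnvironment(Map<String, String> env, String key, String value) {
        env.merge(key, value, (existing, added) -> existing + File.pathSeparator + added);
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        addToEnvironment(env, "CLASSPATH", "a.jar");
        addToEnvironment(env, "CLASSPATH", "b.jar");
        System.out.println(env.get("CLASSPATH"));
    }
}
```

Keeping the helper free of Hadoop classes is what lets the same source compile regardless of which Hadoop 2.x signature is on the classpath.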
|