path: root/mesos
* [SPARK-18662] Move resource managers to separate directory (Anirudh, 2016-12-06; 30 files changed, -5886/+0)

  ## What changes were proposed in this pull request?

  * Moves yarn and mesos scheduler backends to resource-managers/ sub-directory (in preparation for https://issues.apache.org/jira/browse/SPARK-18278)
  * Corresponding change in top-level pom.xml.

  Ref: https://github.com/apache/spark/pull/16061#issuecomment-263649340

  ## How was this patch tested?

  * Manual tests

  /cc rxin

  Author: Anirudh <ramanathana@google.com>

  Closes #16092 from foxish/fix-scheduler-structure-2.
* [SPARK-18171][MESOS] Show correct framework address in mesos master web ui when the advertised address is used (Shuai Lin, 2016-12-07; 1 file changed, -0/+2)

  ## What changes were proposed in this pull request?

  In [SPARK-4563](https://issues.apache.org/jira/browse/SPARK-4563) we added support for the driver to advertise a different hostname/ip (`spark.driver.host`) to the executors than the hostname/ip the driver actually binds to (`spark.driver.bindAddress`). But in the mesos webui's frameworks page, it still shows the driver's bind hostname/ip (though the web ui link is correct). We should fix it to make them consistent.

  Before: ![mesos-spark](https://cloud.githubusercontent.com/assets/717363/19835148/4dffc6d4-9eb8-11e6-8999-f80f10e4c3f7.png)

  After: ![mesos-spark2](https://cloud.githubusercontent.com/assets/717363/19835149/54596ae4-9eb8-11e6-896c-230426acd1a1.png)

  This PR doesn't affect the behavior on the spark side; it only makes the display on the mesos master side more consistent.

  ## How was this patch tested?

  Manual test.

  - Build the package and build a docker image (spark:2.1-test)

    ```sh
    ./dev/make-distribution.sh -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn -Pmesos
    ```

  - Then run the spark driver inside a docker container.

    ```sh
    docker run --rm -it \
      --name=spark \
      -p 30000-30010:30000-30010 \
      -e LIBPROCESS_ADVERTISE_IP=172.17.42.1 \
      -e LIBPROCESS_PORT=30000 \
      -e MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-1.0.0.so \
      -e MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos-1.0.0.so \
      -e SPARK_HOME=/opt/dist \
      spark:2.1-test
    ```

  - Inside the container, launch the spark driver, making use of the advertised address:

    ```sh
    /opt/dist/bin/spark-shell \
      --master mesos://zk://172.17.42.1:2181/mesos \
      --conf spark.driver.host=172.17.42.1 \
      --conf spark.driver.bindAddress=172.17.0.1 \
      --conf spark.driver.port=30001 \
      --conf spark.driver.blockManager.port=30002 \
      --conf spark.ui.port=30003 \
      --conf spark.mesos.coarse=true \
      --conf spark.cores.max=2 \
      --conf spark.executor.cores=1 \
      --conf spark.executor.memory=1g \
      --conf spark.mesos.executor.docker.image=spark:2.1-test
    ```

  - Run several spark jobs to ensure everything is running fine.

    ```scala
    val rdd = sc.textFile("file:///opt/dist/README.md")
    rdd.cache().count
    ```

  Author: Shuai Lin <linshuai2012@gmail.com>

  Closes #15684 from lins05/spark-18171-show-correct-host-name-in-mesos-master-web-ui.
* [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT (Reynold Xin, 2016-12-02; 1 file changed, -1/+1)

  ## What changes were proposed in this pull request?

  This patch bumps master branch version to 2.2.0-SNAPSHOT.

  ## How was this patch tested?

  N/A

  Author: Reynold Xin <rxin@databricks.com>

  Closes #16126 from rxin/SPARK-18695.
* [SPARK-18547][CORE] Propagate I/O encryption key when executors register (Marcelo Vanzin, 2016-11-28; 3 files changed, -2/+15)

  This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation credential propagation, this change explicitly distributes the key using the messages exchanged between driver and executor during registration. When RPC encryption is enabled, this means key propagation is also secure.

  This allows shuffle encryption to work in non-YARN mode, which means that it's easier to write unit tests for areas of the code that are affected by the feature.

  The key is stored in the SecurityManager; because there are many instances of that class used in the code, the key is only guaranteed to exist in the instance managed by the SparkEnv. This path was chosen to avoid storing the key in the SparkConf, which would risk having the key written to disk as part of the configuration (as is done, for example, when starting YARN applications).

  Tested by new and existing unit tests (which were moved from the YARN module to core), and by running apps with shuffle encryption enabled.

  Author: Marcelo Vanzin <vanzin@cloudera.com>

  Closes #15981 from vanzin/SPARK-18547.
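  To make the registration-time handoff described above concrete, here is a minimal, hypothetical sketch; the class and message names are illustrative, not Spark's actual internals.

  ```scala
  import javax.crypto.KeyGenerator

  object KeyPropagationSketch {
    // Driver side: generate the shuffle I/O encryption key once; per the
    // commit, it lives in the SecurityManager held by the SparkEnv, not in
    // the SparkConf.
    def createKey(bits: Int = 128): Array[Byte] = {
      val gen = KeyGenerator.getInstance("AES")
      gen.init(bits)
      gen.generateKey().getEncoded
    }

    // Illustrative registration reply carrying the key over the RPC channel
    // (secure when RPC encryption is enabled).
    case class RegisteredExecutor(ioEncryptionKey: Option[Array[Byte]])

    // Executor side: install the received key before any shuffle I/O starts.
    def onRegistered(msg: RegisteredExecutor)(install: Array[Byte] => Unit): Unit =
      msg.ioEncryptionKey.foreach(install)
  }
  ```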
* Fix Mesos build break for Scala 2.10 (Reynold Xin, 2016-11-20; 1 file changed, -1/+1)
* [SPARK-17062][MESOS] add conf option to mesos dispatcher (Stavros Kontopoulos, 2016-11-19; 4 files changed, -18/+164)

  Adds a --conf option for setting spark configuration properties on the mesos dispatcher. Properties provided with --conf take precedence over properties within the properties file. The reason for this PR is that for simple configuration or testing purposes we need to provide a properties file (ideally a shared one for a cluster) even if we just want to set a single property.

  Manually tested.

  Author: Stavros Kontopoulos <st.kontopoulos@gmail.com>
  Author: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>

  Closes #14650 from skonto/dipatcher_conf.
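  A minimal sketch of the precedence rule described above (the names are hypothetical, not the dispatcher's actual code): properties from --conf override those read from the properties file.

  ```scala
  object DispatcherConfSketch {
    // Merge file-based properties with --conf flags; Map.++ keeps the
    // right-hand value on key collisions, so --conf wins.
    def effectiveConf(
        fromPropertiesFile: Map[String, String],
        fromConfFlags: Map[String, String]): Map[String, String] =
      fromPropertiesFile ++ fromConfFlags

    def main(args: Array[String]): Unit = {
      val file  = Map("spark.cores.max" -> "4", "spark.executor.memory" -> "1g")
      val flags = Map("spark.cores.max" -> "8")
      // Map(spark.cores.max -> 8, spark.executor.memory -> 1g)
      println(effectiveConf(file, flags))
    }
  }
  ```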
* [SPARK-18232][MESOS] Support CNI (Michael Gummelt, 2016-11-14; 6 files changed, -89/+116)

  ## What changes were proposed in this pull request?

  Adds support for CNI-isolated containers.

  ## How was this patch tested?

  I launched SparkPi both with and without `spark.mesos.network.name`, and verified the job completed successfully.

  Author: Michael Gummelt <mgummelt@mesosphere.io>

  Closes #15740 from mgummelt/spark-342-cni.
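  A hedged usage sketch for the feature above: attaching a job's containers to a named CNI network via `spark.mesos.network.name`. The network name is an assumed placeholder and must match a CNI network configured on the Mesos agents.

  ```scala
  import org.apache.spark.SparkConf

  object CniConfExample {
    // "my-cni-network" is illustrative; substitute a CNI network actually
    // configured on your Mesos agents.
    val conf = new SparkConf()
      .setAppName("SparkPi")
      .set("spark.mesos.network.name", "my-cni-network")
  }
  ```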
* [SPARK-18076][CORE][SQL] Fix default Locale used in DateFormat, NumberFormat to Locale.US (Sean Owen, 2016-11-02; 1 file changed, -6/+5)

  ## What changes were proposed in this pull request?

  Fix all usages of `DateFormat` and `NumberFormat` to use `Locale.US` instead of the default locale.

  ## How was this patch tested?

  Existing tests.

  Author: Sean Owen <sowen@cloudera.com>

  Closes #15610 from srowen/SPARK-18076.
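  A small self-contained example of why pinning the locale matters: the default-locale `NumberFormat` is environment-dependent, while `Locale.US` behaves the same everywhere.

  ```scala
  import java.text.NumberFormat
  import java.util.Locale

  object LocaleExample {
    def main(args: Array[String]): Unit = {
      val input = "1,234.5"
      // Under Locale.US, ',' groups digits and '.' is the decimal separator,
      // so this parses as 1234.5.
      val us = NumberFormat.getInstance(Locale.US).parse(input)
      // Under Locale.GERMANY, ',' is the decimal separator, so parsing stops
      // at the unexpected '.' and yields 1.234 instead.
      val de = NumberFormat.getInstance(Locale.GERMANY).parse(input)
      println(s"US: $us, DE: $de")
    }
  }
  ```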
* [SPARK-18204][WEBUI] Remove SparkUI.appUIAddress (Jacek Laskowski, 2016-11-02; 2 files changed, -2/+2)

  ## What changes were proposed in this pull request?

  Removing the `appUIAddress` attribute since it is no longer in use.

  ## How was this patch tested?

  Local build.

  Author: Jacek Laskowski <jacek@japila.pl>

  Closes #15603 from jaceklaskowski/sparkui-fixes.
* [SPARK-18114][MESOS] Fix mesos cluster scheduler generated command option error (Wang Lei, 2016-11-01; 1 file changed, -1/+1)

  ## What changes were proposed in this pull request?

  Enclose the --conf option value in double quotes to support multi-value configs like spark.driver.extraJavaOptions; without the quotes, the driver will fail to start.

  ## How was this patch tested?

  Jenkins tests, plus testing in our production environment; also unit tests. It is a very small change.

  Author: Wang Lei <lei.wang@kongming-inc.com>

  Closes #15643 from LeightonWong/messos-cluster.
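  A simplified sketch (not the actual dispatcher code) of the quoting fix: the generated --conf argument must keep a multi-value property as one shell token.

  ```scala
  object ConfQuotingSketch {
    // Wrap the value in double quotes so e.g. "-Dx=1 -Dy=2" survives as a
    // single argument instead of being split on whitespace by the shell.
    def confOption(key: String, value: String): String =
      s"""--conf "$key=$value""""

    def main(args: Array[String]): Unit = {
      println(confOption("spark.driver.extraJavaOptions", "-Dx=1 -Dy=2"))
      // --conf "spark.driver.extraJavaOptions=-Dx=1 -Dy=2"
    }
  }
  ```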
* [SPARK-16881][MESOS] Migrate Mesos configs to use ConfigEntry (Sandeep Singh, 2016-11-01; 4 files changed, -6/+68)

  ## What changes were proposed in this pull request?

  Migrate Mesos configs to use ConfigEntry.

  ## How was this patch tested?

  Jenkins tests.

  Author: Sandeep Singh <sandeep@techaddict.me>

  Closes #15654 from techaddict/SPARK-16881.
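  As a rough illustration of what such a migration looks like, here is a sketch using Spark's internal `ConfigBuilder`/`ConfigEntry` API; the keys, docs, and defaults below are illustrative rather than the ones this patch defined, and since the API is `private[spark]`, code like this compiles only inside Spark's own source tree.

  ```scala
  import org.apache.spark.internal.config.ConfigBuilder

  private[spark] object MesosConfigSketch {
    // A string conf with no default, exposed as an Option[String].
    val CREDENTIAL_PRINCIPAL = ConfigBuilder("spark.mesos.principal")
      .doc("Name of the principal used by Spark to authenticate with Mesos.")
      .stringConf
      .createOptional

    // A boolean conf with an explicit default value.
    val FETCHER_CACHE_ENABLED = ConfigBuilder("spark.mesos.fetcherCache.enable")
      .doc("Whether to enable the Mesos fetch cache.")
      .booleanConf
      .createWithDefault(false)
  }
  ```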
* [SPARK-15994][MESOS] Allow enabling Mesos fetch cache in coarse executor backend (Charles Allen, 2016-11-01; 4 files changed, -5/+38)

  Mesos 0.23.0 introduced a Fetch Cache feature (http://mesos.apache.org/documentation/latest/fetcher/) which allows caching of resources specified in command URIs.

  This patch:
  - Updates the Mesos shaded protobuf dependency to 0.23.0
  - Allows setting `spark.mesos.fetcherCache.enable` to enable the fetch cache for all specified URIs. (URIs must be specified for the setting to have any effect.)
  - Updates the documentation for Mesos configuration with the new setting.

  This patch does NOT:
  - Allow for per-URI caching configuration. The cache setting is global to ALL URIs for the command.

  Author: Charles Allen <charles@allen-net.com>

  Closes #13713 from drcrallen/SPARK15994.
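  A hedged usage sketch for the new setting: the flag is global to all URIs the command specifies, and the URI below is purely illustrative.

  ```scala
  import org.apache.spark.SparkConf

  object FetchCacheExample {
    val conf = new SparkConf()
      .set("spark.mesos.fetcherCache.enable", "true")
      // URIs must be specified for the setting to have any effect; this
      // example URI is an assumption, not from the patch.
      .set("spark.mesos.uris", "http://example.com/my-archive.tgz")
  }
  ```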
* [SPARK-14082][MESOS] Enable GPU support with Mesos (Timothy Chen, 2016-10-10; 4 files changed, -23/+87)

  ## What changes were proposed in this pull request?

  Enable GPU resources to be used when running coarse grain mode with Mesos.

  ## How was this patch tested?

  Manual test with GPU.

  Author: Timothy Chen <tnachen@gmail.com>

  Closes #14644 from tnachen/gpu_mesos.
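  A hedged sketch of how the feature is used. The commit message does not name the config key, so treat `spark.mesos.gpus.max` below as an assumption about the knob that caps GPU acquisition in coarse-grained mode.

  ```scala
  import org.apache.spark.SparkConf

  object GpuConfExample {
    val conf = new SparkConf()
      .set("spark.mesos.coarse", "true")   // coarse-grained mode, per the patch
      .set("spark.mesos.gpus.max", "2")    // assumed key: cap on GPUs to acquire
  }
  ```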
* [SPARK-17648][CORE] TaskScheduler really needs offers to be an IndexedSeq (Imran Rashid, 2016-09-29; 2 files changed, -2/+2)

  ## What changes were proposed in this pull request?

  The Seq[WorkerOffer] is accessed by index, so it really should be an IndexedSeq; otherwise an O(n) operation becomes O(n^2). In practice this hasn't been an issue because where these offers are generated, the call to `.toSeq` just happens to create an IndexedSeq anyway. I got bitten by this in performance tests I was doing, and it's better for the types to be more precise so that, e.g., a change in Scala doesn't destroy performance.

  ## How was this patch tested?

  Unit tests via Jenkins.

  Author: Imran Rashid <irashid@cloudera.com>

  Closes #15221 from squito/SPARK-17648.
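  A self-contained illustration of the complexity claim: `apply(i)` is effectively constant-time on an `IndexedSeq` like `Vector` but linear on a `List`, so indexing inside a loop turns O(n) work into O(n^2).

  ```scala
  object IndexedSeqExample {
    def time[A](body: => A): Long = {
      val t0 = System.nanoTime()
      body
      (System.nanoTime() - t0) / 1000000 // elapsed milliseconds
    }

    def sumByIndex(xs: Seq[Int]): Int = {
      var s = 0
      var i = 0
      while (i < xs.length) { s += xs(i); i += 1 } // xs(i) is O(i) on a List
      s
    }

    def main(args: Array[String]): Unit = {
      val n = 10000
      // Quadratic overall on the List, linear on the Vector.
      println(s"List:   ${time(sumByIndex(List.fill(n)(1)))} ms")
      println(s"Vector: ${time(sumByIndex(Vector.fill(n)(1)))} ms")
    }
  }
  ```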
* [SPARK-4563][CORE] Allow driver to advertise a different network address (Marcelo Vanzin, 2016-09-21; 3 files changed, -4/+7)

  The goal of this feature is to allow the Spark driver to run in an isolated environment, such as a docker container, and be able to use the host's port forwarding mechanism to accept connections from the outside world.

  The change is restricted to the driver: there is no support for achieving the same thing on executors (or the YARN AM for that matter). Those still need full access to the outside world so that, for example, connections can be made to an executor's block manager.

  The core of the change is simple: add a new configuration that tells which address the driver should bind to, which can be different from the address it advertises to executors (spark.driver.host). Everything else is plumbing the new configuration where it's needed.

  To use the feature, the host starting the container needs to set up the driver's port range to fall into a range that is being forwarded; this required the block manager port to have a special configuration just for the driver, which falls back to the existing spark.blockManager.port when not set. This way, users can modify the driver settings without affecting the executors; it would theoretically be nice to also have different retry counts for driver and executors, but given that docker (at least) allows forwarding port ranges, we can probably live without that for now.

  Because of the nature of the feature it's kinda hard to add unit tests; I just added a simple one to make sure the configuration works.

  This was tested with a docker image running spark-shell with the following command:

    docker blah blah blah \
      -p 38000-38100:38000-38100 \
      [image] \
      spark-shell \
        --num-executors 3 \
        --conf spark.shuffle.service.enabled=false \
        --conf spark.dynamicAllocation.enabled=false \
        --conf spark.driver.host=[host's address] \
        --conf spark.driver.port=38000 \
        --conf spark.driver.blockManager.port=38020 \
        --conf spark.ui.port=38040

  Running on YARN, I verified the driver works, executors start up and listen on ephemeral ports (instead of using the driver's config), and that caching and shuffling (without the shuffle service) work. Clicked through the UI to make sure all pages (including executor thread dumps) worked. Also tested apps without docker, and ran unit tests.

  Author: Marcelo Vanzin <vanzin@cloudera.com>

  Closes #15120 from vanzin/SPARK-4563.
* [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of ArrayBuffer.append(A) in performance critical paths (Liwei Lin, 2016-09-07; 1 file changed, -6/+6)

  ## What changes were proposed in this pull request?

  We should generally use `ArrayBuffer.+=(A)` rather than `ArrayBuffer.append(A)`, because `append(A)` would involve extra boxing / unboxing.

  ## How was this patch tested?

  N/A

  Author: Liwei Lin <lwlin7@gmail.com>

  Closes #14914 from lw-lin/append_to_plus_eq_v2.
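  The call-site difference, shown in isolation: in the Scala 2.x collections of that era, `append` takes varargs (`elems: A*`), so each call wraps its argument in a `Seq` (boxing primitives along the way), while `+=` takes the element directly.

  ```scala
  import scala.collection.mutable.ArrayBuffer

  object AppendVsPlusEq {
    def main(args: Array[String]): Unit = {
      val buf = ArrayBuffer.empty[Int]
      buf += 1        // single-element add: no varargs wrapper allocated
      buf.append(2)   // varargs: the argument is first wrapped in a Seq
      println(buf)    // ArrayBuffer(1, 2)
    }
  }
  ```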
* [SPARK-16533][HOTFIX] Fix compilation on Scala 2.10 (Marcelo Vanzin, 2016-09-01; 1 file changed, -1/+3)

  No idea why it was failing (the needed import was there), but this makes things work.

  Author: Marcelo Vanzin <vanzin@cloudera.com>

  Closes #14925 from vanzin/SPARK-16533.
* [SPARK-16533][CORE] resolve deadlocking in driver when executors die (Angus Gerry, 2016-09-01; 2 files changed, -5/+13)

  ## What changes were proposed in this pull request?

  This pull request reverts the changes made as a part of #14605, which simply side-steps the deadlock issue. Instead, I propose the following approach:

  * Use `scheduleWithFixedDelay` when calling `ExecutorAllocationManager.schedule` for scheduling executor requests. The intent of this is that if invocations are delayed beyond the default schedule interval on account of lock contention, then we avoid a situation where calls to `schedule` are made back-to-back, potentially releasing and then immediately reacquiring these locks - further exacerbating contention.
  * Replace a number of calls to `askWithRetry` with `ask` inside of message handling code in `CoarseGrainedSchedulerBackend` and its ilk. This allows us to queue messages with the relevant endpoints, release whatever locks we might be holding, and then block whilst awaiting the response. This change is made at the cost of being able to retry should sending the message fail, as retrying outside of the lock could easily cause race conditions if other conflicting messages have been sent whilst awaiting a response. I believe this to be the lesser of two evils, as in many cases these RPC calls are to process-local components, and so failures are more likely to be deterministic, and timeouts are more likely to be caused by lock contention.

  ## How was this patch tested?

  Existing tests, and manual tests under yarn-client mode.

  Author: Angus Gerry <angolon@gmail.com>

  Closes #14710 from angolon/SPARK-16533.
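  A minimal sketch of the first bullet's scheduling idea: with `scheduleWithFixedDelay`, the delay is measured from the end of one run to the start of the next, so a run slowed by lock contention is never followed immediately by a back-to-back invocation the way a fixed-rate schedule can be.

  ```scala
  import java.util.concurrent.{Executors, TimeUnit}

  object FixedDelayExample {
    def main(args: Array[String]): Unit = {
      val executor = Executors.newSingleThreadScheduledExecutor()
      val task = new Runnable {
        // Stand-in for ExecutorAllocationManager.schedule, which may block
        // on contended locks for a while.
        def run(): Unit = println(s"schedule() tick at ${System.currentTimeMillis()}")
      }
      // Start now, then wait at least 100ms *after each run completes* before
      // the next; delayed runs are never compensated with bursts of calls.
      executor.scheduleWithFixedDelay(task, 0, 100, TimeUnit.MILLISECONDS)
      Thread.sleep(550)
      executor.shutdown()
    }
  }
  ```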
* [SPARK-17316][TESTS] Fix MesosCoarseGrainedSchedulerBackendSuite (Shixiong Zhu, 2016-08-31; 1 file changed, -0/+2)

  ## What changes were proposed in this pull request?

  The master is broken because #14882 didn't run mesos tests.

  ## How was this patch tested?

  Jenkins unit tests.

  Author: Shixiong Zhu <shixiong@databricks.com>

  Closes #14902 from zsxwing/hotfix.
* [SPARK-16967] move mesos to module (Michael Gummelt, 2016-08-26; 27 files changed, -0/+5525)

  ## What changes were proposed in this pull request?

  Move Mesos code into a mvn module.

  ## How was this patch tested?

  * unit tests
  * manually submitting a client mode and cluster mode job
  * spark/mesos integration test suite

  Author: Michael Gummelt <mgummelt@mesosphere.io>

  Closes #14637 from mgummelt/mesos-module.