aboutsummaryrefslogtreecommitdiff
path: root/yarn
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-6363][BUILD] Make Scala 2.11 the default Scala versionJosh Rosen2016-01-301-2/+2
| | | | | | | | | | | | This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds). The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance). After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break. Author: Josh Rosen <joshrosen@databricks.com> Closes #10608 from JoshRosen/SPARK-6363.
* [SPARK-7997][CORE] Remove Akka from Spark Core and StreamingShixiong Zhu2016-01-221-1/+1
| | | | | | | | | | | | - Remove Akka dependency from core. Note: the streaming-akka project still uses Akka. - Remove HttpFileServer - Remove Akka configs from SparkConf and SSLOptions - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth to keep this config because using `DirectTaskResult` or `IndirectTaskResult` depends on it. - Update comments and docs Author: Shixiong Zhu <shixiong@databricks.com> Closes #10854 from zsxwing/remove-akka.
* [SPARK-12652][PYSPARK] Upgrade Py4J to 0.9.1Shixiong Zhu2016-01-122-4/+4
| | | | | | | | | | | | - [x] Upgrade Py4J to 0.9.1 - [x] SPARK-12657: Revert SPARK-12617 - [x] SPARK-12658: Revert SPARK-12511 - Still keep the change that only reading checkpoint once. This is a manual change and worth to take a look carefully. https://github.com/zsxwing/spark/commit/bfd4b5c040eb29394c3132af3c670b1a7272457c - [x] Verify no leak any more after reverting our workarounds Author: Shixiong Zhu <shixiong@databricks.com> Closes #10692 from zsxwing/py4j-0.9.1.
* [SPARK-12692][BUILD][YARN] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-111-1/+1
| | | | | | | | | | | before "," or ":") Fix the style violation (space before , and :). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10686 from sarutak/SPARK-12692-followup-yarn.
* [SPARK-3873][TESTS] Import ordering fixes.Marcelo Vanzin2016-01-055-20/+13
| | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10582 from vanzin/SPARK-3873-tests.
* [SPARK-7995][SPARK-6280][CORE] Remove AkkaRpcEnv and remove systemName from ↵Shixiong Zhu2015-12-311-7/+5
| | | | | | | | | | | | | | | | | | | | | setupEndpointRef ### Remove AkkaRpcEnv Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring Streaming actorStream API. ### Remove systemName There are 2 places using `systemName`: * `RpcEnvConfig.name`. Actually, although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name to output the log `Successfully started service *** on port ***`. Since the service name in log is useful, I keep `RpcEnvConfig.name`. * `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`. Akka requires `systemName` in its URI and will refuse a connection if `systemName` is not matched. However, `NettyRpcEnv` doesn't use it. So we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`. ### Remove RpcEnv.uriOf `uriOf` exists because Akka uses different URI formats for with and without authentication, e.g., `akka.ssl.tcp...` and `akka.tcp://...`. But `NettyRpcEnv` uses the same format. So it's not necessary after removing `AkkaRpcEnv`. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10459 from zsxwing/remove-akka-rpc-env.
* [SPARK-3873][YARN] Fix import ordering.Marcelo Vanzin2015-12-3012-26/+23
| | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10536 from vanzin/SPARK-3873-yarn.
* [SPARK-12311][CORE] Restore previous value of "os.arch" property in test ↵Kazuaki Ishizaki2015-12-245-20/+50
| | | | | | | | | | | | suites after forcing to set specific value to "os.arch" property Restore the original value of os.arch property after each test Since some of tests forced to set the specific value to os.arch property, we need to set the original value. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #10289 from kiszk/SPARK-12311.
* Bump master version to 2.0.0-SNAPSHOT.Reynold Xin2015-12-191-1/+1
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.
* [SPARK-4117][YARN] Spark on Yarn handle AM being told command from RMDevaraj K2015-12-151-1/+8
| | | | | | | | | | | | Spark on Yarn handle AM being told command from RM When RM throws ApplicationAttemptNotFoundException for allocate invocation, making the ApplicationMaster to finish immediately without any retries. Author: Devaraj K <devaraj@apache.org> Closes #10129 from devaraj-kavali/SPARK-4117.
* [SPARK-12241][YARN] Improve failure reporting in Yarn client ↵Steve Loughran2015-12-093-31/+64
| | | | | | | | | | | | | | obtainTokenForHBase() This lines up the HBase token logic with that done for Hive in SPARK-11265: reflection with only CFNE being swallowed. There is a test, one which doesn't try to put HBase on the yarn/test class and really do the reflection (the way the hive introspection does). If people do want that then it could be added with careful POM work +also: cut an incorrect comment from the Hive test case before copying it, and a couple of imports that may have been related to the hive test in the past. Author: Steve Loughran <stevel@hortonworks.com> Closes #10227 from steveloughran/stevel/patches/SPARK-12241-obtainTokenForHBase.
* [SPARK-10582][YARN][CORE] Fix AM failure situation for dynamic allocationjerryshao2015-12-091-0/+23
| | | | | | | | | | | | Because of AM failure, the target executor number between driver and AM will be different, which will lead to unexpected behavior in dynamic allocation. So when AM is re-registered with driver, state in `ExecutorAllocationManager` and `CoarseGrainedSchedulerBacked` should be reset. This issue is originally addressed in #8737 , here re-opened again. Thanks a lot KaiXinXiaoLei for finding this issue. andrewor14 and vanzin would you please help to review this, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #9963 from jerryshao/SPARK-10582.
* [SPARK-12142][CORE]Reply false when container allocator is not ready and ↵meiyoula2015-12-041-1/+2
| | | | | | | | | | reset target Using Dynamic Allocation function, when a new AM is starting, and ExecutorAllocationManager send RequestExecutor message to AM. If the container allocator is not ready, the whole app will hang on Author: meiyoula <1039320815@qq.com> Closes #10138 from XuTingjun/patch-1.
* [SPARK-12101][CORE] Fix thread pools that cannot cache tasks in Worker and ↵Shixiong Zhu2015-12-031-10/+4
| | | | | | | | | | AppClient `SynchronousQueue` cannot cache any task. This issue is similar to #9978. It's an easy fix. Just use the fixed `ThreadUtils.newDaemonCachedThreadPool`. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10108 from zsxwing/fix-threadpool.
* [SPARK-11314][YARN] add service API and test service for Yarn Cluster schedulersSteve Loughran2015-12-039-32/+651
| | | | | | | | | | | | | | | | This is purely the yarn/src/main and yarn/src/test bits of the YARN ATS integration: the extension model to load and run implementations of `SchedulerExtensionService` in the yarn cluster scheduler process —and to stop them afterwards. There's duplication between the two schedulers, yarn-client and yarn-cluster, at least in terms of setting everything up, because the common superclass, `YarnSchedulerBackend` is in spark-core, and the extension services need the YARN app/attempt IDs. If you look at how the the extension services are loaded, the case class `SchedulerExtensionServiceBinding` is used to pass in config info -currently just the spark context and the yarn IDs, of which one, the attemptID, will be null when running client-side. I'm passing in a case class to ensure that it would be possible in future to add extra arguments to the binding class, yet, as the method signature will not have changed, still be able to load existing services. There's no functional extension service here, just one for testing. The real tests come in the bigger pull requests. At the same time, there's no restriction of this extension service purely to the ATS history publisher. Anything else that wants to listen to the spark context and publish events could use this, and I'd also consider writing one for the YARN-913 registry service, so that the URLs of the web UI would be locatable through that (low priority; would make more sense if integrated with a REST client). There's no minicluster test. Given the test execution overhead of setting up minicluster tests, it'd probably be better to add an extension service into one of the existing tests. Author: Steve Loughran <stevel@hortonworks.com> Closes #9182 from steveloughran/stevel/feature/SPARK-1537-service.
* [SPARK-12046][DOC] Fixes various ScalaDoc/JavaDoc issuesCheng Lian2015-12-011-5/+5
| | | | | | | | This PR backports PR #10039 to master Author: Cheng Lian <lian@databricks.com> Closes #10063 from liancheng/spark-12046.doc-fix.master.
* [SPARK-7173][YARN] Add label expression support for application masterjerryshao2015-11-231-1/+25
| | | | | | | | | | Add label expression support for AM to restrict it runs on the specific set of nodes. I tested it locally and works fine. sryza and vanzin please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #9800 from jerryshao/SPARK-7173.
* [SPARK-4134][CORE] Lower severity of some executor loss logs.Marcelo Vanzin2015-11-191-2/+2
| | | | | | | | | Don't log ERROR messages when executors are explicitly killed or when the exit reason is not yet known. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9780 from vanzin/SPARK-11789.
* [SPARK-11771][YARN][TRIVIAL] maximum memory in yarn is controlled by two ↵Holden Karau2015-11-171-1/+2
| | | | | | | | | | params have both in error msg When we exceed the max memory tell users to increase both params instead of just the one. Author: Holden Karau <holden@us.ibm.com> Closes #9758 from holdenk/SPARK-11771-maximum-memory-in-yarn-is-controlled-by-two-params-have-both-in-error-msg.
* [SPARK-11718][YARN][CORE] Fix explicitly killed executor dies silently issuejerryshao2015-11-161-6/+24
| | | | | | | | | | | | | | | | | | Currently if dynamic allocation is enabled, explicitly killing executor will not get response, so the executor metadata is wrong in driver side. Which will make dynamic allocation on Yarn fail to work. The problem is `disableExecutor` returns false for pending killing executors when `onDisconnect` is detected, so no further implementation is done. One solution is to bypass these explicitly killed executors to use `super.onDisconnect` to remove executor. This is simple. Another solution is still querying the loss reason for these explicitly kill executors. Since executor may get killed and informed in the same AM-RM communication, so current way of adding pending loss reason request is not worked (container complete is already processed), here we should store this loss reason for later query. Here this PR chooses solution 2. Please help to review. vanzin I think this part is changed by you previously, would you please help to review? Thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #9684 from jerryshao/SPARK-11718.
* [SPARK-11615] Drop @VisibleForTesting annotationtedyu2015-11-101-1/+0
| | | | | | | | | | See http://search-hadoop.com/m/q3RTtjpe8r1iRbTj2 for discussion. Summary: addition of VisibleForTesting annotation resulted in spark-shell malfunctioning. Author: tedyu <yuzhihong@gmail.com> Closes #9585 from tedyu/master.
* [SPARK-11555] spark on yarn spark-class --num-workers doesn't workThomas Graves2015-11-062-3/+6
| | | | | | | | | | I tested the various options with both spark-submit and spark-class of specifying number of executors in both client and cluster mode where it applied. --num-workers, --num-executors, spark.executor.instances, SPARK_EXECUTOR_INSTANCES, default nothing supplied Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Closes #9523 from tgravescs/SPARK-11555.
* [SPARK-10622][CORE][YARN] Differentiate dead from "mostly dead" executors.Marcelo Vanzin2015-11-042-34/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In YARN mode, when preemption is enabled, we may leave executors in a zombie state while we wait to retrieve the reason for which the executor exited. This is so that we don't account for failed tasks that were running on a preempted executor. The issue is that while we wait for this information, the scheduler might decide to schedule tasks on the executor, which will never be able to run them. Other side effects include the block manager still considering the executor available to cache blocks, for example. So, when we know that an executor went down but we don't know why, stop everything related to the executor, except its running tasks. Only when we know the reason for the exit (or give up waiting for it) we do update the running tasks. This is achieved by a new `disableExecutor()` method in the `Schedulable` interface. For managers that do not behave like this (i.e. every one but YARN), the existing `executorLost()` method will behave the same way it did before. On top of that change, a few minor changes that made debugging easier, and fixed some other minor issues: - The cluster-mode AM was printing a misleading log message every time an executor disconnected from the driver (because the akka actor system was shared between driver and AM). - Avoid sending unnecessary requests for an executor's exit reason when we already know it was explicitly disabled / killed. This avoids both multiple requests, and unnecessary requests that would just cause warning messages on the AM (in the explicit kill case). - Tone down a log message about the executor being lost when it exited normally (e.g. preemption) - Wake up the AM monitor thread when requests for executor loss reasons arrive too, so that we can more quickly remove executors from this zombie state. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8887 from vanzin/SPARK-10622.
* [SPARK-10997][CORE] Add "client mode" to netty rpc env.Marcelo Vanzin2015-11-021-4/+2
| | | | | | | | | | | | | | | | | | | | | | | "Client mode" means the RPC env will not listen for incoming connections. This allows certain processes in the Spark stack (such as Executors or tha YARN client-mode AM) to act as pure clients when using the netty-based RPC backend, reducing the number of sockets needed by the app and also the number of open ports. Client connections are also preferred when endpoints that actually have a listening socket are involved; so, for example, if a Worker connects to a Master and the Master needs to send a message to a Worker endpoint, that client connection will be used, even though the Worker is also listening for incoming connections. With this change, the workaround for SPARK-10987 isn't necessary anymore, and is removed. The AM connects to the driver in "client mode", and that connection is used for all driver <-> AM communication, and so the AM is properly notified when the connection goes down. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9210 from vanzin/SPARK-10997.
* [SPARK-9817][YARN] Improve the locality calculation of containers by taking ↵jerryshao2015-11-025-40/+159
| | | | | | | | | | | | pending container requests into consideraion This is a follow-up PR to further improve the locality calculation by considering the pending container's request. Since the locality preferences of tasks may be shifted from time to time, current localities of pending container requests may not fully match the new preferences, this PR improve it by removing outdated, unmatched container requests and replace with new requests. sryza please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #8100 from jerryshao/SPARK-9817.
* [SPARK-11073][CORE][YARN] Remove akka dependency in secret key generation.Marcelo Vanzin2015-11-012-2/+33
| | | | | | | | | | Use standard JDK APIs for that (with a little help from Guava). Most of the changes here are in test code, since there were no tests specific to that part of the code. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9257 from vanzin/SPARK-11073.
* [SPARK-11265][YARN] YarnClient can't get tokens to talk to Hive 1.2.1 in a ↵Steve Loughran2015-10-314-49/+129
| | | | | | | | | | secure cluster This is a fix for SPARK-11265; the introspection code to get Hive delegation tokens failing on Spark 1.5.1+, due to changes in the Hive codebase Author: Steve Loughran <stevel@hortonworks.com> Closes #9232 from steveloughran/stevel/patches/SPARK-11265-hive-tokens.
* [SPARK-11178] Improving naming around task failures.Kay Ousterhout2015-10-271-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 introduced new functionality so that if an executor dies for a reason that's not caused by one of the tasks running on the executor (e.g., due to pre-emption), Spark doesn't count the failure towards the maximum number of failures for the task. That commit introduced some vague naming that this commit attempts to fix; in particular: (1) The variable "isNormalExit", which was used to refer to cases where the executor died for a reason unrelated to the tasks running on the machine, has been renamed (and reversed) to "exitCausedByApp". The problem with the existing name is that it's not clear (at least to me!) what it means for an exit to be "normal"; the new name is intended to make the purpose of this variable more clear. (2) The variable "shouldEventuallyFailJob" has been renamed to "countTowardsTaskFailures". This variable is used to determine whether a task's failure should be counted towards the maximum number of failures allowed for a task before the associated Stage is aborted. The problem with the existing name is that it can be confused with implying that the task's failure should immediately cause the stage to fail because it is somehow fatal (this is the case for a fetch failure, for example: if a task fails because of a fetch failure, there's no point in retrying, and the whole stage should be failed). Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #9164 from kayousterhout/SPARK-11178.
* [SPARK-11105] [YARN] Distribute log4j.properties to executorsvundela2015-10-201-0/+13
| | | | | | | | | | | Currently log4j.properties file is not uploaded to executor's which is leading them to use the default values. This fix will make sure that file is always uploaded to distributed cache so that executor will use the latest settings. If user specifies log configurations through --files then executors will be picking configs from --files instead of $SPARK_CONF_DIR/log4j.properties Author: vundela <vsr@cloudera.com> Author: Srinivasa Reddy Vundela <vsr@cloudera.com> Closes #9118 from vundela/master.
* [SPARK-10447][SPARK-3842][PYSPARK] upgrade pyspark to py4j0.9Holden Karau2015-10-202-3/+3
| | | | | | | | | Upgrade to Py4j0.9 Author: Holden Karau <holden@pigscanfly.ca> Author: Holden Karau <holden@us.ibm.com> Closes #8615 from holdenk/SPARK-10447-upgrade-pyspark-to-py4j0.9.
* [SPARK-11120] Allow sane default number of executor failures when ↵Ryan Williams2015-10-192-12/+26
| | | | | | | | | | dynamically allocating in YARN I also added some information to container-failure error msgs about what host they failed on, which would have helped me identify the problem that lead me to this JIRA and PR sooner. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #9147 from ryan-williams/dyn-exec-failures.
* [SPARK-10921][YARN] Completely remove the use of SparkContext.prefer…Jacek Laskowski2015-10-192-3/+0
| | | | | | | | …redNodeLocationData Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8976 from jaceklaskowski/SPARK-10921.
* [SPARK-11000] [YARN] Load `metadata.Hive` class only when ↵huangzhaowei2015-10-171-4/+4
| | | | | | | | `hive.metastore.uris` was set to avoid bootting the database twice Author: huangzhaowei <carlmartinmax@gmail.com> Closes #9026 from SaintBacchus/SPARK-11000.
* [SPARK-11026] [YARN] spark.yarn.user.classpath.first does work for ↵Lianhui Wang2015-10-131-8/+15
| | | | | | | | | | 'spark-submit --jars hdfs://user/foo.jar' when spark.yarn.user.classpath.first=true and using 'spark-submit --jars hdfs://user/foo.jar', it can not put foo.jar to system classpath. so we need to put yarn's linkNames of jars to the system classpath. vanzin tgravescs Author: Lianhui Wang <lianhuiwang09@gmail.com> Closes #9045 from lianhuiwang/spark-11026.
* [SPARK-10739] [YARN] Add application attempt window for Spark on Yarnjerryshao2015-10-121-0/+14
| | | | | | | | | | | | | Add application attempt window for Spark on Yarn to ignore old out of window failures, this is useful for long running applications to recover from failures. Author: jerryshao <sshao@hortonworks.com> Closes #8857 from jerryshao/SPARK-10739 and squashes the following commits: 36eabdc [jerryshao] change the doc 7f9b77d [jerryshao] Style change 1c9afd0 [jerryshao] Address the comments caca695 [jerryshao] Add application attempt window for Spark on Yarn
* [SPARK-11023] [YARN] Avoid creating URIs from local paths directly.Marcelo Vanzin2015-10-121-5/+6
| | | | | | | | | | | | | | The issue is that local paths on Windows, when provided with drive letters or backslashes, are not valid URIs. Instead of trying to figure out whether paths are URIs or not, use Utils.resolveURI() which does that for us. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9049 from vanzin/SPARK-11023 and squashes the following commits: 77021f2 [Marcelo Vanzin] [SPARK-11023] [yarn] Avoid creating URIs from local paths directly.
* [SPARK-8673] [LAUNCHER] API and infrastructure for communicating with child ↵Marcelo Vanzin2015-10-096-74/+193
| | | | | | | | | | | | | | | | | | apps. This change adds an API that encapsulates information about an app launched using the library. It also creates a socket-based communication layer for apps that are launched as child processes; the launching application listens for connections from launched apps, and once communication is established, the channel can be used to send updates to the launching app, or to send commands to the child app. The change also includes hooks for local, standalone/client and yarn masters. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7052 from vanzin/SPARK-8673.
* [SPARK-10987] [YARN] Workaround for missing netty rpc disconnection event.Marcelo Vanzin2015-10-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | In YARN client mode, when the AM connects to the driver, it may be the case that the driver never needs to send a message back to the AM (i.e., no dynamic allocation or preemption). This triggers an issue in the netty rpc backend where no disconnection event is sent to endpoints, and the AM never exits after the driver shuts down. The real fix is too complicated, so this is a quick hack to unblock YARN client mode until we can work on the real fix. It forces the driver to send a message to the AM when the AM registers, thus establishing that connection and enabling the disconnection event when the driver goes away. Also, a minor side issue: when the executor is shutting down, it needs to send an "ack" back to the driver when using the netty rpc backend; but that "ack" wasn't being sent because the handler was shutting down the rpc env before returning. So added a change to delay the shutdown a little bit, allowing the ack to be sent back. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9021 from vanzin/SPARK-10987.
* [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py.Marcelo Vanzin2015-10-073-0/+8
| | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8775 from vanzin/SPARK-10300.
* [SPARK-10964] [YARN] Correctly register the AM with the driver.Marcelo Vanzin2015-10-071-1/+3
| | | | | | | | | | The `self` method returns null when called from the constructor; instead, registration should happen in the `onStart` method, at which point the `self` reference has already been initialized. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9005 from vanzin/SPARK-10964.
* [SPARK-10812] [YARN] Fix shutdown of token renewer.Marcelo Vanzin2015-10-071-1/+1
| | | | | | | | | | | | | | | | | | A recent change to fix the referenced bug caused this exception in the `SparkContext.stop()` path: org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode! at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:167) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:182) at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:440) at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1579) at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1730) at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185) at org.apache.spark.SparkContext.stop(SparkContext.scala:1729) Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8996 from vanzin/SPARK-10812.
* [SPARK-10901] [YARN] spark.yarn.user.classpath.first doesn't workThomas Graves2015-10-061-12/+27
| | | | | | | | | | | This should go into 1.5.2 also. The issue is we were no longer adding the __app__.jar to the system classpath. Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Author: Tom Graves <tgraves@yahoo-inc.com> Closes #8959 from tgravescs/SPARK-10901.
* [SPARK-10916] [YARN] Set perm gen size when launching containers on YARN.Marcelo Vanzin2015-10-063-3/+24
| | | | | | | | | | This makes YARN containers behave like all other processes launched by Spark, which launch with a default perm gen size of 256m unless overridden by the user (or not needed by the vm). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8970 from vanzin/SPARK-10916.
* [SPARK-6028] [CORE] Remerge #6457: new RPC implemetation and also pick #8905zsxwing2015-10-031-4/+1
| | | | | | | | This PR just reverted https://github.com/apache/spark/commit/02144d6745ec0a6d8877d969feb82139bd22437f to remerge #6457 and also included the commits in #8905. Author: zsxwing <zsxwing@gmail.com> Closes #8944 from zsxwing/SPARK-6028.
* [SPARK-10871] include number of executor failures in error msgRyan Williams2015-09-291-1/+1
| | | | | | Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #8939 from ryan-williams/errmsg.
* [SPARK-10790] [YARN] Fix initial executor number not set issue and ↵jerryshao2015-09-284-40/+27
| | | | | | | | | | | | consolidate the codes This bug is introduced in [SPARK-9092](https://issues.apache.org/jira/browse/SPARK-9092), `targetExecutorNumber` should use `minExecutors` if `initialExecutors` is not set. Using 0 instead will meet the problem as mentioned in [SPARK-10790](https://issues.apache.org/jira/browse/SPARK-10790). Also consolidate and simplify some similar code snippets to keep the consistent semantics. Author: jerryshao <sshao@hortonworks.com> Closes #8910 from jerryshao/SPARK-10790.
* [SPARK-10812] [YARN] Spark hadoop util support switching to yarnHolden Karau2015-09-282-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | While this is likely not a huge issue for real production systems, for test systems which may setup a Spark Context and tear it down and stand up a Spark Context with a different master (e.g. some local mode & some yarn mode) tests this cane be an issue. Discovered during work on spark-testing-base on Spark 1.4.1, but seems like the logic that triggers it is present in master (see SparkHadoopUtil object). A valid work around for users encountering this issue is to fork a different JVM, however this can be heavy weight. ``` [info] SampleMiniClusterTest: [info] Exception encountered when attempting to run a suite with class name: com.holdenkarau.spark.testing.SampleMiniClusterTest *** ABORTED *** [info] java.lang.ClassCastException: org.apache.spark.deploy.SparkHadoopUtil cannot be cast to org.apache.spark.deploy.yarn.YarnSparkHadoopUtil [info] at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:163) [info] at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:257) [info] at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:561) [info] at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:115) [info] at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57) [info] at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141) [info] at org.apache.spark.SparkContext.<init>(SparkContext.scala:497) [info] at com.holdenkarau.spark.testing.SharedMiniCluster$class.setup(SharedMiniCluster.scala:186) [info] at com.holdenkarau.spark.testing.SampleMiniClusterTest.setup(SampleMiniClusterTest.scala:26) [info] at com.holdenkarau.spark.testing.SharedMiniCluster$class.beforeAll(SharedMiniCluster.scala:103) ``` Author: Holden Karau <holden@pigscanfly.ca> Closes #8911 from holdenk/SPARK-10812-spark-hadoop-util-support-switching-to-yarn.
* Revert "[SPARK-6028][Core]A new RPC implemetation based on the network module"Xiangrui Meng2015-09-241-1/+4
| | | | This reverts commit 084e4e126211d74a79e8dbd2d0e604dd3c650822.
* [SPARK-6028][Core]A new RPC implemetation based on the network modulezsxwing2015-09-231-4/+1
| | | | | | | | Design doc: https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing Author: zsxwing <zsxwing@gmail.com> Closes #6457 from zsxwing/new-rpc.
* Revert "[SPARK-10300] [BUILD] [TESTS] Add support for test tags in ↵Marcelo Vanzin2015-09-153-28/+0
| | | | | | run-tests.py." This reverts commit 8abef21dac1a6538c4e4e0140323b83d804d602b.