aboutsummaryrefslogtreecommitdiff
path: root/core
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-13803] restore the changes in SPARK-3411CodingCat2016-03-151-4/+17
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This patch contains the functionality to balance the load of the cluster-mode drivers among workers This patch restores the changes in https://github.com/apache/spark/pull/1106 which was erased due to the merging of https://github.com/apache/spark/pull/731 ## How was this patch tested? test with existing test cases Author: CodingCat <zhunansjtu@gmail.com> Closes #11702 from CodingCat/SPARK-13803.
* [SPARK-13626][CORE] Avoid duplicate config deprecation warnings.Marcelo Vanzin2016-03-144-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | Three different things were needed to get rid of spurious warnings: - silence deprecation warnings when cloning configuration - change the way SparkHadoopUtil instantiates SparkConf to silence warnings - avoid creating new SparkConf instances where it's not needed. On top of that, I changed the way that Logging.scala detects the repl; now it uses a method that is overridden in the repl's Main class, and the hack in Utils.scala is not needed anymore. This makes the 2.11 repl behave like the 2.10 one and set the default log level to WARN, which is a lot better. Previously, this wasn't working because the 2.11 repl triggers log initialization earlier than the 2.10 one. I also removed and simplified some other code in the 2.11 repl's Main to avoid replicating logic that already exists elsewhere in Spark. Tested the 2.11 repl in local and yarn modes. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11510 from vanzin/SPARK-13626.
* [SPARK-10907][SPARK-6157] Remove pendingUnrollMemory from MemoryStoreJosh Rosen2016-03-143-238/+224
| | | | | | | | | | | | | | | | | This patch refactors the MemoryStore to remove the concept of `pendingUnrollMemory`. It also fixes fixes SPARK-6157: "Unrolling with MEMORY_AND_DISK should always release memory". Key changes: - Inline `MemoryStore.tryToPut` at its three call sites in the `MemoryStore`. - Inline `Memory.unrollSafely` at its only call site (in `MemoryStore.putIterator`). - Inline `MemoryManager.acquireStorageMemory` at its call sites. - Simplify the code as a result of this inlining (some parameters have fixed values after inlining, so lots of branches can be removed). - Remove the `pendingUnrollMemory` map by returning the amount of unrollMemory allocated when returning an iterator after a failed `putIterator` call. - Change `putIterator` to return an instance of `PartiallyUnrolledIterator`, a special iterator subclass which will automatically free the unroll memory of its partially-unrolled elements when the iterator is consumed. To handle cases where the iterator is not consumed (e.g. when a MEMORY_ONLY put fails), `PartiallyUnrolledIterator` exposes a `close()` method which may be called to discard the unrolled values and free their memory. Author: Josh Rosen <joshrosen@databricks.com> Closes #11613 from JoshRosen/cleanup-unroll-memory.
* [SPARK-13054] Always post TaskEnd event for tasksThomas Graves2016-03-142-9/+65
| | | | | | | | | | | | I am using dynamic container allocation and speculation and am seeing issues with the active task accounting. The Executor UI still shows active tasks on the an executor but the job/stage is all completed. I think its also affecting the dynamic allocation being able to release containers because it thinks there are still tasks. There are multiple issues with this: - If the task end for tasks (in this case probably because of speculation) comes in after the stage is finished, then the DAGScheduler.handleTaskCompletion will skip the task completion event Author: Thomas Graves <tgraves@prevailsail.corp.gq1.yahoo.com> Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Author: Tom Graves <tgraves@yahoo-inc.com> Closes #10951 from tgravescs/SPARK-11701.
* [SPARK-12583][MESOS] Mesos shuffle service: Don't delete shuffle files ↵Bertrand Bossy2016-03-143-35/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | before application has stopped ## Problem description: Mesos shuffle service is completely unusable since Spark 1.6.0 . The problem seems to occur since the move from akka to netty in the networking layer. Until now, a connection from the driver to each shuffle service was used as a signal for the shuffle service to determine, whether the driver is still running. Since 1.6.0, this connection is closed after spark.shuffle.io.connectionTimeout (or spark.network.timeout if the former is not set) due to it being idle. The shuffle service interprets this as a signal that the driver has stopped, despite the driver still being alive. Thus, shuffle files are deleted before the application has stopped. ### Context and analysis: spark shuffle fails with mesos after 2mins: https://issues.apache.org/jira/browse/SPARK-12583 External shuffle service broken w/ Mesos: https://issues.apache.org/jira/browse/SPARK-13159 This is a follow up on #11207 . ## What changes were proposed in this pull request? This PR adds a heartbeat signal from the Driver (in MesosExternalShuffleClient) to all registered external mesos shuffle service instances. In MesosExternalShuffleBlockHandler, a thread periodically checks whether a driver has timed out and cleans an application's shuffle files if this is the case. ## How was the this patch tested? This patch has been tested on a small mesos test cluster using the spark-shell. Log output from mesos shuffle service: ``` 16/02/19 15:13:45 INFO mesos.MesosExternalShuffleBlockHandler: Received registration request from app 294def07-3249-4e0f-8d71-bf8c83c58a50-0018 (remote address /xxx.xxx.xxx.xxx:52391, heartbeat timeout 120000 ms). 16/02/19 15:13:47 INFO shuffle.ExternalShuffleBlockResolver: Registered executor AppExecId{appId=294def07-3249-4e0f-8d71-bf8c83c58a50-0018, execId=3} with ExecutorShuffleInfo{localDirs=[/foo/blockmgr-c84c0697-a3f9-4f61-9c64-4d3ee227c047], subDirsPerLocalDir=64, shuffleManager=sort} 16/02/19 15:13:47 INFO shuffle.ExternalShuffleBlockResolver: Registered executor AppExecId{appId=294def07-3249-4e0f-8d71-bf8c83c58a50-0018, execId=7} with ExecutorShuffleInfo{localDirs=[/foo/blockmgr-bf46497a-de80-47b9-88f9-563123b59e03], subDirsPerLocalDir=64, shuffleManager=sort} 16/02/19 15:16:02 INFO mesos.MesosExternalShuffleBlockHandler: Application 294def07-3249-4e0f-8d71-bf8c83c58a50-0018 timed out. Removing shuffle files. 16/02/19 15:16:02 INFO shuffle.ExternalShuffleBlockResolver: Application 294def07-3249-4e0f-8d71-bf8c83c58a50-0018 removed, cleanupLocalDirs = true 16/02/19 15:16:02 INFO shuffle.ExternalShuffleBlockResolver: Cleaning up executor AppExecId{appId=294def07-3249-4e0f-8d71-bf8c83c58a50-0018, execId=3}'s 1 local dirs 16/02/19 15:16:02 INFO shuffle.ExternalShuffleBlockResolver: Cleaning up executor AppExecId{appId=294def07-3249-4e0f-8d71-bf8c83c58a50-0018, execId=7}'s 1 local dirs ``` Note: there are 2 executors running on this slave. Author: Bertrand Bossy <bertrand.bossy@teralytics.net> Closes #11272 from bbossy/SPARK-12583-mesos-shuffle-service-heartbeat.
* [SPARK-13848][SPARK-5185] Update to Py4J 0.9.2 in order to fix classloading ↵Josh Rosen2016-03-142-2/+2
| | | | | | | | | | | | | | | | issue This patch upgrades Py4J from 0.9.1 to 0.9.2 in order to include a patch which modifies Py4J to use the current thread's ContextClassLoader when performing reflection / class loading. This is necessary in order to fix [SPARK-5185](https://issues.apache.org/jira/browse/SPARK-5185), a longstanding issue affecting the use of `--jars` and `--packages` in PySpark. In order to demonstrate that the fix works, I removed the workarounds which were added as part of [SPARK-6027](https://issues.apache.org/jira/browse/SPARK-6027) / #4779 and other patches. Py4J diff: https://github.com/bartdag/py4j/compare/0.9.1...0.9.2 /cc zsxwing tdas davies brkyvz Author: Josh Rosen <joshrosen@databricks.com> Closes #11687 from JoshRosen/py4j-0.9.2.
* [SPARK-13833] Guard against race condition when re-caching disk blocks in memoryJosh Rosen2016-03-141-53/+84
| | | | | | | | | | | | When reading data from the DiskStore and attempting to cache it back into the memory store, we should guard against race conditions where multiple readers are attempting to re-cache the same block in memory. This patch accomplishes this by synchronizing on the block's `BlockInfo` object while trying to re-cache a block. (Will file JIRA as soon as ASF JIRA stops being down / laggy). Author: Josh Rosen <joshrosen@databricks.com> Closes #11660 from JoshRosen/concurrent-recaching-fixes.
* [SPARK-13746][TESTS] stop using deprecated SynchronizedSetWilson Wu2016-03-141-16/+25
| | | | | | | | trait SynchronizedSet in package mutable is deprecated Author: Wilson Wu <wilson888888888@gmail.com> Closes #11580 from wilson888888888/spark-synchronizedset.
* [MINOR][DOCS] Fix more typos in comments/strings.Dongjoon Hyun2016-03-1426-36/+36
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes 135 typos over 107 files: * 121 typos in comments * 11 typos in testcase name * 3 typos in log messages ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11689 from dongjoon-hyun/fix_more_typos.
* [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> ↵Sean Owen2016-03-1325-68/+81
| | | | | | | | | | | | | | | | | | | | byte[] conversions (and remaining Coverity items) ## What changes were proposed in this pull request? - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8 - Same for `InputStreamReader` and `OutputStreamWriter` constructors - Standardizes on UTF-8 everywhere - Standardizes specifying the encoding with `StandardCharsets.UTF-8`, not the Guava constant or "UTF-8" (which means handling `UnuspportedEncodingException`) - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c ) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11657 from srowen/SPARK-13823.
* [SPARK-13810][CORE] Add Port Configuration Suggestions on Bind ExceptionsBjorn Jonsson2016-03-131-2/+4
| | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Currently, when a java.net.BindException is thrown, it displays the following message: java.net.BindException: Address already in use: Service '$serviceName' failed after 16 retries! This change adds port configuration suggestions to the BindException, for example, for the UI, it now displays java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries. ## How was this patch tested? Manual tests Author: Bjorn Jonsson <bjornjon@gmail.com> Closes #11644 from bjornjon/master.
* [SPARK-13830] prefer block manager than direct result for large resultDavies Liu2016-03-111-3/+6
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? The current RPC can't handle large blocks very well, it's very slow to fetch 100M block (about 1 minute). Once switch to block manager to fetch that, it took about 10 seconds (still could be improved). ## How was this patch tested? existing unit tests. Author: Davies Liu <davies@databricks.com> Closes #11659 from davies/direct_result.
* [SPARK-13328][CORE] Poor read performance for broadcast variables with ↵Nezih Yigitbasi2016-03-112-15/+116
| | | | | | | | | | dynamic resource allocation When dynamic resource allocation is enabled fetching broadcast variables from removed executors were causing job failures and SPARK-9591 fixed this problem by trying all locations of a block before giving up. However, the locations of a block is retrieved only once from the driver in this process and the locations in this list can be stale due to dynamic resource allocation. This situation gets worse when running on a large cluster as the size of this location list can be in the order of several hundreds out of which there may be tens of stale entries. What we have observed is with the default settings of 3 max retries and 5s between retries (that's 15s per location) the time it takes to read a broadcast variable can be as high as ~17m (70 failed attempts * 15s/attempt) Author: Nezih Yigitbasi <nyigitbasi@netflix.com> Closes #11241 from nezihyigitbasi/SPARK-13328.
* [SPARK-13577][YARN] Allow Spark jar to be multiple jars, archive.Marcelo Vanzin2016-03-112-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | In preparation for the demise of assemblies, this change allows the YARN backend to use multiple jars and globs as the "Spark jar". The config option has been renamed to "spark.yarn.jars" to reflect that. A second option "spark.yarn.archive" was also added; if set, this takes precedence and uploads an archive expected to contain the jar files with the Spark code and its dependencies. Existing deployments should keep working, mostly. This change drops support for the "SPARK_JAR" environment variable, and also does not fall back to using "jarOfClass" if no configuration is set, falling back to finding files under SPARK_HOME instead. This should be fine since "jarOfClass" probably wouldn't work unless you were using spark-submit anyway. Tested with the unit tests, and trying the different config options on a YARN cluster. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11500 from vanzin/SPARK-13577.
* [MINOR][CORE] Fix a duplicate "and" in a log message.Marcelo Vanzin2016-03-101-1/+1
| | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11642 from vanzin/spark-conf-typo.
* [SPARK-13604][CORE] Sync worker's state after registering with masterShixiong Zhu2016-03-104-5/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Here lists all cases that Master cannot talk with Worker for a while and then network is back. 1. Master doesn't know the network issue (not yet timeout) a. Worker doesn't know the network issue (onDisconnected is not called) - Worker keeps sending Heartbeat. Both Worker and Master don't know the network issue. Nothing to do. (Finally, Master will notice the heartbeat timeout if network is not recovered) b. Worker knows the network issue (onDisconnected is called) - Worker stops sending Heartbeat and sends `RegisterWorker` to master. Master will reply `RegisterWorkerFailed("Duplicate worker ID")`. Worker calls "System.exit(1)" (Finally, Master will notice the heartbeat timeout if network is not recovered) (May leak driver processes. See [SPARK-13602](https://issues.apache.org/jira/browse/SPARK-13602)) 2. Worker timeout (Master knows the network issue). In such case, master removes Worker and its executors and drivers. a. Worker doesn't know the network issue (onDisconnected is not called) - Worker keeps sending Heartbeat. - If the network is back, say Master receives Heartbeat, Master sends `ReconnectWorker` to Worker - Worker send `RegisterWorker` to master. - Master accepts `RegisterWorker` but doesn't know executors and drivers in Worker. (may leak executors) b. Worker knows the network issue (onDisconnected is called) - Worker stop sending `Heartbeat`. Worker will send "RegisterWorker" to master. - Master accepts `RegisterWorker` but doesn't know executors and drivers in Worker. (may leak executors) This PR fixes executors and drivers leak in 2.a and 2.b when Worker reregisters with Master. The approach is making Worker send `WorkerLatestState` to sync the state after registering with master successfully. Then Master will ask Worker to kill unknown executors and drivers. Note: Worker cannot just kill executors after registering with master because in the worker, `LaunchExecutor` and `RegisteredWorker` are processed in two threads. If `LaunchExecutor` happens before `RegisteredWorker`, Worker's executor list will contain new executors after Master accepts `RegisterWorker`. We should not kill these executors. So sending the list to Master and let Master tell Worker which executors should be killed. ## How was this patch tested? test("SPARK-13604: Master should ask Worker kill unknown executors and drivers") Author: Shixiong Zhu <shixiong@databricks.com> Closes #11455 from zsxwing/orphan-executors.
* [SPARK-3854][BUILD] Scala style: require spaces before `{`.Dongjoon Hyun2016-03-109-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Since the opening curly brace, '{', has many usages as discussed in [SPARK-3854](https://issues.apache.org/jira/browse/SPARK-3854), this PR adds a ScalaStyle rule to prevent '){' pattern for the following majority pattern and fixes the code accordingly. If we enforce this in ScalaStyle from now, it will improve the Scala code quality and reduce review time. ``` // Correct: if (true) { println("Wow!") } // Incorrect: if (true){ println("Wow!") } ``` IntelliJ also shows new warnings based on this. ## How was this patch tested? Pass the Jenkins ScalaStyle test. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11637 from dongjoon-hyun/SPARK-3854.
* [SPARK-13696] Remove BlockStore class & simplify interfaces of mem. & disk ↵Josh Rosen2016-03-1016-611/+528
| | | | | | | | | | | | | | stores Today, both the MemoryStore and DiskStore implement a common `BlockStore` API, but I feel that this API is inappropriate because it abstracts away important distinctions between the behavior of these two stores. For instance, the disk store doesn't have a notion of storing deserialized objects, so it's confusing for it to expose object-based APIs like putIterator() and getValues() instead of only exposing binary APIs and pushing the responsibilities of serialization and deserialization to the client. Similarly, the DiskStore put() methods accepted a `StorageLevel` parameter even though the disk store can only store blocks in one form. As part of a larger BlockManager interface cleanup, this patch remove the BlockStore interface and refines the MemoryStore and DiskStore interfaces to reflect more narrow sets of responsibilities for those components. Some of the benefits of this interface cleanup are reflected in simplifications to several unit tests to eliminate now-unnecessary mocking, significant simplification of the BlockManager's `getLocal()` and `doPut()` methods, and a narrower API between the MemoryStore and DiskStore. Author: Josh Rosen <joshrosen@databricks.com> Closes #11534 from JoshRosen/remove-blockstore-interface.
* [SPARK-13727][CORE] SparkConf.contains does not consider deprecated keysbomeng2016-03-102-1/+18
| | | | | | | | | | | | | | | | | | | | | | The contains() method does not return consistently with get() if the key is deprecated. For example, import org.apache.spark.SparkConf val conf = new SparkConf() conf.set("spark.io.compression.lz4.block.size", "12345") # display some deprecated warning message conf.get("spark.io.compression.lz4.block.size") # return 12345 conf.get("spark.io.compression.lz4.blockSize") # return 12345 conf.contains("spark.io.compression.lz4.block.size") # return true conf.contains("spark.io.compression.lz4.blockSize") # return false The fix will make the contains() and get() more consistent. I've added a test case for this. (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Unit tests should be sufficient. Author: bomeng <bmeng@us.ibm.com> Closes #11568 from bomeng/SPARK-13727.
* [SPARK-13758][STREAMING][CORE] enhance exception message to avoid misleadingmwws2016-03-101-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a recoverable Spark streaming job with checkpoint enabled, it could be executed correctly at first time, but throw following exception when restarted and recovered from checkpoint. ``` org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063. at org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$sc(RDD.scala:87) at org.apache.spark.rdd.RDD.withScope(RDD.scala:352) at org.apache.spark.rdd.RDD.union(RDD.scala:565) at org.apache.spark.streaming.Repo$$anonfun$createContext$1.apply(Repo.scala:23) at org.apache.spark.streaming.Repo$$anonfun$createContext$1.apply(Repo.scala:19) at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627) ``` According to exception, it shows I invoked transformations and actions in other transformations, but I did not. The real reason is that I used external RDD in DStream operation. External RDD data is not stored in checkpoint, so that during recovering, the initial value of _sc in this RDD is assigned to null and hit above exception. But you can find the error message is misleading, it indicates nothing about the real issue Here is the code to reproduce it. ```scala object Repo { def createContext(ip: String, port: Int, checkpointDirectory: String):StreamingContext = { println("Creating new context") val sparkConf = new SparkConf().setAppName("Repo").setMaster("local[2]") val ssc = new StreamingContext(sparkConf, Seconds(2)) ssc.checkpoint(checkpointDirectory) var cached = ssc.sparkContext.parallelize(Seq("apple, banana")) val words = ssc.socketTextStream(ip, port).flatMap(_.split(" ")) words.foreachRDD((rdd: RDD[String]) => { val res = rdd.map(word => (word, word.length)).collect() println("words: " + res.mkString(", ")) cached = cached.union(rdd) cached.checkpoint() println("cached words: " + cached.collect.mkString(", ")) }) ssc } def main(args: Array[String]) { val ip = "localhost" val port = 9999 val dir = "/home/maowei/tmp" val ssc = StreamingContext.getOrCreate(dir, () => { createContext(ip, port, dir) }) ssc.start() ssc.awaitTermination() } } ``` Author: mwws <wei.mao@intel.com> Closes #11595 from mwws/SPARK-MissleadingLog.
* [SPARK-13492][MESOS] Configurable Mesos framework webui URL.Sergiusz Urbaniak2016-03-098-10/+77
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Previously the Mesos framework webui URL was being derived only from the Spark UI address leaving no possibility to configure it. This commit makes it configurable. If unset it falls back to the previous behavior. Motivation: This change is necessary in order to be able to install Spark on DCOS and to be able to give it a custom service link. The configured `webui_url` is configured to point to a reverse proxy in the DCOS environment. ## How was this patch tested? Locally, using unit tests and on DCOS testing and stable revision. Author: Sergiusz Urbaniak <sur@mesosphere.io> Closes #11369 from s-urbaniak/sur-webui-url.
* [SPARK-13775] History page sorted by completed time desc by default.zhuol2016-03-091-1/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Originally the page is sorted by AppID by default. After tests with users' feedback, we think it might be best to sort by completed time (desc). ## How was this patch tested? Manually test, with screenshot as follows. ![sorted-by-complete-time-desc](https://cloud.githubusercontent.com/assets/11683054/13647686/d6dea924-e5fa-11e5-8fc5-68e039b74b6f.png) Author: zhuol <zhuol@yahoo-inc.com> Closes #11608 from zhuoliu/13775.
* [SPARK-13778][CORE] Set the executor state for a worker when removing itShixiong Zhu2016-03-091-0/+1
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? When a worker is lost, the executors on this worker are also lost. But Master's ApplicationPage still displays their states as running. This patch just sets the executor state to `LOST` when a worker is lost. ## How was this patch tested? manual tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11609 from zsxwing/SPARK-13778.
* [SPARK-13747][SQL] Fix concurrent query with fork-join poolAndrew Or2016-03-091-1/+6
| | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Fix this use case, which was already fixed in SPARK-10548 in 1.6 but was broken in master due to #9264: ``` (1 to 100).par.foreach { _ => sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count() } ``` This threw `IllegalArgumentException` consistently before this patch. For more detail, see the JIRA. ## How was this patch tested? New test in `SQLExecutionSuite`. Author: Andrew Or <andrew@databricks.com> Closes #11586 from andrewor14/fix-concurrent-sql.
* [SPARK-13769][CORE] Update Java Doc in Spark SubmitAhmed Kamal2016-03-091-1/+1
| | | | | | | | | | | JIRA : https://issues.apache.org/jira/browse/SPARK-13769 The java doc here (https://github.com/apache/spark/blob/e97fc7f176f8bf501c9b3afd8410014e3b0e1602/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L51) needs to be updated from "The latter two operations are currently supported only for standalone cluster mode." to "The latter two operations are currently supported only for standalone and mesos cluster modes." Author: Ahmed Kamal <ahmed.kamal@badrit.com> Closes #11600 from AhmedKamal/SPARK-13769.
* [SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance ↵Dongjoon Hyun2016-03-097-20/+17
| | | | | | | | | | | | | | | | | | | | | | | | | creation in Java code. ## What changes were proposed in this pull request? In order to make `docs/examples` (and other related code) more simple/readable/user-friendly, this PR replaces existing codes like the followings by using `diamond` operator. ``` - final ArrayList<Product2<Object, Object>> dataToWrite = - new ArrayList<Product2<Object, Object>>(); + final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>(); ``` Java 7 or higher supports **diamond** operator which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code use mixed usage of this. ## How was this patch tested? Manual. Pass the existing tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11541 from dongjoon-hyun/SPARK-13702.
* [SPARK-13631][CORE] Thread-safe getLocationsWithLargestOutputsAndy Sloane2016-03-091-23/+29
| | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? If a job is being scheduled in one thread which has a dependency on an RDD currently executing a shuffle in another thread, Spark would throw a NullPointerException. This patch synchronizes access to `mapStatuses` and skips null status entries (which are in-progress shuffle tasks). ## How was this patch tested? Our client code unit test suite, which was reliably reproducing the race condition with 10 threads, shows that this fixes it. I have not found a minimal test case to add to Spark, but I will attempt to do so if desired. The same test case was tripping up on SPARK-4454, which was fixed by making other DAGScheduler code thread-safe. shivaram srowen Author: Andy Sloane <asloane@tetrationanalytics.com> Closes #11505 from a1k0n/SPARK-13631.
* [SPARK-13692][CORE][SQL] Fix trivial Coverity/Checkstyle defectsDongjoon Hyun2016-03-092-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This issue fixes the following potential bugs and Java coding style detected by Coverity and Checkstyle. - Implement both null and type checking in equals functions. - Fix wrong type casting logic in SimpleJavaBean2.equals. - Add `implement Cloneable` to `UTF8String` and `SortedIterator`. - Remove dereferencing before null check in `AbstractBytesToBytesMapSuite`. - Fix coding style: Add '{}' to single `for` statement in mllib examples. - Remove unused imports in `ColumnarBatch` and `JavaKinesisStreamSuite`. - Remove unused fields in `ChunkFetchIntegrationSuite`. - Add `stop()` to prevent resource leak. Please note that the last two checkstyle errors exist on newly added commits after [SPARK-13583](https://issues.apache.org/jira/browse/SPARK-13583). ## How was this patch tested? manual via `./dev/lint-java` and Coverity site. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11530 from dongjoon-hyun/SPARK-13692.
* [SPARK-13695] Don't cache MEMORY_AND_DISK blocks as bytes in memory after spillsJosh Rosen2016-03-083-13/+26
| | | | | | | | | | | | | | | | | When a cached block is spilled to disk and read back in serialized form (i.e. as bytes), the current BlockManager implementation will attempt to re-insert the serialized block into the MemoryStore even if the block's storage level requests deserialized caching. This behavior adds some complexity to the MemoryStore but I don't think it offers many performance benefits and I'd like to remove it in order to simplify a larger refactoring patch. Therefore, this patch changes the behavior so that disk store reads will only cache bytes in the memory store for blocks with serialized storage levels. There are two places where we request serialized bytes from the BlockStore: 1. getLocalBytes(), which is only called when reading local copies of TorrentBroadcast pieces. Broadcast pieces are always cached using a serialized storage level, so this won't lead to a mismatch in serialization forms if spilled bytes read from disk are cached as bytes in the memory store. 2. the non-shuffle-block branch in getBlockData(), which is only called by the NettyBlockRpcServer when responding to requests to read remote blocks. Caching the serialized bytes in memory will only benefit us if those cached bytes are read before they're evicted and the likelihood of that happening seems low since the frequency of remote reads of non-broadcast cached blocks seems very low. Caching these bytes when they have a low probability of being read is bad if it risks the eviction of blocks which are cached in their expected serialized/deserialized forms, since those blocks seem more likely to be read in local computation. Given the argument above, I think this change is unlikely to cause performance regressions. Author: Josh Rosen <joshrosen@databricks.com> Closes #11533 from JoshRosen/remove-memorystore-level-mismatch.
* [SPARK-13675][UI] Fix wrong historyserver url link for application running ↵jerryshao2016-03-082-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | in yarn cluster mode ## What changes were proposed in this pull request? Current URL for each application to access history UI is like: http://localhost:18080/history/application_1457058760338_0016/1/jobs/ or http://localhost:18080/history/application_1457058760338_0016/2/jobs/ Here **1** or **2** represents the number of attempts in `historypage.js`, but it will parse to attempt id in `HistoryServer`, while the correct attempt id should be like "appattempt_1457058760338_0016_000002", so it will fail to parse to a correct attempt id in HistoryServer. This is OK in yarn client mode, since we don't need this attempt id to fetch out the app cache, but it is failed in yarn cluster mode, where attempt id "1" or "2" is actually wrong. So here we should fix this url to parse the correct application id and attempt id. Also the suffix "jobs/" is not needed. Here is the screenshot: ![screen shot 2016-02-29 at 3 57 32 pm](https://cloud.githubusercontent.com/assets/850797/13524377/d4b44348-e235-11e5-8b3e-bc06de306e87.png) ## How was this patch tested? This patch is tested manually, with different master and deploy mode. ![image](https://cloud.githubusercontent.com/assets/850797/13524419/118be5a0-e236-11e5-8022-3ff613ccde46.png) Author: jerryshao <sshao@hortonworks.com> Closes #11518 from jerryshao/SPARK-13675.
* [SPARK-13117][WEB UI] WebUI should use the local ip not 0.0.0.0Devaraj K2016-03-081-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? In WebUI, now Jetty Server starts with SPARK_LOCAL_IP config value if it is configured otherwise it starts with default value as '0.0.0.0'. It is continuation as per the closed PR https://github.com/apache/spark/pull/11133 for the JIRA SPARK-13117 and discussion in SPARK-13117. ## How was this patch tested? This has been verified using the command 'netstat -tnlp | grep <PID>' to check on which IP/hostname is binding with the below steps. In the below results, mentioned PID in the command is the corresponding process id. #### Without the patch changes, Web UI(Jetty Server) is not taking the value configured for SPARK_LOCAL_IP and it is listening to all the interfaces. ###### Master ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 3930 tcp6 0 0 :::8080 :::* LISTEN 3930/java ``` ###### Worker ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 4090 tcp6 0 0 :::8081 :::* LISTEN 4090/java ``` ###### History Server Process, ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 2471 tcp6 0 0 :::18080 :::* LISTEN 2471/java ``` ###### Driver ``` [devarajstobdtserver2 spark-master]$ netstat -tnlp | grep 6556 tcp6 0 0 :::4040 :::* LISTEN 6556/java ``` #### With the patch changes ##### i. With SPARK_LOCAL_IP configured If the SPARK_LOCAL_IP is configured then all the processes Web UI(Jetty Server) is getting bind to the configured value. ###### Master ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 1561 tcp6 0 0 x.x.x.x:8080 :::* LISTEN 1561/java ``` ###### Worker ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 2229 tcp6 0 0 x.x.x.x:8081 :::* LISTEN 2229/java ``` ###### History Server ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 3747 tcp6 0 0 x.x.x.x:18080 :::* LISTEN 3747/java ``` ###### Driver ``` [devarajstobdtserver2 spark-master]$ netstat -tnlp | grep 6013 tcp6 0 0 x.x.x.x:4040 :::* LISTEN 6013/java ``` ##### ii. Without SPARK_LOCAL_IP configured If the SPARK_LOCAL_IP is not configured then all the processes Web UI(Jetty Server) will start with the '0.0.0.0' as default value. ###### Master ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 4573 tcp6 0 0 :::8080 :::* LISTEN 4573/java ``` ###### Worker ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 4703 tcp6 0 0 :::8081 :::* LISTEN 4703/java ``` ###### History Server ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 4846 tcp6 0 0 :::18080 :::* LISTEN 4846/java ``` ###### Driver ``` [devarajstobdtserver2 sbin]$ netstat -tnlp | grep 5437 tcp6 0 0 :::4040 :::* LISTEN 5437/java ``` Author: Devaraj K <devaraj@apache.org> Closes #11490 from devaraj-kavali/SPARK-13117-v1.
* [SPARK-13659] Refactor BlockStore put*() APIs to remove returnValuesJosh Rosen2016-03-076-273/+223
| | | | | | | | | | | | | | In preparation for larger refactoring, this patch removes the confusing `returnValues` option from the BlockStore put() APIs: returning the value is only useful in one place (caching) and in other situations, such as block replication, it's simpler to put() and then get(). As part of this change, I needed to refactor `BlockManager.doPut()`'s block replication code. I also changed `doPut()` to access the memory and disk stores directly rather than calling them through the BlockStore interface; this is in anticipation of a followup patch to remove the BlockStore interface so that the disk store can expose a binary-data-oriented API which is not concerned with Java objects or serialization. These changes should be covered by the existing storage unit tests. The best way to review this patch is probably to look at the individual commits, all of which are small and have useful descriptions to guide the review. /cc davies for review. Author: Josh Rosen <joshrosen@databricks.com> Closes #11502 from JoshRosen/remove-returnvalues.
* [SPARK-13711][CORE] Don't call SparkUncaughtExceptionHandler in AppClient as ↵Shixiong Zhu2016-03-071-10/+8
| | | | | | | | | | | | | | | | it's in driver ## What changes were proposed in this pull request? AppClient runs in the driver side. It should not call `Utils.tryOrExit` as it will send exception to SparkUncaughtExceptionHandler and call `System.exit`. This PR just removed `Utils.tryOrExit`. ## How was this patch tested? manual tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11566 from zsxwing/SPARK-13711.
* [SPARK-13665][SQL] Separate the concerns of HadoopFsRelationMichael Armbrust2016-03-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `HadoopFsRelation` is used for reading most files into Spark SQL. However today this class mixes the concerns of file management, schema reconciliation, scan building, bucketing, partitioning, and writing data. As a result, many data sources are forced to reimplement the same functionality and the various layers have accumulated a fair bit of inefficiency. This PR is a first cut at separating this into several components / interfaces that are each described below. Additionally, all implementations inside of Spark (parquet, csv, json, text, orc, svmlib) have been ported to the new API `FileFormat`. External libraries, such as spark-avro will also need to be ported to work with Spark 2.0. ### HadoopFsRelation A simple `case class` that acts as a container for all of the metadata required to read from a datasource. All discovery, resolution and merging logic for schemas and partitions has been removed. This an internal representation that no longer needs to be exposed to developers. ```scala case class HadoopFsRelation( sqlContext: SQLContext, location: FileCatalog, partitionSchema: StructType, dataSchema: StructType, bucketSpec: Option[BucketSpec], fileFormat: FileFormat, options: Map[String, String]) extends BaseRelation ``` ### FileFormat The primary interface that will be implemented by each different format including external libraries. Implementors are responsible for reading a given format and converting it into `InternalRow` as well as writing out an `InternalRow`. A format can optionally return a schema that is inferred from a set of files. ```scala trait FileFormat { def inferSchema( sqlContext: SQLContext, options: Map[String, String], files: Seq[FileStatus]): Option[StructType] def prepareWrite( sqlContext: SQLContext, job: Job, options: Map[String, String], dataSchema: StructType): OutputWriterFactory def buildInternalScan( sqlContext: SQLContext, dataSchema: StructType, requiredColumns: Array[String], filters: Array[Filter], bucketSet: Option[BitSet], inputFiles: Array[FileStatus], broadcastedConf: Broadcast[SerializableConfiguration], options: Map[String, String]): RDD[InternalRow] } ``` The current interface is based on what was required to get all the tests passing again, but still mixes a couple of concerns (i.e. `bucketSet` is passed down to the scan instead of being resolved by the planner). Additionally, scans are still returning `RDD`s instead of iterators for single files. In a future PR, bucketing should be removed from this interface and the scan should be isolated to a single file. ### FileCatalog This interface is used to list the files that make up a given relation, as well as handle directory based partitioning. ```scala trait FileCatalog { def paths: Seq[Path] def partitionSpec(schema: Option[StructType]): PartitionSpec def allFiles(): Seq[FileStatus] def getStatus(path: Path): Array[FileStatus] def refresh(): Unit } ``` Currently there are two implementations: - `HDFSFileCatalog` - based on code from the old `HadoopFsRelation`. Infers partitioning by recursive listing and caches this data for performance - `HiveFileCatalog` - based on the above, but it uses the partition spec from the Hive Metastore. ### ResolvedDataSource Produces a logical plan given the following description of a Data Source (which can come from DataFrameReader or a metastore): - `paths: Seq[String] = Nil` - `userSpecifiedSchema: Option[StructType] = None` - `partitionColumns: Array[String] = Array.empty` - `bucketSpec: Option[BucketSpec] = None` - `provider: String` - `options: Map[String, String]` This class is responsible for deciding which of the Data Source APIs a given provider is using (including the non-file based ones). All reconciliation of partitions, buckets, schema from metastores or inference is done here. ### DataSourceAnalysis / DataSourceStrategy Responsible for analyzing and planning reading/writing of data using any of the Data Source APIs, including: - pruning the files from partitions that will be read based on filters. - appending partition columns* - applying additional filters when a data source can not evaluate them internally. - constructing an RDD that is bucketed correctly when required* - sanity checking schema match-up and other analysis when writing. *In the future we should do that following: - Break out file handling into its own Strategy as its sufficiently complex / isolated. - Push the appending of partition columns down in to `FileFormat` to avoid an extra copy / unvectorization. - Use a custom RDD for scans instead of `SQLNewNewHadoopRDD2` Author: Michael Armbrust <michael@databricks.com> Author: Wenchen Fan <wenchen@databricks.com> Closes #11509 from marmbrus/fileDataSource.
* [SPARK-529][CORE][YARN] Add type-safe config keys to SparkConf.Marcelo Vanzin2016-03-075-1/+564
| | | | | | | | | | | | | | | | This is, in a way, the basics to enable SPARK-529 (which was closed as won't fix but I think is still valuable). In fact, Spark SQL created something for that, and this change basically factors out that code and inserts it into SparkConf, with some extra bells and whistles. To showcase the usage of this pattern, I modified the YARN backend to use the new config keys (defined in the new `config` package object under `o.a.s.deploy.yarn`). Most of the changes are mechanic, although logic had to be slightly modified in a handful of places. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10205 from vanzin/conf-opts.
* [SPARK-13459][WEB UI] Separate Alive and Dead Executors in Executor Totals TableAlex Bozarth2016-03-041-39/+45
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Now that dead executors are shown in the executors table (#10058) the totals table is updated to include the separate totals for alive and dead executors as well as the current total, as originally discussed in #10668 ## How was this patch tested? Manually verified by running the Standalone Web UI in the latest Safari and Firefox ESR Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #11381 from ajbozarth/spark13459.
* [SPARK-13398][STREAMING] Move away from thread pool task support to forkjoinHolden Karau2016-03-041-0/+18
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Remove old deprecated ThreadPoolExecutor and replace with ExecutionContext using a ForkJoinPool. The downside of this is that scala's ForkJoinPool doesn't give us a way to specify the thread pool name (and is a wrapper of Java's in 2.12) except by providing a custom factory. Note that we can't use Java's ForkJoinPool directly in Scala 2.11 since it uses a ExecutionContext which reports system parallelism. One other implicit change that happens is the old ExecutionContext would have reported a different default parallelism since it used system parallelism rather than threadpool parallelism (this was likely not intended but also likely not a huge difference). The previous version of this PR attempted to use an execution context constructed on the ThreadPool (but not the deprecated ThreadPoolExecutor class) so as to keep the ability to have human readable named threads but this reported system parallelism. ## How was this patch tested? unit tests: streaming/testOnly org.apache.spark.streaming.util.* Author: Holden Karau <holden@us.ibm.com> Closes #11423 from holdenk/SPARK-13398-move-away-from-ThreadPoolTaskSupport-java-forkjoin.
* [MINOR] Fix typos in comments and testcase name of codeDongjoon Hyun2016-03-032-3/+3
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes typos in comments and testcase name of code. ## How was this patch tested? manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.
* [SPARK-13423][HOTFIX] Static analysis fixes for 2.x / fixed for Scala 2.10, ↵Sean Owen2016-03-031-1/+1
| | | | | | | | | | | | | | | | again ## What changes were proposed in this pull request? Fixes (another) compile problem due to inadvertent use of Option.contains, only in Scala 2.11 ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11496 from srowen/SPARK-13423.3.
* [SPARK-13423][HOTFIX] Static analysis fixes for 2.x / fixed for Scala 2.10Sean Owen2016-03-032-2/+2
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Fixes compile problem due to inadvertent use of `Option.contains`, only in Scala 2.11. The change should have been to replace `Option.exists(_ == x)` with `== Some(x)`. Replacing exists with contains only makes sense for collections. Replacing use of `Option.exists` still makes sense though as it's misleading. ## How was this patch tested? Jenkins tests / compilation (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Sean Owen <sowen@cloudera.com> Closes #11493 from srowen/SPARK-13423.2.
* [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle ruleDongjoon Hyun2016-03-0352-77/+26
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time. This issue aims remove unused imports from Java/Scala code and add `UnusedImports` checkstyle rule to help developers. ## How was this patch tested? ``` ./dev/lint-java ./build/sbt compile ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11438 from dongjoon-hyun/SPARK-13583.
* [SPARK-13423][WIP][CORE][SQL][STREAMING] Static analysis fixes for 2.xSean Owen2016-03-0347-98/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Make some cross-cutting code improvements according to static analysis. These are individually up for discussion since they exist in separate commits that can be reverted. The changes are broadly: - Inner class should be static - Mismatched hashCode/equals - Overflow in compareTo - Unchecked warnings - Misuse of assert, vs junit.assert - get(a) + getOrElse(b) -> getOrElse(a,b) - Array/String .size -> .length (occasionally, -> .isEmpty / .nonEmpty) to avoid implicit conversions - Dead code - tailrec - exists(_ == ) -> contains find + nonEmpty -> exists filter + size -> count - reduce(_+_) -> sum map + flatten -> map The most controversial may be .size -> .length simply because of its size. It is intended to avoid implicits that might be expensive in some places. ## How was the this patch tested? Existing Jenkins unit tests. Author: Sean Owen <sowen@cloudera.com> Closes #11292 from srowen/SPARK-13423.
* [SPARK-13621][CORE] TestExecutor.scala needs to be moved to test packageDevaraj K2016-03-022-59/+0
| | | | | | | | Moved TestExecutor.scala from src to test package and removed the unused file TestClient.scala. Author: Devaraj K <devaraj@apache.org> Closes #11474 from devaraj-kavali/SPARK-13621.
* [SPARK-13601] call failure callbacks before writer.close()Davies Liu2016-03-024-6/+138
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? In order to tell OutputStream that the task has failed or not, we should call the failure callbacks BEFORE calling writer.close(). ## How was this patch tested? Added new unit tests. Author: Davies Liu <davies@databricks.com> Closes #11450 from davies/callback.
* [SPARK-12817] Add BlockManager.getOrElseUpdate and remove CacheManagerJosh Rosen2016-03-0215-593/+365
| | | | | | | | | | | | | | CacheManager directly calls MemoryStore.unrollSafely() and has its own logic for handling graceful fallback to disk when cached data does not fit in memory. However, this logic also exists inside of the MemoryStore itself, so this appears to be unnecessary duplication. Thanks to the addition of block-level read/write locks in #10705, we can refactor the code to remove the CacheManager and replace it with an atomic `BlockManager.getOrElseUpdate()` method. This pull request replaces / subsumes #10748. /cc andrewor14 and nongli for review. Note that this changes the locking semantics of a couple of internal BlockManager methods (`doPut()` and `lockNewBlockForWriting`), so please pay attention to the Scaladoc changes and new test cases for those methods. Author: Josh Rosen <joshrosen@databricks.com> Closes #11436 from JoshRosen/remove-cachemanager.
* [SPARK-13478][YARN] Use real user when fetching delegation tokens.Marcelo Vanzin2016-02-291-0/+5
| | | | | | | | | | | | | | | | | The Hive client library is not smart enough to notice that the current user is a proxy user; so when using a proxy user, it fails to fetch delegation tokens from the metastore because of a missing kerberos TGT for the current user. To fix it, just run the code that fetches the delegation token as the real logged in user. Tested on a kerberos cluster both submitting normally and with a proxy user; Hive and HBase tokens are retrieved correctly in both cases. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11358 from vanzin/SPARK-13478.
* [SPARK-13522][CORE] Fix the exit log place for heartbeatShixiong Zhu2016-02-291-1/+2
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Just fixed the log place introduced by #11401 ## How was this patch tested? unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11432 from zsxwing/SPARK-13522-follow-up.
* [SPARK-13522][CORE] Executor should kill itself when it's unable to ↵Shixiong Zhu2016-02-292-1/+29
| | | | | | | | | | | | | | | | | | heartbeat to driver more than N times ## What changes were proposed in this pull request? Sometimes, network disconnection event won't be triggered for other potential race conditions that we may not have thought of, then the executor will keep sending heartbeats to driver and won't exit. This PR adds a new configuration `spark.executor.heartbeat.maxFailures` to kill Executor when it's unable to heartbeat to the driver more than `spark.executor.heartbeat.maxFailures` times. ## How was this patch tested? unit tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11401 from zsxwing/SPARK-13522.
* [SPARK-13481] Desc order of appID by default for history server page.zhuol2016-02-291-1/+2
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Now by default, it shows as ascending order of appId. We might prefer to display as descending order by default, which will show the latest application at the top. ## How was this patch tested? Manual tested. See screenshot below: ![desc-sort](https://cloud.githubusercontent.com/assets/11683054/13307473/102f4cf8-db31-11e5-8dd5-391edbf32f0d.png) Author: zhuol <zhuol@yahoo-inc.com> Closes #11357 from zhuoliu/13481.
* [SPARK-12994][CORE] It is not necessary to create ExecutorAllocationM…Jeff Zhang2016-02-293-7/+21
| | | | | | | | …anager in local mode Author: Jeff Zhang <zjffdu@apache.org> Closes #10914 from zjffdu/SPARK-12994.