path: root/core/src/main
Commit message | Author | Age | Files | Lines
* [SPARK-4480] Avoid many small spills in external data structures (1.1) | Andrew Or | 2014-11-19 | 3 | -11/+34
  This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.

  Author: Andrew Or <andrew@databricks.com>

  Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:
  f2e552c [Andrew Or] Fix tests
  7012595 [Andrew Or] Avoid many small spills
* [SPARK-4380] Log more precise number of bytes spilled (1.1) | Andrew Or | 2014-11-18 | 2 | -4/+6
  This is the branch-1.1 version of #3243.

  Author: Andrew Or <andrew@databricks.com>

  Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits:
  36ec152 [Andrew Or] Log more precise representation of bytes in spilling code
* [SPARK-4433] Fix a race condition in zipWithIndex | Xiangrui Meng | 2014-11-18 | 1 | -14/+17
  Spark hangs with the following code:

  ~~~
  sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
  ~~~

  This is because ZippedWithIndexRDD triggers a job in getPartitions, which causes a deadlock in DAGScheduler.getPreferredLocs (synced). The fix is to compute `startIndices` during construction. This should be applied to branch-1.0, branch-1.1, and branch-1.2. pwendell

  Author: Xiangrui Meng <meng@databricks.com>

  Closes #3291 from mengxr/SPARK-4433 and squashes the following commits:
  c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex

  (cherry picked from commit bb46046154a438df4db30a0e1fd557bd3399ee7b)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
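  A minimal sketch of the shape of the fix, with illustrative names (this is not the actual patch, and the three-argument `runJob` form shown is an assumption about the API): the per-partition offsets are computed eagerly at construction time, so no job is ever launched from `getPartitions`.

  ```scala
  import scala.reflect.ClassTag
  import org.apache.spark.{Partition, TaskContext}
  import org.apache.spark.rdd.RDD

  // Sketch only: names and structure are illustrative, not Spark's actual code.
  class ZippedWithIndexSketchRDD[T: ClassTag](prev: RDD[T]) extends RDD[(T, Long)](prev) {

    // Computed eagerly at construction. The old code launched this job lazily
    // from getPartitions, which DAGScheduler may call while holding its own
    // lock, producing the deadlock described above.
    private val startIndices: Array[Long] = {
      val n = prev.partitions.length
      if (n == 0) {
        Array.empty
      } else {
        // Count the first n-1 partitions, then prefix-sum into start offsets.
        prev.context
          .runJob(prev, (it: Iterator[T]) => it.size.toLong, 0 until n - 1)
          .scanLeft(0L)(_ + _)
      }
    }

    override def getPartitions: Array[Partition] = prev.partitions

    override def compute(split: Partition, context: TaskContext): Iterator[(T, Long)] =
      prev.iterator(split, context).zipWithIndex.map { case (elem, i) =>
        (elem, startIndices(split.index) + i)
      }
  }
  ```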
* [SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer (for branch-1.1) | Kousuke Saruta | 2014-11-18 | 1 | -13/+39
  This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent, and a cancelled TimerTask won't necessarily be garbage-collected until it's scheduled to run. This caused huge buildups of messages that weren't garbage-collected until their timeouts expired, leading to OOMs.

  This patch addresses the problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use case.

  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

  Closes #3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits:
  786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager
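  A sketch of the leak-avoidance pattern the message describes, assuming hypothetical names: the timeout task closes over the message *id* and a WeakReference to the promise, never the message itself, so a completed or cancelled send does not pin the (potentially large) payload in memory.

  ```scala
  import java.lang.ref.WeakReference
  import java.util.concurrent.TimeUnit
  import scala.collection.concurrent.TrieMap
  import scala.concurrent.Promise
  import io.netty.util.{HashedWheelTimer, Timeout, TimerTask}

  // Illustrative sketch, not the merged code.
  object AckTimeoutSketch {
    private val timer = new HashedWheelTimer()
    private val messagesById = TrieMap.empty[Long, Array[Byte]]

    def registerMessage(id: Long, payload: Array[Byte]): Unit =
      messagesById.put(id, payload)

    def scheduleAckTimeout(id: Long, promise: Promise[Unit], timeoutSecs: Long): Timeout = {
      val promiseRef = new WeakReference(promise)
      timer.newTimeout(new TimerTask {
        override def run(timeout: Timeout): Unit = {
          messagesById.remove(id) // drop our strong reference to the payload
          Option(promiseRef.get).foreach { p =>
            p.tryFailure(new java.io.IOException(s"Ack for message $id timed out"))
          }
        }
      }, timeoutSecs, TimeUnit.SECONDS)
    }
  }
  ```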
* [SPARK-4467] Partial fix for fetch failure in sort-based shuffle (1.1) | Andrew Or | 2014-11-17 | 1 | -0/+1
  This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR.

  Author: Andrew Or <andrew@databricks.com>

  Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits:
  486fc49 [Andrew Or] Reset `elementsRead`
* Revert "[SPARK-4075] [Deploy] Jar url validation is not enough for Jar file" | Andrew Or | 2014-11-17 | 1 | -10/+1
  This reverts commit 098f83c7ccd7dad9f9228596da69fe5f55711a52.
* Update versions for 1.1.1 release | Andrew Or | 2014-11-10 | 1 | -1/+1
* [SPARK-3495][SPARK-3496] Backporting block replication fixes made in master to branch 1.1 | Tathagata Das | 2014-11-10 | 5 | -41/+123
  The original PR was #2366. This backport was non-trivial because Spark 1.1 uses ConnectionManager instead of NioBlockTransferService, which required slight modification to unit tests. Other than that, the code is exactly the same as in the original PR. Please refer to the discussion in the original PR if you have any thoughts.

  Author: Tathagata Das <tathagata.das1565@gmail.com>

  Closes #3191 from tdas/replication-fix-branch-1.1-backport and squashes the following commits:
  593214a [Tathagata Das] Merge remote-tracking branch 'apache-github/branch-1.1' into branch-1.1
  2ed927f [Tathagata Das] Fixed error in unit test.
  de4ff73 [Tathagata Das] [SPARK-3495] Block replication fails continuously when the replication target node is dead AND [SPARK-3496] Block replication by mistake chooses driver as target
* [SPARK-4169] [Core] Accommodate non-English Locales in unit tests | Niklas Wilcke | 2014-11-10 | 1 | -1/+1
  For me the core tests failed because there are two locale-dependent parts in the code. See the JIRA ticket for details.

  Why is it necessary to check the exception message in isBindCollision in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1686 ?

  Author: Niklas Wilcke <1wilcke@informatik.uni-hamburg.de>

  Closes #3036 from numbnut/core-test-fix and squashes the following commits:
  1fb0d04 [Niklas Wilcke] Fixing locale-dependent code and tests

  (cherry picked from commit ed8bf1eac548577c4bbad7ce3f7f301a2f52ef17)
  Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-4158] Fix for missing resources | Brenden Matthews | 2014-11-05 | 2 | -4/+2
  Mesos offers may not contain all resources, and Spark needs to check to ensure they are present and sufficient. Spark may throw an erroneous exception when resources aren't present.

  Author: Brenden Matthews <brenden@diddyinc.com>

  Closes #3024 from brndnmtthws/fix-mesos-resource-misuse and squashes the following commits:
  e5f9580 [Brenden Matthews] [SPARK-4158] Fix for missing resources.

  (cherry picked from commit cb0eae3b78d7f6f56c0b9521ee48564a4967d3de)
  Signed-off-by: Andrew Or <andrew@databricks.com>
* SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode | Jongyoul Lee | 2014-11-05 | 2 | -2/+2
  - change master newer

  Author: Jongyoul Lee <jongyoul@gmail.com>

  Closes #3034 from jongyoul/SPARK-3223 and squashes the following commits:
  42b2ed3 [Jongyoul Lee] SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode - change master newer

  (cherry picked from commit f7ac8c2b1de96151231617846b7468d23379c74a)
  Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-4097] Fix the race condition of 'thread' | zsxwing | 2014-10-29 | 1 | -1/+5
  There is a chance that `thread` is null when calling `thread.interrupt()`:

  ```Scala
  override def cancel(): Unit = this.synchronized {
    _cancelled = true
    if (thread != null) {
      thread.interrupt()
    }
  }
  ```

  Setting `thread = null` should happen inside a `synchronized` block to fix the race condition.

  Author: zsxwing <zsxwing@gmail.com>

  Closes #2957 from zsxwing/SPARK-4097 and squashes the following commits:
  edf0aee [zsxwing] Add comments to explain the lock
  c5cfeca [zsxwing] Fix the race condition of 'thread'

  (cherry picked from commit e7fd80413d531e23b6c4def0ee32e52a39da36fa)
  Signed-off-by: Reynold Xin <rxin@databricks.com>
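  A sketch of the fixed locking discipline, with illustrative class and field names: both the cancel path and the worker's own cleanup path synchronize on the same monitor, so `thread` can never be cleared between the null check and the interrupt.

  ```scala
  // Illustrative sketch of the pattern, not the merged code.
  class CancellableTask extends Runnable {
    @volatile private var _cancelled = false
    private var thread: Thread = _

    def cancel(): Unit = this.synchronized {
      _cancelled = true
      if (thread != null) thread.interrupt()
    }

    override def run(): Unit = {
      this.synchronized {
        if (_cancelled) return
        thread = Thread.currentThread()
      }
      try {
        // ... do the actual work here ...
      } finally {
        this.synchronized { thread = null } // clear under the same lock
      }
    }
  }
  ```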
* [SPARK-4107] Fix incorrect handling of read() and skip() return values (branch-1.1 backport) | Josh Rosen | 2014-10-28 | 4 | -26/+20
  `read()` may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors. `skip()` faces similar issues. This patch fixes several cases where we mishandle these methods' return values. This is a backport of #2969 to `branch-1.1`.

  Author: Josh Rosen <joshrosen@databricks.com>

  Closes #2974 from JoshRosen/spark-4107-branch-1.1-backport and squashes the following commits:
  d82c05b [Josh Rosen] [SPARK-4107] Fix incorrect handling of read() and skip() return values
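  The general defensive pattern, as a sketch with hypothetical helper names: both `read()` and `skip()` may make partial progress, so a caller that needs exactly n bytes must loop until done or EOF.

  ```scala
  import java.io.{EOFException, InputStream}

  // Illustrative helpers, not the merged code.
  object StreamUtilSketch {
    def readFully(in: InputStream, buf: Array[Byte]): Unit = {
      var offset = 0
      while (offset < buf.length) {
        val read = in.read(buf, offset, buf.length - offset)
        if (read == -1) throw new EOFException("Stream ended prematurely")
        offset += read
      }
    }

    def skipFully(in: InputStream, n: Long): Unit = {
      var remaining = n
      while (remaining > 0) {
        val skipped = in.skip(remaining)
        // skip() returning 0 is treated as failure here for simplicity.
        if (skipped <= 0) throw new EOFException(s"Failed to skip $remaining bytes")
        remaining -= skipped
      }
    }
  }
  ```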
* [SPARK-4080] Only throw IOException from [write|read][Object|External] | Josh Rosen | 2014-10-24 | 23 | -34/+63
  If classes implementing the Serializable or Externalizable interfaces throw exceptions other than IOException or ClassNotFoundException from their (de)serialization methods, this results in an unhelpful "IOException: unexpected exception type" rather than the actual exception that produced the (de)serialization error. This patch fixes this by adding a utility method that re-wraps any uncaught exceptions in IOException (unless they are already instances of IOException).

  Author: Josh Rosen <joshrosen@databricks.com>

  Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits:
  cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write|read][Object|External].

  (cherry picked from commit 6c98c29ae0033556fd4424f41d1de005c509e511)
  Signed-off-by: Josh Rosen <joshrosen@databricks.com>

  Conflicts:
    core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
    core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
    streaming/src/main/scala/org/apache/spark/streaming/api/python/PythonDStream.scala
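  A sketch of such a re-wrapping utility; the name and the exact set of caught exceptions are assumptions, not the merged code.

  ```scala
  import java.io.IOException
  import scala.util.control.NonFatal

  object SerializationSketch {
    // Re-wrap anything that isn't already an IOException, so (de)serialization
    // methods never surface as an opaque "unexpected exception type".
    def tryOrIOException(block: => Unit): Unit = {
      try {
        block
      } catch {
        case e: IOException => throw e
        case NonFatal(e) => throw new IOException(e)
      }
    }
  }

  // Typical use inside a Serializable class:
  //   private def writeObject(out: java.io.ObjectOutputStream): Unit =
  //     SerializationSketch.tryOrIOException { out.defaultWriteObject() }
  ```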
* [SPARK-4006] In long-running contexts, we encountered the situation of double register without a remove in between | Tal Sliwowicz | 2014-10-24 | 1 | -12/+13
  The cause for that is unknown, and assumed to be a temporary network issue. However, since the second register is with a BlockManagerId on a different port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor returns Some. This inconsistency is caught in a conditional statement that does System.exit(1), which is a huge robustness issue for us.

  The fix: simply remove the old id from both maps during register when this happens. We are mimicking the behavior of expireDeadHosts() by doing local cleanup of the maps before trying to add new ones. Also added some logging for register and unregister.

  This is just like https://github.com/apache/spark/pull/2886 except it's on branch-1.1.

  Author: Tal Sliwowicz <tal.s@taboola.com>

  Closes #2915 from tsliwowicz/branch-1.1-block-mgr-removal and squashes the following commits:
  d122236 [Tal Sliwowicz] [SPARK-4006] In long running contexts, we encountered the situation of double registe...
* [SPARK-4075] [Deploy] Jar url validation is not enough for Jar file | Kousuke Saruta | 2014-10-24 | 1 | -1/+10
  In deploy.ClientArguments.isValidJarUrl, the url is checked as follows:

    def isValidJarUrl(s: String): Boolean = s.matches("(.+):(.+)jar")

  So it allows strings like 'hdfs:file.jar' (no authority).

  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

  Closes #2925 from sarutak/uri-syntax-check-improvement and squashes the following commits:
  cf06173 [Kousuke Saruta] Improved URI syntax checking

  (cherry picked from commit 098f83c7ccd7dad9f9228596da69fe5f55711a52)
  Signed-off-by: Andrew Or <andrew@databricks.com>
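  A sketch of a stricter, URI-based check along the lines the message describes (illustrative only, not the merged code): an opaque URI such as 'hdfs:file.jar' has a null path, so it is rejected, while 'hdfs://host/path/file.jar' passes.

  ```scala
  import java.net.{URI, URISyntaxException}

  object JarUrlSketch {
    def isValidJarUrl(s: String): Boolean = {
      try {
        val uri = new URI(s)
        // Require a scheme and a hierarchical path ending in ".jar".
        uri.getScheme != null && uri.getPath != null && uri.getPath.endsWith(".jar")
      } catch {
        case _: URISyntaxException => false
      }
    }
  }
  ```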
* [SPARK-3426] Fix sort-based shuffle error when spark.shuffle.compress and spark.shuffle.spill.compress settings are different | Josh Rosen | 2014-10-22 | 5 | -11/+37
  This PR fixes SPARK-3426, an issue where sort-based shuffle crashes if the `spark.shuffle.spill.compress` and `spark.shuffle.compress` settings have different values.

  The problem is that sort-based shuffle's read and write paths use different settings for determining whether to apply compression. ExternalSorter writes runs to files using `TempBlockId` ids, which causes `spark.shuffle.spill.compress` to be used for enabling compression, but these spilled files end up being shuffled over the network and read as shuffle files using `ShuffleBlockId` by BlockStoreShuffleFetcher, which causes `spark.shuffle.compress` to be used for enabling decompression. As a result, this leads to errors when these settings disagree.

  Based on the discussions in #2247 and #2178, it sounds like we don't want to remove the `spark.shuffle.spill.compress` setting. Therefore, I've tried to come up with a fix where `spark.shuffle.spill.compress` is used to compress data that's read and written locally and `spark.shuffle.compress` is used to compress any data that will be fetched / read as shuffle blocks. To do this, I split `TempBlockId` into two new id types, `TempLocalBlockId` and `TempShuffleBlockId`, which map to `spark.shuffle.spill.compress` and `spark.shuffle.compress`, respectively.

  ExternalAppendOnlyMap also used temp blocks for spilling data. It looks like ExternalSorter was designed to be a generic sorter, but its configuration already happens to be tied to sort-based shuffle, so I think it's fine if we use `spark.shuffle.compress` to compress its spills; we can move the compression configuration to the constructor in a later commit if we find that ExternalSorter is being used in other contexts where we want different configuration options to control compression.

  To summarize:

  **Before:**

  |       | ExternalAppendOnlyMap        | ExternalSorter               |
  |-------|------------------------------|------------------------------|
  | Read  | spark.shuffle.spill.compress | spark.shuffle.compress       |
  | Write | spark.shuffle.spill.compress | spark.shuffle.spill.compress |

  **After:**

  |       | ExternalAppendOnlyMap        | ExternalSorter         |
  |-------|------------------------------|------------------------|
  | Read  | spark.shuffle.spill.compress | spark.shuffle.compress |
  | Write | spark.shuffle.spill.compress | spark.shuffle.compress |

  Thanks to andrewor14 for debugging this with me!

  Author: Josh Rosen <joshrosen@databricks.com>

  Closes #2890 from JoshRosen/SPARK-3426 and squashes the following commits:
  1921cf6 [Josh Rosen] Minor edit for clarity.
  c8dd8f2 [Josh Rosen] Add comment explaining use of createTempShuffleBlock().
  2c687b9 [Josh Rosen] Fix SPARK-3426.
  91e7e40 [Josh Rosen] Combine tests into single test of all combinations
  76ca65e [Josh Rosen] Add regression test for SPARK-3426.

  Conflicts:
    core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala
* [SPARK-4010][Web UI] Spark UI returns 500 in yarn-client mode | GuoQiang Li | 2014-10-20 | 2 | -5/+5
  The problem was caused by #1966. CC YanTangZhai andrewor14

  Author: GuoQiang Li <witgo@qq.com>

  Closes #2858 from witgo/SPARK-4010 and squashes the following commits:
  9866fbf [GuoQiang Li] Spark UI returns 500 in yarn-client mode

  (cherry picked from commit 51afde9d8b8a67958c4632a13af143d7c7fd1f04)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3948][Shuffle] Fix stream corruption bug in sort-based shuffle | jerryshao | 2014-10-20 | 2 | -5/+27
  A kernel 2.6.32 bug leads to unexpected behavior of transferTo in copyStream, which can corrupt the shuffle output file in sort-based shuffle and in turn cause PARSING_ERROR(2), deserialization errors, or offset-out-of-range errors. This is fixed here by adding an append flag and some position-checking code. Details can be seen in [SPARK-3948](https://issues.apache.org/jira/browse/SPARK-3948).

  Author: jerryshao <saisai.shao@intel.com>

  Closes #2824 from jerryshao/SPARK-3948 and squashes the following commits:
  be0533a [jerryshao] Address the comments
  a82b184 [jerryshao] add configuration to control the NIO way of copying stream
  e17ada2 [jerryshao] Fix kernel 2.6.32 bug led unexpected behavior of transferTo

  (cherry picked from commit c7aeecd08fd329085760fa89025ec0d9c04f5e3f)
  Signed-off-by: Josh Rosen <joshrosen@databricks.com>

  Conflicts:
    core/src/main/scala/org/apache/spark/util/Utils.scala
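  A sketch of the position-checking guard (illustrative names; assumes the chunk is copied from the start of the input channel): after the transfer loop, verify that the output channel's position advanced by exactly the number of bytes requested, so a misbehaving kernel cannot silently corrupt the merged output.

  ```scala
  import java.io.{FileInputStream, FileOutputStream}

  object CopyStreamSketch {
    def copyFileChunk(in: FileInputStream, out: FileOutputStream, bytesToCopy: Long): Unit = {
      val inChannel = in.getChannel
      val outChannel = out.getChannel
      val initialPos = outChannel.position()
      var count = 0L
      // transferTo does not move the source channel's position, so we pass
      // the running count as the absolute read offset.
      while (count < bytesToCopy) {
        count += inChannel.transferTo(count, bytesToCopy - count, outChannel)
      }
      val finalPos = outChannel.position()
      assert(finalPos == initialPos + bytesToCopy,
        s"Position mismatch after transferTo: expected ${initialPos + bytesToCopy}, got $finalPos")
    }
  }
  ```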
* [SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport) | Josh Rosen | 2014-10-19 | 1 | -15/+38
  This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`. The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops. The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks. Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket).

  This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues.

  Author: Josh Rosen <joshrosen@apache.org>

  Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits:
  f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf.
  b562451 [Josh Rosen] Remove unused jobConfCacheKey field.
  dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task.
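  A sketch of the guarded cloning (names are illustrative; `cloneConf` stands in for the `spark.hadoop.cloneConf` option, which is off by default):

  ```scala
  import org.apache.hadoop.mapred.JobConf

  object JobConfSketch {
    private val CONF_LOCK = new Object

    def jobConfForTask(shared: JobConf, cloneConf: Boolean): JobConf = {
      if (cloneConf) {
        // Reading a Configuration while constructing a copy of it is itself
        // not thread-safe, so cloning is serialized through a single lock.
        CONF_LOCK.synchronized { new JobConf(shared) }
      } else {
        shared // one copy shared by all tasks: fast, but exposed to the races above
      }
    }
  }
  ```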
* SPARK-3926 [CORE] Result of JavaRDD.collectAsMap() is not Serializable | Sean Owen | 2014-10-18 | 3 | -8/+21
  Make the JavaPairRDD.collectAsMap result Serializable, since Java Maps generally are.

  Author: Sean Owen <sowen@cloudera.com>

  Closes #2805 from srowen/SPARK-3926 and squashes the following commits:
  ecb78ee [Sean Owen] Fix conflict between java.io.Serializable and use of Scala's Serializable
  f4717f9 [Sean Owen] Oops, fix compile problem
  ae1b36f [Sean Owen] Expand to cover Maps returned from other Java API methods as well
  51c26c2 [Sean Owen] Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are
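  A copy-based sketch that conveys the requirement; note this is an assumption about the shape of the problem, and the actual patch wraps the Scala map in a Serializable wrapper rather than copying. Scala's default Java-map view of a Scala map is not Serializable, while `java.util.HashMap` is.

  ```scala
  import java.{util => ju}

  object SerializableMapSketch {
    def asSerializableMap[K, V](pairs: Seq[(K, V)]): ju.Map[K, V] = {
      val m = new ju.HashMap[K, V](pairs.size)
      pairs.foreach { case (k, v) => m.put(k, v) } // java.util.HashMap is Serializable
      m
    }
  }
  ```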
* [SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA (1.1 version) | Marcelo Vanzin | 2014-10-17 | 3 | -12/+17
  This is a backport of SPARK-3606 to branch-1.1. Some of the code had to be duplicated since branch-1.1 doesn't have the cleanup work that was done to the Yarn codebase. I don't know whether the version issue in yarn/alpha/pom.xml was intentional, but I couldn't compile the code without fixing it.

  Author: Marcelo Vanzin <vanzin@cloudera.com>

  Closes #2497 from vanzin/SPARK-3606-1.1 and squashes the following commits:
  4fd3c27 [Marcelo Vanzin] Remove unused imports.
  75cde8c [Marcelo Vanzin] Scala is weird.
  b27ebda [Marcelo Vanzin] Review feedback.
  72ceafb [Marcelo Vanzin] Undelete needed import.
  61162a6 [Marcelo Vanzin] Use separate config for each param instead of json.
  3b7205f [Marcelo Vanzin] Review feedback.
  b3b3e50 [Marcelo Vanzin] [SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA (1.1 version).
* [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes | yantangzhai | 2014-10-16 | 1 | -1/+4
  JobProgressPage sometimes could not show the Fair Scheduler Pools section. SparkContext starts the web UI and then calls postEnvironmentUpdate. If JobProgressPage is accessed between the web UI starting and postEnvironmentUpdate, the lazy val isFairScheduler will be false, and the Fair Scheduler Pools section will never display.

  Author: yantangzhai <tyz0303@163.com>
  Author: YanTangZhai <hakeemzhai@tencent.com>

  Closes #1966 from YanTangZhai/SPARK-3067 and squashes the following commits:
  d4323f8 [yantangzhai] update [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
  8a00106 [YanTangZhai] Merge pull request #6 from apache/master
  b6391cc [yantangzhai] revert [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
  d2226cd [yantangzhai] [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
  cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
  aac7f7b [yantangzhai] [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
  cdef539 [YanTangZhai] Merge pull request #1 from apache/master

  (cherry picked from commit dedace83f35cba0f833d962acbd75572318948c4)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3905][Web UI] The keys for sorting the columns of the Executor page, Stage page, and Storage page are incorrect | GuoQiang Li | 2014-10-12 | 3 | -12/+12
  Author: GuoQiang Li <witgo@qq.com>

  Closes #2763 from witgo/SPARK-3905 and squashes the following commits:
  17d7990 [GuoQiang Li] The keys for sorting the columns of Executor page ,Stage page Storage page are incorrect

  (cherry picked from commit b4a7fa7a663c462bf537ca9d63af0dba6b4a8033)
  Signed-off-by: Josh Rosen <joshrosen@apache.org>
* [SPARK-3121] Wrong implementation of implicit bytesWritableConverter | Jakub Dubovský | 2014-10-12 | 1 | -1/+5
  val path = ... // path to seq file with BytesWritable as type of both key and value
  val file = sc.sequenceFile[Array[Byte],Array[Byte]](path)
  file.take(1)(0)._1

  This prints incorrect content of the byte array. The actual content starts out correct, but some "random" bytes and zeros are appended. BytesWritable has two methods:

  getBytes() - returns the content of the whole internal array, which is often longer than the actual value stored. It usually contains the rest of previous longer values.
  copyBytes() - returns just the beginning of the internal array, determined by the internal length property.

  It looks like the implicit conversion between BytesWritable and Array[Byte] uses getBytes instead of the correct copyBytes. dbtsai

  Author: Jakub Dubovský <james64@inMail.sk>
  Author: Dubovsky Jakub <dubovsky@avast.com>

  Closes #2712 from james64/3121-bugfix and squashes the following commits:
  f85d24c [Jakub Dubovský] Test name changed, comments added
  1b20d51 [Jakub Dubovský] Import placed correctly
  406e26c [Jakub Dubovský] Scala style fixed
  f92ffa6 [Dubovsky Jakub] performance tuning
  480f9cd [Dubovsky Jakub] Bug 3121 fixed

  (cherry picked from commit fc616d51a510f82627b5be949a5941419834cf70)
  Signed-off-by: Josh Rosen <joshrosen@apache.org>
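  A sketch of the length-respecting conversion (illustrative helper, not the merged code): copy only the first `getLength()` bytes rather than taking the whole reused, over-allocated backing array from `getBytes()`.

  ```scala
  import org.apache.hadoop.io.BytesWritable

  object BytesWritableSketch {
    def toByteArray(bw: BytesWritable): Array[Byte] = {
      val result = new Array[Byte](bw.getLength)
      System.arraycopy(bw.getBytes, 0, result, 0, bw.getLength)
      result
    }
  }
  ```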
* [SPARK-3844][UI] Truncate appName in WebUI if it is too long | Xiangrui Meng | 2014-10-09 | 1 | -1/+5
  Truncate appName in WebUI if it is too long.

  Author: Xiangrui Meng <meng@databricks.com>

  Closes #2707 from mengxr/truncate-app-name and squashes the following commits:
  87834ce [Xiangrui Meng] move scala import below java
  c7111dc [Xiangrui Meng] truncate appName in WebUI if it is too long

  (cherry picked from commit 86b392942daf61fed2ff7490178b128107a0e856)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3829] Make Spark logo image on the header of HistoryPage as a link to HistoryPage's page #1 | Kousuke Saruta | 2014-10-07 | 1 | -2/+4
  There is a Spark logo on the header of HistoryPage. We can have too many HistoryPages if we run 20+ applications, so it's useful if the logo is a link to HistoryPage's page number 1.

  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

  Closes #2690 from sarutak/SPARK-3829 and squashes the following commits:
  908c109 [Kousuke Saruta] Removed extra space.
  00bfbd7 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3829
  dd87480 [Kousuke Saruta] Made header Spark log image as a link to History Server's top page.

  (cherry picked from commit b69c9fb6fb048509bbd8430fb697dc3a5ca4fe59)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3777] Display "Executor ID" for Tasks in Stage page | zsxwing | 2014-10-07 | 1 | -2/+2
  Now the Stage page only displays "Executor" (host) for tasks. However, there may be more than one executor running on the same host. Currently, when some task hangs, I only know the host of the faulty executor and have to check all executors on that host. Adding "Executor ID" to the Tasks table would help locate the faulty executor. Here is the new page:

  ![add_executor_id_for_tasks](https://cloud.githubusercontent.com/assets/1000778/4505774/acb9648c-4afa-11e4-8826-8768a0a60cc9.png)

  Author: zsxwing <zsxwing@gmail.com>

  Closes #2642 from zsxwing/SPARK-3777 and squashes the following commits:
  37945af [zsxwing] Put Executor ID and Host into one cell
  4bbe2c7 [zsxwing] [SPARK-3777] Display "Executor ID" for Tasks in Stage page

  (cherry picked from commit 446063eca98ae56d1ac61415f4c6e89699b8db02)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3731] [PySpark] fix memory leak in PythonRDD | Davies Liu | 2014-10-07 | 1 | -0/+4
  The parent.getOrCompute() of PythonRDD is executed in a separate thread; it should finally release the memory reserved for shuffle and unrolling.

  Author: Davies Liu <davies.liu@gmail.com>

  Closes #2668 from davies/leak and squashes the following commits:
  ae98be2 [Davies Liu] fix memory leak in PythonRDD

  (cherry picked from commit bc87cc410fae59660c13b6ae1c14204df77237b8)
  Signed-off-by: Josh Rosen <joshrosen@apache.org>

  Conflicts:
    core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
* [SPARK-3825] Log more detail when unrolling a block fails | Andrew Or | 2014-10-07 | 2 | -8/+39
  Before:

  ```
  14/10/06 16:45:42 WARN CacheManager: Not enough space to cache partition rdd_0_2 in memory! Free memory is 481861527 bytes.
  ```

  After:

  ```
  14/10/07 11:08:24 WARN MemoryStore: Not enough space to cache rdd_2_0 in memory! (computed 68.8 MB so far)
  14/10/07 11:08:24 INFO MemoryStore: Memory use = 1088.0 B (blocks) + 445.1 MB (scratch space shared across 8 thread(s)) = 445.1 MB. Storage limit = 459.5 MB.
  ```

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #2688 from andrewor14/cache-log-message and squashes the following commits:
  28e33d6 [Andrew Or] Shy away from "unrolling"
  5638c49 [Andrew Or] Grammar
  39a0c28 [Andrew Or] Log more detail when unrolling a block fails

  (cherry picked from commit 553737c6e6d5ffa3b52a9888444f4beece5c5b1a)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3827] Very long RDD names are not rendered properly in web UI | Hossein | 2014-10-07 | 1 | -0/+5
  With Spark SQL we generate very long RDD names. These names are not properly rendered in the web UI. This PR fixes the rendering issue.

  [SPARK-3827] #comment Linking PR with JIRA

  Author: Hossein <hossein@databricks.com>

  Closes #2687 from falaki/sparkTableUI and squashes the following commits:
  fd06409 [Hossein] Limit width of cell when RDD name is too long

  (cherry picked from commit d65fd554b4de1dbd8db3090b0e50994010d30e78)
  Signed-off-by: Josh Rosen <joshrosen@apache.org>
* SPARK-1656: Fix potential resource leaks | zsxwing | 2014-10-05 | 3 | -15/+40
  JIRA: https://issues.apache.org/jira/browse/SPARK-1656

  Author: zsxwing <zsxwing@gmail.com>

  Closes #577 from zsxwing/SPARK-1656 and squashes the following commits:
  c431095 [zsxwing] Add a comment and fix the code style
  2de96e5 [zsxwing] Make sure file will be deleted if exception happens
  28b90dc [zsxwing] Update to follow the code style
  4521d6e [zsxwing] Merge branch 'master' into SPARK-1656
  afc3383 [zsxwing] Update to follow the code style
  071fdd1 [zsxwing] SPARK-1656: Fix potential resource leaks

  (cherry picked from commit a7c73130f1b6b0b8b19a7b0a0de5c713b673cd7b)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
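  A sketch of the general close-and-clean-up pattern behind fixes like this (hypothetical helper): close the stream in `finally`, and delete the partially written file if anything went wrong.

  ```scala
  import java.io.{File, FileOutputStream}

  object ResourceSketch {
    def writeToFileSafely(file: File)(write: FileOutputStream => Unit): Unit = {
      val out = new FileOutputStream(file)
      var threwException = true
      try {
        write(out)
        threwException = false
      } finally {
        try {
          out.close()
        } finally {
          // Don't leave a half-written file behind on failure.
          if (threwException && file.exists()) file.delete()
        }
      }
    }
  }
  ```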
* [SPARK-3597][Mesos] Implement `killTask` | Brenden Matthews | 2014-10-05 | 1 | -0/+7
  The MesosSchedulerBackend did not previously implement `killTask`, resulting in an exception.

  Author: Brenden Matthews <brenden@diddyinc.com>

  Closes #2453 from brndnmtthws/implement-killtask and squashes the following commits:
  23ddcdc [Brenden Matthews] [SPARK-3597][Mesos] Implement `killTask`.

  (cherry picked from commit 32fad4233f353814496c84e15ba64326730b7ae7)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3535][Mesos] Fix resource handling | Brenden Matthews | 2014-10-03 | 3 | -8/+68
  Author: Brenden Matthews <brenden@diddyinc.com>

  Closes #2401 from brndnmtthws/master and squashes the following commits:
  4abaa5d [Brenden Matthews] [SPARK-3535][Mesos] Fix resource handling.

  (cherry picked from commit a8c52d5343e19731909e73db5de151a324d31cd5)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR | EugenCepoi | 2014-10-03 | 1 | -25/+17
  Update of PR #997. With this PR, setting SPARK_CONF_DIR overrides SPARK_HOME/conf (not only spark-defaults.conf and spark-env).

  Author: EugenCepoi <cepoi.eugen@gmail.com>

  Closes #2481 from EugenCepoi/SPARK-2058 and squashes the following commits:
  0bb32c2 [EugenCepoi] use orElse orNull and fixing trailing percent in compute-classpath.cmd
  77f35d7 [EugenCepoi] SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR

  (cherry picked from commit f0811f928e5b608e1a2cba3b6828ba0ed03b701d)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [DEPLOY] SPARK-3759: Return the exit code of the driver process | Eric Eijkelenboom | 2014-10-02 | 1 | -1/+2
  SparkSubmitDriverBootstrapper.scala now returns the exit code of the driver process instead of always returning 0.

  Author: Eric Eijkelenboom <ee@userreport.com>

  Closes #2628 from ericeijkelenboom/master and squashes the following commits:
  cc4a571 [Eric Eijkelenboom] Return the exit code of the driver process

  (cherry picked from commit 42d5077fd3f2c37d1cd23f4c81aa89286a74cb40)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3755][Core] Avoid trying privileged port when requesting a non-privileged port | scwf | 2014-10-02 | 1 | -1/+6
  pwendell, `tryPort` is not compatible with old code in the last PR; this fixes it. After discussing with srowen, the title was renamed to "avoid trying privileged port when request a non-privileged port". Please refer to the discussion for details.

  Author: scwf <wangfei1@huawei.com>

  Closes #2623 from scwf/1-1024 and squashes the following commits:
  10a4437 [scwf] add comment
  de3fd17 [scwf] do not try privileged port when request a non-privileged port
  42cb0fa [scwf] make tryPort compatible with old code
  cb8cc76 [scwf] do not use port 1 - 1024

  (cherry picked from commit 8081ce8bd111923db143abc55bb6ef9793eece35)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>

  Conflicts:
    core/src/main/scala/org/apache/spark/util/Utils.scala
* [SPARK-3756] [Core] Check whether an exception is caused by an address-port collision properly | scwf | 2014-10-01 | 1 | -0/+2
  The Jetty server uses MultiException to handle exceptions that occur when starting the server; see https://github.com/eclipse/jetty.project/blob/jetty-8.1.14.v20131031/jetty-server/src/main/java/org/eclipse/jetty/server/Server.java. So `isBindCollision` now adds the logic to cover MultiException.

  Author: scwf <wangfei1@huawei.com>

  Closes #2611 from scwf/fix-isBindCollision and squashes the following commits:
  984cb12 [scwf] optimize the fix
  3a6c849 [scwf] fix bug in isBindCollision

  (cherry picked from commit 2fedb5dddcc10d3186f49fc4996a7bb5b68bbc85)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>

  Conflicts:
    core/src/main/scala/org/apache/spark/util/Utils.scala
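  A sketch of a MultiException-aware check, following the description above (illustrative; the merged code may differ): recurse into Jetty's nested throwables as well as ordinary causes.

  ```scala
  import java.net.BindException
  import org.eclipse.jetty.util.MultiException
  import scala.collection.JavaConverters._

  object BindCollisionSketch {
    def isBindCollision(exception: Throwable): Boolean = exception match {
      case e: BindException =>
        e.getMessage != null && e.getMessage.contains("Address already in use")
      case e: MultiException =>
        // Jetty aggregates startup failures; check each nested throwable.
        e.getThrowables.asScala.exists(isBindCollision)
      case e: Exception if e.getCause != null =>
        isBindCollision(e.getCause)
      case _ => false
    }
  }
  ```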
* [SPARK-3755][Core] Do not bind port 1 - 1024 to server in spark | scwf | 2014-10-01 | 1 | -1/+1
  A non-root user using ports 1-1024 to start a Jetty server will get the exception "java.net.SocketException: Permission denied", so do not use these ports.

  Author: scwf <wangfei1@huawei.com>

  Closes #2610 from scwf/1-1024 and squashes the following commits:
  cb8cc76 [scwf] do not use port 1 - 1024

  (cherry picked from commit 6390aae4eacbabfb1c53fb828b824f6a6518beff)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3747] TaskResultGetter could incorrectly abort a stage if it cannot get the result for a specific task | Reynold Xin | 2014-10-01 | 1 | -2/+5
  Author: Reynold Xin <rxin@apache.org>

  Closes #2599 from rxin/SPARK-3747 and squashes the following commits:
  a74c04d [Reynold Xin] Added a line of comment explaining NonFatal
  0e8d44c [Reynold Xin] [SPARK-3747] TaskResultGetter could incorrectly abort a stage if it cannot get result for a specific task

  (cherry picked from commit eb43043f411b87b7b412ee31e858246bd93fdd04)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-3709] Executors don't always report broadcast block removal properly back to the driver (for branch-1.1) | Reynold Xin | 2014-09-30 | 1 | -2/+2
  Author: Reynold Xin <rxin@apache.org>

  Closes #2591 from rxin/SPARK-3709-1.1 and squashes the following commits:
  ab99cc0 [Reynold Xin] [SPARK-3709] Executors don't always report broadcast block removal properly back to the driver
* [SPARK-3734] DriverRunner should not read SPARK_HOME from submitter's environment | Josh Rosen | 2014-09-29 | 1 | -4/+1
  When using spark-submit in `cluster` mode to submit a job to a Spark Standalone cluster, if the JAVA_HOME environment variable was set on the submitting machine, then DriverRunner would attempt to use the submitter's JAVA_HOME to launch the driver process (instead of the worker's JAVA_HOME), causing the driver to fail unless the submitter and worker had the same Java location. This commit fixes this by reading JAVA_HOME from sys.env instead of command.environment.

  Author: Josh Rosen <joshrosen@apache.org>

  Closes #2586 from JoshRosen/SPARK-3734 and squashes the following commits:
  e9513d9 [Josh Rosen] [SPARK-3734] DriverRunner should not read SPARK_HOME from submitter's environment.

  (cherry picked from commit b167a8c7e75d9e816784bd655bce1feb6c447210)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3032][Shuffle] Fix key comparison integer overflow that introduced a sorting exception | jerryshao | 2014-09-29 | 1 | -1/+1
  The previous key comparison in `ExternalSorter` could produce a wrong sorting result or an exception when the key comparison overflows; details can be seen in [SPARK-3032](https://issues.apache.org/jira/browse/SPARK-3032). This fixes that and adds a unit test to prove it.

  Author: jerryshao <saisai.shao@intel.com>

  Closes #2514 from jerryshao/SPARK-3032 and squashes the following commits:
  6f3c302 [jerryshao] Improve the unit test according to comments
  01911e6 [jerryshao] Change the test to show the contract violate exception
  83acb38 [jerryshao] Minor changes according to comments
  fa2a08f [jerryshao] Fix key comparison integer overflow introduced sorting exception

  (cherry picked from commit dab1b0ae29a6d3017bdca23464f22a51d51eaae1)
  Signed-off-by: Matei Zaharia <matei@databricks.com>
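  A sketch of the contract violation and its fix (illustrative comparators, not the merged code): subtracting two hash codes can overflow Int, flipping the sign and violating the Comparator contract, so the comparison must be explicit.

  ```scala
  // Broken: Int.MinValue-style overflow can make (a < b) and (b < a) both "true".
  val brokenComparator = new java.util.Comparator[String] {
    def compare(a: String, b: String): Int = a.hashCode - b.hashCode // overflows
  }

  // Fixed: compare without arithmetic on the raw hash codes.
  val fixedComparator = new java.util.Comparator[String] {
    def compare(a: String, b: String): Int = {
      val h1 = a.hashCode
      val h2 = b.hashCode
      if (h1 < h2) -1 else if (h1 == h2) 0 else 1
    }
  }
  ```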
* [CORE] Bugfix: LogErr format in DAGScheduler.scala | Zhang, Liye | 2014-09-29 | 1 | -1/+1
  Author: Zhang, Liye <liye.zhang@intel.com>

  Closes #2572 from liyezhang556520/DAGLogErr and squashes the following commits:
  5be2491 [Zhang, Liye] Bugfix: LogErr format in DAGScheduler.scala

  (cherry picked from commit 657bdff41a27568a981b3e342ad380fe92aa08a0)
  Signed-off-by: Reynold Xin <rxin@apache.org>
* [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI | Mubarak Seyed | 2014-09-23 | 3 | -20/+46
  This is a refactored version of the original PR https://github.com/apache/spark/pull/1723 by mubarak. Please take a look, andrewor14, mubarak.

  Author: Mubarak Seyed <mubarak.seyed@gmail.com>
  Author: Tathagata Das <tathagata.das1565@gmail.com>

  Closes #2464 from tdas/streaming-callsite and squashes the following commits:
  dc54c71 [Tathagata Das] Made changes based on PR comments.
  390b45d [Tathagata Das] Fixed minor bugs.
  904cd92 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-callsite
  7baa427 [Tathagata Das] Refactored getCallSite and setCallSite to make it simpler. Also added unit test for DStream creation site.
  b9ed945 [Mubarak Seyed] Adding streaming utils
  c461cf4 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
  ceb43da [Mubarak Seyed] Changing default regex function name
  8c5d443 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
  196121b [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
  491a1eb [Mubarak Seyed] Removing streaming visibility from getRDDCreationCallSite in DStream
  33a7295 [Mubarak Seyed] Fixing review comments: Merging both setCallSite methods
  c26d933 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
  f51fd9f [Mubarak Seyed] Fixing scalastyle, Regex for Utils.getCallSite, and changing method names in DStream
  5051c58 [Mubarak Seyed] Getting return value of compute() into variable and call setCallSite(prevCallSite) only once. Adding return for other code paths (for None)
  a207eb7 [Mubarak Seyed] Fixing code review comments
  ccde038 [Mubarak Seyed] Removing Utils import from MappedDStream
  2a09ad6 [Mubarak Seyed] Changes in Utils.scala for SPARK-1853
  1d90cc3 [Mubarak Seyed] Changes for SPARK-1853
  5f3105a [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
  70f494f [Mubarak Seyed] Changes for SPARK-1853
  1500deb [Mubarak Seyed] Changes in Spark Streaming UI
  9d38d3c [Mubarak Seyed] [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
  d466d75 [Mubarak Seyed] Changes for spark streaming UI

  (cherry picked from commit 729952a5efce755387c76cdf29280ee6f49fdb72)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3653] Respect SPARK_*_MEMORY for cluster mode | Andrew Or | 2014-09-23 | 1 | -0/+4
  `SPARK_DRIVER_MEMORY` was only used to start the `SparkSubmit` JVM, which becomes the driver only in client mode but not cluster mode. In cluster mode, this property is simply not propagated to the worker nodes. `SPARK_EXECUTOR_MEMORY` is picked up from `SparkContext`, but in cluster mode the driver runs on one of the worker machines, where this environment variable may not be set.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #2500 from andrewor14/memory-env-vars and squashes the following commits:
  6217b38 [Andrew Or] Respect SPARK_*_MEMORY for cluster mode

  Conflicts:
    core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
* SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach the driver | Sandy Ryza | 2014-09-23 | 1 | -5/+11
  Author: Sandy Ryza <sandy@cloudera.com>

  Closes #2487 from sryza/sandy-spark-3612 and squashes the following commits:
  2b7353d [Sandy Ryza] SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach the driver

  (cherry picked from commit d79238d03a2ffe0cf5fc6166543d67768693ddbe)
  Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* Revert "[SPARK-3595] Respect configured OutputCommitters when calling saveAsHadoopFile" | Patrick Wendell | 2014-09-21 | 2 | -7/+2
  This reverts commit 7a766577a466377bf504fa2d8c3ca454844a6ea6.

  [NOTE: After some thought I decided not to merge this into 1.1 quite yet]
* [SPARK-3595] Respect configured OutputCommitters when calling saveAsHadoopFile | Ian Hummel | 2014-09-21 | 2 | -2/+7
  Addresses the issue in https://issues.apache.org/jira/browse/SPARK-3595, namely saveAsHadoopFile hardcoding the OutputCommitter. This is not ideal when running Spark jobs that write to S3, especially when running them from an EMR cluster where the default OutputCommitter is a DirectOutputCommitter.

  Author: Ian Hummel <ian@themodernlife.net>

  Closes #2450 from themodernlife/spark-3595 and squashes the following commits:
  f37a0e5 [Ian Hummel] Update based on comments from pwendell
  a11d9f3 [Ian Hummel] Fix formatting
  4359664 [Ian Hummel] Add an example showing usage
  8b6be94 [Ian Hummel] Add ability to specify OutputCommitter, especially useful when writing to an S3 bucket from an EMR cluster
* [Minor Hot Fix] Move a line in SparkSubmit to the right place | Andrew Or | 2014-09-18 | 1 | -1/+1
  This was introduced in #2449.

  Author: Andrew Or <andrewor14@gmail.com>

  Closes #2452 from andrewor14/standalone-hot-fix and squashes the following commits:
  d5190ca [Andrew Or] Put that line in the right place

  (cherry picked from commit 9306297d1d888d0430f79b2133ee7377871a3a18)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>