path: root/core
* [maven-release-plugin] prepare release v1.1.1-rc2 (tag: v1.1.1)
  Andrew Or, 2014-11-19, 1 file changed, -1/+1 lines
* [SPARK-4480] Avoid many small spills in external data structures (1.1)
  Andrew Or, 2014-11-19, 4 files changed, -12/+37 lines
  This is the branch-1.1 version of #3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.
  Author: Andrew Or <andrew@databricks.com>
  Closes #3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:
    f2e552c [Andrew Or] Fix tests
    7012595 [Andrew Or] Avoid many small spills
* [SPARK-4380] Log more precise number of bytes spilled (1.1)
  Andrew Or, 2014-11-18, 2 files changed, -4/+6 lines
  This is the branch-1.1 version of #3243.
  Author: Andrew Or <andrew@databricks.com>
  Closes #3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits:
    36ec152 [Andrew Or] Log more precise representation of bytes in spilling code
* [SPARK-4433] fix a racing condition in zipWithIndex
  Xiangrui Meng, 2014-11-18, 2 files changed, -14/+22 lines
  Spark hangs with the following code:
  ~~~
  sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
  ~~~
  This is because ZippedWithIndexRDD triggers a job in getPartitions and it causes a deadlock in DAGScheduler.getPreferredLocs (synced). The fix is to compute `startIndices` during construction. This should be applied to branch-1.0, branch-1.1, and branch-1.2. pwendell
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #3291 from mengxr/SPARK-4433 and squashes the following commits:
    c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex
  (cherry picked from commit bb46046154a438df4db30a0e1fd557bd3399ee7b)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer (For branch-1.1)
  Kousuke Saruta, 2014-11-18, 1 file changed, -13/+39 lines
  This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent and a cancelled TimerTask won't necessarily be garbage-collected until it's scheduled to run, so this caused huge buildups of messages that weren't garbage collected until their timeouts expired, leading to OOMs. This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case.
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits:
    786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager
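To make the leak-avoidance pattern concrete, here is a minimal, hypothetical sketch (not Spark's actual ConnectionManager code), assuming Netty 4's io.netty.util.HashedWheelTimer is on the classpath; the class name AckTimeoutSketch and the method registerTimeout are made up for illustration.

```scala
import java.io.IOException
import java.lang.ref.WeakReference
import java.util.concurrent.TimeUnit

import io.netty.util.{HashedWheelTimer, Timeout, TimerTask}

import scala.concurrent.Promise

// Illustrative only: the timer task captures the message id (an Int) and a WeakReference
// to the ack promise, never the outgoing message itself, so a cancelled-but-unfired task
// cannot pin large message buffers in memory until its timeout expires.
class AckTimeoutSketch(timer: HashedWheelTimer, timeoutSecs: Long) {
  def registerTimeout(messageId: Int, ackPromise: Promise[Unit]): Timeout =
    timer.newTimeout(new TimerTask {
      private val promiseRef = new WeakReference(ackPromise)
      override def run(timeout: Timeout): Unit = {
        val p = promiseRef.get()
        if (p != null) {
          p.tryFailure(new IOException(s"No ack received for message $messageId"))
        }
      }
    }, timeoutSecs, TimeUnit.SECONDS)
}
```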
* [SPARK-4467] Partial fix for fetch failure in sort-based shuffle (1.1)
  Andrew Or, 2014-11-17, 1 file changed, -0/+1 lines
  This is the 1.1 version of #3302. There has been some refactoring in master so we can't cherry-pick that PR.
  Author: Andrew Or <andrew@databricks.com>
  Closes #3330 from andrewor14/sort-fetch-fail and squashes the following commits:
    486fc49 [Andrew Or] Reset `elementsRead`
* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"Andrew Or2014-11-171-1/+1
| | | | This reverts commit 72a4fdbe82203b962fe776d0edaed7f56898cb02.
* Revert "[maven-release-plugin] prepare for next development iteration"Andrew Or2014-11-171-1/+1
| | | | This reverts commit 685bdd2b7e584c84e7d39e40de2d5f30c5388cb5.
* Revert "[SPARK-4075] [Deploy] Jar url validation is not enough for Jar file"Andrew Or2014-11-172-16/+1
| | | | This reverts commit 098f83c7ccd7dad9f9228596da69fe5f55711a52.
* [maven-release-plugin] prepare for next development iterationAndrew Or2014-11-131-1/+1
|
* [maven-release-plugin] prepare release v1.1.1-rc1Andrew Or2014-11-131-1/+1
|
* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"Andrew Or2014-11-121-1/+1
| | | | This reverts commit 3f9e073ff0bb18b6079fda419d4e9dbf594545b0.
* Revert "[maven-release-plugin] prepare for next development iteration"Andrew Or2014-11-121-1/+1
| | | | This reverts commit 6de888129fcfe6e592458a4217fc66140747b54f.
* [maven-release-plugin] prepare for next development iterationAndrew Or2014-11-121-1/+1
|
* [maven-release-plugin] prepare release v1.1.1-rc1Andrew Or2014-11-121-1/+1
|
* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"Andrew Or2014-11-121-1/+1
| | | | This reverts commit 7029301778895427216f2e0710c6e72a523c0897.
* Revert "[maven-release-plugin] prepare for next development iteration"Andrew Or2014-11-121-1/+1
| | | | This reverts commit db22a9e2cb51eae2f8a79648ce3c6bf4fecdd641.
* [maven-release-plugin] prepare for next development iterationAndrew Or2014-11-121-1/+1
|
* [maven-release-plugin] prepare release v1.1.1-rc1Andrew Or2014-11-121-1/+1
|
* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"Andrew Or2014-11-121-1/+1
| | | | This reverts commit 837deabebf0714e3f3aca135d77169cc825824f3.
* [maven-release-plugin] prepare release v1.1.1-rc1Andrew Or2014-11-121-1/+1
|
* Revert "[maven-release-plugin] prepare release v1.1.1-rc1"Andrew Or2014-11-121-1/+1
| | | | | | | This reverts commit f3e62ffa4ccea62911207b918ef1c23c1f50467f. Conflicts: pom.xml
* Revert "[maven-release-plugin] prepare for next development iteration"Andrew Or2014-11-121-1/+1
| | | | This reverts commit 5c0032a471d858fb010b1737ea14375f1af3ed88.
* [maven-release-plugin] prepare for next development iterationAndrew Or2014-11-111-1/+1
|
* [maven-release-plugin] prepare release v1.1.1-rc1Andrew Or2014-11-111-1/+1
|
* Update versions for 1.1.1 releaseAndrew Or2014-11-101-1/+1
|
* [SPARK-3495][SPARK-3496] Backporting block replication fixes made in master to branch 1.1
  Tathagata Das, 2014-11-10, 8 files changed, -44/+535 lines
  The original PR was #2366. This backport was non-trivial because Spark 1.1 uses ConnectionManager instead of NioBlockTransferService, which required slight modification to unit tests. Other than that the code is exactly the same as in the original PR. Please refer to the discussion in the original PR if you have any thoughts.
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #3191 from tdas/replication-fix-branch-1.1-backport and squashes the following commits:
    593214a [Tathagata Das] Merge remote-tracking branch 'apache-github/branch-1.1' into branch-1.1
    2ed927f [Tathagata Das] Fixed error in unit test.
    de4ff73 [Tathagata Das] [SPARK-3495] Block replication fails continuously when the replication target node is dead AND [SPARK-3496] Block replication by mistake chooses driver as target
* [SPARK-4169] [Core] Accommodate non-English Locales in unit tests
  Niklas Wilcke, 2014-11-10, 2 files changed, -12/+15 lines
  For me the core tests failed because there are two locale-dependent parts in the code. Look at the Jira ticket for details. Why is it necessary to check the exception message in isBindCollision in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1686 ?
  Author: Niklas Wilcke <1wilcke@informatik.uni-hamburg.de>
  Closes #3036 from numbnut/core-test-fix and squashes the following commits:
    1fb0d04 [Niklas Wilcke] Fixing locale dependend code and tests
  (cherry picked from commit ed8bf1eac548577c4bbad7ce3f7f301a2f52ef17)
  Signed-off-by: Andrew Or <andrew@databricks.com>
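As a general illustration of the class of bug this commit works around, here is a hedged sketch (not the Spark patch itself) of pinning a locale for strings that are compared or parsed by code; the object name LocaleSafeFormatting is made up.

```scala
import java.util.Locale

// Hypothetical example: strings produced for machine comparison should pin a locale.
// Under a German default locale, "%.1f" renders 64.0 as "64,0" and breaks tests that
// expect a dot as the decimal separator; Locale.US / Locale.ROOT make the output stable.
object LocaleSafeFormatting {
  def formatMB(bytes: Long): String =
    "%.1f MB".formatLocal(Locale.US, bytes / (1024.0 * 1024.0))

  def normalizeKey(s: String): String = s.toLowerCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    println(formatMB(67108864))        // always "64.0 MB", regardless of the default locale
    println(normalizeKey("SPARK-4169"))
  }
}
```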
* [SPARK-4158] Fix for missing resources.
  Brenden Matthews, 2014-11-05, 2 files changed, -4/+2 lines
  Mesos offers may not contain all resources, and Spark needs to check to ensure they are present and sufficient. Spark may throw an erroneous exception when resources aren't present.
  Author: Brenden Matthews <brenden@diddyinc.com>
  Closes #3024 from brndnmtthws/fix-mesos-resource-misuse and squashes the following commits:
    e5f9580 [Brenden Matthews] [SPARK-4158] Fix for missing resources.
  (cherry picked from commit cb0eae3b78d7f6f56c0b9521ee48564a4967d3de)
  Signed-off-by: Andrew Or <andrew@databricks.com>
* SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode - change master newer
  Jongyoul Lee, 2014-11-05, 2 files changed, -2/+2 lines
  Author: Jongyoul Lee <jongyoul@gmail.com>
  Closes #3034 from jongyoul/SPARK-3223 and squashes the following commits:
    42b2ed3 [Jongyoul Lee] SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode - change master newer
  (cherry picked from commit f7ac8c2b1de96151231617846b7468d23379c74a)
  Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-4097] Fix the race condition of 'thread'
  zsxwing, 2014-10-29, 1 file changed, -1/+5 lines
  There is a chance that `thread` is null when calling `thread.interrupt()`.
  ```Scala
  override def cancel(): Unit = this.synchronized {
    _cancelled = true
    if (thread != null) {
      thread.interrupt()
    }
  }
  ```
  Should put `thread = null` into a `synchronized` block to fix the race condition.
  Author: zsxwing <zsxwing@gmail.com>
  Closes #2957 from zsxwing/SPARK-4097 and squashes the following commits:
    edf0aee [zsxwing] Add comments to explain the lock
    c5cfeca [zsxwing] Fix the race condition of 'thread'
  (cherry picked from commit e7fd80413d531e23b6c4def0ee32e52a39da36fa)
  Signed-off-by: Reynold Xin <rxin@databricks.com>
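A self-contained sketch of the locking pattern the fix describes; the InterruptibleTask class below is a hypothetical stand-in, not the actual Spark code being patched.

```scala
// Both the code that clears `thread` and the read in cancel() take the same monitor,
// so cancel() can never observe a stale reference and NPE on thread.interrupt().
class InterruptibleTask {
  private var _cancelled = false
  private var thread: Thread = null   // guarded by `this`

  def run(body: => Unit): Unit = {
    val proceed = this.synchronized {
      if (!_cancelled) { thread = Thread.currentThread(); true } else false
    }
    if (proceed) {
      try {
        body
      } finally {
        this.synchronized { thread = null }  // cleared under the same lock cancel() uses
      }
    }
  }

  def cancel(): Unit = this.synchronized {
    _cancelled = true
    if (thread != null) {
      thread.interrupt()
    }
  }
}
```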
* [SPARK-4107] Fix incorrect handling of read() and skip() return values (branch-1.1 backport)
  Josh Rosen, 2014-10-28, 5 files changed, -32/+22 lines
  `read()` may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors. `skip()` faces similar issues, too. This patch fixes several cases where we mis-handle these methods' return values. This is a backport of #2969 to `branch-1.1`.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #2974 from JoshRosen/spark-4107-branch-1.1-backport and squashes the following commits:
    d82c05b [Josh Rosen] [SPARK-4107] Fix incorrect handling of read() and skip() return values
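A generic illustration of the correct handling the commit describes; the StreamUtilSketch object and its method names are made up, not the Spark helpers touched by the patch.

```scala
import java.io.{EOFException, InputStream}

// read() and skip() may make only partial progress, so callers must loop until the
// requested amount has been consumed or the stream ends, instead of assuming one call is enough.
object StreamUtilSketch {
  def readFully(in: InputStream, buf: Array[Byte]): Unit = {
    var offset = 0
    while (offset < buf.length) {
      val n = in.read(buf, offset, buf.length - offset)
      if (n == -1) throw new EOFException("Stream ended before the buffer was filled")
      offset += n
    }
  }

  def skipFully(in: InputStream, count: Long): Unit = {
    var remaining = count
    while (remaining > 0) {
      val skipped = in.skip(remaining)
      // A production version might fall back to read() when skip() returns 0;
      // here we simply fail loudly rather than risk silent corruption.
      if (skipped <= 0) throw new EOFException(s"Could not skip $remaining more bytes")
      remaining -= skipped
    }
  }
}
```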
* [SPARK-4080] Only throw IOException from [write|read][Object|External]
  Josh Rosen, 2014-10-24, 23 files changed, -34/+63 lines
  If classes implementing Serializable or Externalizable interfaces throw exceptions other than IOException or ClassNotFoundException from their (de)serialization methods, then this results in an unhelpful "IOException: unexpected exception type" rather than the actual exception that produced the (de)serialization error. This patch fixes this by adding a utility method that re-wraps any uncaught exceptions in IOException (unless they are already instances of IOException).
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits:
    cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write|read][Object|External].
  (cherry picked from commit 6c98c29ae0033556fd4424f41d1de005c509e511)
  Signed-off-by: Josh Rosen <joshrosen@databricks.com>
  Conflicts:
    core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
    core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
    streaming/src/main/scala/org/apache/spark/streaming/api/python/PythonDStream.scala
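A sketch of the kind of utility method the commit describes; the SerializationGuard object and the tryOrIOException name below are illustrative, not necessarily what the patch adds.

```scala
import java.io.{IOException, ObjectOutputStream}

// Run a (de)serialization block and re-wrap any non-IOException in an IOException,
// so ObjectInputStream surfaces the real cause instead of "unexpected exception type".
object SerializationGuard {
  def tryOrIOException[T](block: => T): T =
    try block
    catch {
      case e: IOException => throw e            // already the expected type
      case t: Throwable   => throw new IOException(t)
    }
}

// Typical use inside writeObject/readObject of a Serializable class:
//   private def writeObject(out: ObjectOutputStream): Unit =
//     SerializationGuard.tryOrIOException { out.defaultWriteObject() }
```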
* [SPARK-4006] In long running contexts, we encountered the situation of double register without a remove in between
  Tal Sliwowicz, 2014-10-24, 1 file changed, -12/+13 lines
  The cause for that is unknown, and assumed to be a temporary network issue. However, since the second register is with a BlockManagerId on a different port, blockManagerInfo.contains() returns false, while blockManagerIdByExecutor returns Some. This inconsistency is caught in a conditional statement that does System.exit(1), which is a huge robustness issue for us. The fix: simply remove the old id from both maps during register when this happens. We are mimicking the behavior of expireDeadHosts(), by doing local cleanup of the maps before trying to add new ones. Also added some logging for register and unregister. This is just like https://github.com/apache/spark/pull/2886 except it's on branch-1.1.
  Author: Tal Sliwowicz <tal.s@taboola.com>
  Closes #2915 from tsliwowicz/branch-1.1-block-mgr-removal and squashes the following commits:
    d122236 [Tal Sliwowicz] [SPARK-4006] In long running contexts, we encountered the situation of double registe...
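A simplified sketch of the cleanup described above, with hypothetical case classes standing in for BlockManagerMaster's internal state; names and types are illustrative, not Spark's.

```scala
import scala.collection.mutable

case class BlockManagerIdSketch(executorId: String, host: String, port: Int)

// If an executor re-registers under a new id while a stale id is still present,
// purge the stale entry from *both* maps (mimicking expireDeadHosts()) before
// inserting the new one, instead of treating the inconsistency as fatal.
class BlockManagerRegistrySketch {
  private val infoById = mutable.HashMap.empty[BlockManagerIdSketch, Long] // id -> registration time
  private val idByExecutor = mutable.HashMap.empty[String, BlockManagerIdSketch]

  def register(id: BlockManagerIdSketch, timeMs: Long): Unit = {
    idByExecutor.get(id.executorId).filter(_ != id).foreach { staleId =>
      infoById.remove(staleId)                 // local cleanup of both maps
      idByExecutor.remove(staleId.executorId)
    }
    idByExecutor(id.executorId) = id
    infoById(id) = timeMs
  }
}
```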
* [SPARK-4075] [Deploy] Jar url validation is not enough for Jar file
  Kousuke Saruta, 2014-10-24, 2 files changed, -1/+16 lines
  In deploy.ClientArguments.isValidJarUrl, the url is checked as follows:
    def isValidJarUrl(s: String): Boolean = s.matches("(.+):(.+)jar")
  So it allows URLs like 'hdfs:file.jar' (no authority).
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #2925 from sarutak/uri-syntax-check-improvement and squashes the following commits:
    cf06173 [Kousuke Saruta] Improved URI syntax checking
  (cherry picked from commit 098f83c7ccd7dad9f9228596da69fe5f55711a52)
  Signed-off-by: Andrew Or <andrew@databricks.com>
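A hedged sketch of a stricter check than the regex quoted above; the exact rules applied by the patch may differ, and the JarUrlCheckSketch object is made up.

```scala
import java.net.URI
import scala.util.Try

// Parse the string as a URI and require a scheme, a ".jar" path, and an authority
// for non-local schemes, so an opaque URL such as 'hdfs:file.jar' is rejected.
object JarUrlCheckSketch {
  def isValidJarUrl(s: String): Boolean = Try {
    val uri = new URI(s)
    val path = uri.getPath   // null for opaque URIs like "hdfs:file.jar"
    uri.getScheme != null && path != null && path.endsWith(".jar") &&
      (uri.getScheme == "file" || uri.getAuthority != null)
  }.getOrElse(false)
}
```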
* [SPARK-3426] Fix sort-based shuffle error when spark.shuffle.compress and spark.shuffle.spill.compress settings are different
  Josh Rosen, 2014-10-22, 6 files changed, -11/+61 lines
  This PR fixes SPARK-3426, an issue where sort-based shuffle crashes if the `spark.shuffle.spill.compress` and `spark.shuffle.compress` settings have different values. The problem is that sort-based shuffle's read and write paths use different settings for determining whether to apply compression. ExternalSorter writes runs to files using `TempBlockId` ids, which causes `spark.shuffle.spill.compress` to be used for enabling compression, but these spilled files end up being shuffled over the network and read as shuffle files using `ShuffleBlockId` by BlockStoreShuffleFetcher, which causes `spark.shuffle.compress` to be used for enabling decompression. As a result, this leads to errors when these settings disagree.
  Based on the discussions in #2247 and #2178, it sounds like we don't want to remove the `spark.shuffle.spill.compress` setting. Therefore, I've tried to come up with a fix where `spark.shuffle.spill.compress` is used to compress data that's read and written locally and `spark.shuffle.compress` is used to compress any data that will be fetched / read as shuffle blocks. To do this, I split `TempBlockId` into two new id types, `TempLocalBlockId` and `TempShuffleBlockId`, which map to `spark.shuffle.spill.compress` and `spark.shuffle.compress`, respectively.
  ExternalAppendOnlyMap also used temp blocks for spilling data. It looks like ExternalSorter was designed to be a generic sorter but its configuration already happens to be tied to sort-based shuffle, so I think it's fine if we use `spark.shuffle.compress` to compress its spills; we can move the compression configuration to the constructor in a later commit if we find that ExternalSorter is being used in other contexts where we want different configuration options to control compression.
  To summarize:
  **Before:**
  |       | ExternalAppendOnlyMap        | ExternalSorter               |
  |-------|------------------------------|------------------------------|
  | Read  | spark.shuffle.spill.compress | spark.shuffle.compress       |
  | Write | spark.shuffle.spill.compress | spark.shuffle.spill.compress |
  **After:**
  |       | ExternalAppendOnlyMap        | ExternalSorter         |
  |-------|------------------------------|------------------------|
  | Read  | spark.shuffle.spill.compress | spark.shuffle.compress |
  | Write | spark.shuffle.spill.compress | spark.shuffle.compress |
  Thanks to andrewor14 for debugging this with me!
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #2890 from JoshRosen/SPARK-3426 and squashes the following commits:
    1921cf6 [Josh Rosen] Minor edit for clarity.
    c8dd8f2 [Josh Rosen] Add comment explaining use of createTempShuffleBlock().
    2c687b9 [Josh Rosen] Fix SPARK-3426.
    91e7e40 [Josh Rosen] Combine tests into single test of all combinations
    76ca65e [Josh Rosen] Add regression test for SPARK-3426.
  Conflicts:
    core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala
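A self-contained sketch of the dispatch the fix introduces, with made-up case classes standing in for Spark's BlockId hierarchy; the object and config map are illustrative only.

```scala
// The block id type, not the call site, decides which flag governs compression, so a
// spill that will later be fetched as a shuffle block always follows spark.shuffle.compress.
sealed trait BlockIdSketch
case class ShuffleBlock(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockIdSketch
case class TempLocalBlock(id: java.util.UUID) extends BlockIdSketch     // read/written locally only
case class TempShuffleBlock(id: java.util.UUID) extends BlockIdSketch   // served over the network

object CompressionDispatch {
  def shouldCompress(id: BlockIdSketch, conf: Map[String, Boolean]): Boolean = id match {
    case _: ShuffleBlock | _: TempShuffleBlock =>
      conf.getOrElse("spark.shuffle.compress", true)
    case _: TempLocalBlock =>
      conf.getOrElse("spark.shuffle.spill.compress", true)
  }
}
```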
* [SPARK-4010][Web UI] Spark UI returns 500 in yarn-client mode
  GuoQiang Li, 2014-10-20, 2 files changed, -5/+5 lines
  The problem was caused by #1966. CC YanTangZhai andrewor14
  Author: GuoQiang Li <witgo@qq.com>
  Closes #2858 from witgo/SPARK-4010 and squashes the following commits:
    9866fbf [GuoQiang Li] Spark UI returns 500 in yarn-client mode
  (cherry picked from commit 51afde9d8b8a67958c4632a13af143d7c7fd1f04)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3948][Shuffle] Fix stream corruption bug in sort-based shuffle
  jerryshao, 2014-10-20, 2 files changed, -5/+27 lines
  A kernel 2.6.32 bug leads to unexpected behavior of transferTo in copyStream, which corrupts the shuffle output file in sort-based shuffle and can surface as PARSING_ERROR(2), deserialization errors, or offset-out-of-range errors. Fix this by adding an append flag, and also add some position-checking code. Details can be seen in [SPARK-3948](https://issues.apache.org/jira/browse/SPARK-3948).
  Author: jerryshao <saisai.shao@intel.com>
  Closes #2824 from jerryshao/SPARK-3948 and squashes the following commits:
    be0533a [jerryshao] Address the comments
    a82b184 [jerryshao] add configuration to control the NIO way of copying stream
    e17ada2 [jerryshao] Fix kernel 2.6.32 bug led unexpected behavior of transferTo
  (cherry picked from commit c7aeecd08fd329085760fa89025ec0d9c04f5e3f)
  Signed-off-by: Josh Rosen <joshrosen@databricks.com>
  Conflicts:
    core/src/main/scala/org/apache/spark/util/Utils.scala
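An illustrative helper (not Spark's actual Utils.copyStream) showing the append-mode-plus-position-check safeguard the commit describes; all names are made up for this sketch.

```scala
import java.io.{File, FileInputStream, FileOutputStream}

// Concatenate a spill file onto a destination with NIO transferTo, opening the
// destination in append mode and verifying that the channel position advanced by
// exactly the expected number of bytes, so a buggy kernel fails loudly instead of
// silently corrupting the shuffle output.
object TransferToSketch {
  def appendFile(src: File, dst: File): Long = {
    val inChannel  = new FileInputStream(src).getChannel
    val outChannel = new FileOutputStream(dst, true).getChannel   // append = true
    try {
      val startPos = outChannel.position()
      val size = inChannel.size()
      var copied = 0L
      while (copied < size) {
        copied += inChannel.transferTo(copied, size - copied, outChannel)
      }
      val moved = outChannel.position() - startPos
      require(moved == size, s"transferTo advanced the position by $moved bytes, expected $size")
      copied
    } finally {
      inChannel.close()
      outChannel.close()
    }
  }
}
```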
* [SPARK-2546] Clone JobConf for each task (branch-1.0 / 1.1 backport)
  Josh Rosen, 2014-10-19, 1 file changed, -15/+38 lines
  This patch attempts to fix SPARK-2546 in `branch-1.0` and `branch-1.1`. The underlying problem is that thread-safety issues in Hadoop Configuration objects may cause Spark tasks to get stuck in infinite loops. The approach taken here is to clone a new copy of the JobConf for each task rather than sharing a single copy between tasks. Note that there are still Configuration thread-safety issues that may affect the driver, but these seem much less likely to occur in practice and will be more complex to fix (see discussion on the SPARK-2546 ticket). This cloning is guarded by a new configuration option (`spark.hadoop.cloneConf`) and is disabled by default in order to avoid unexpected performance regressions for workloads that are unaffected by the Configuration thread-safety issues.
  Author: Josh Rosen <joshrosen@apache.org>
  Closes #2684 from JoshRosen/jobconf-fix-backport and squashes the following commits:
    f14f259 [Josh Rosen] Add configuration option to control cloning of Hadoop JobConf.
    b562451 [Josh Rosen] Remove unused jobConfCacheKey field.
    dd25697 [Josh Rosen] [SPARK-2546] [1.0 / 1.1 backport] Clone JobConf for each task.
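A hedged sketch of the guarded per-task cloning described above; the object and method names are illustrative (not Spark's HadoopRDD code) and the example assumes Hadoop's JobConf is on the classpath.

```scala
import org.apache.hadoop.mapred.JobConf

// When the clone flag (spark.hadoop.cloneConf in the patch) is on, each task copies
// the shared JobConf under a lock, because Configuration's copy constructor is itself
// not thread-safe; otherwise the shared instance is used as before.
object JobConfCloningSketch {
  private val cloneLock = new Object

  def confForTask(shared: JobConf, cloneConf: Boolean): JobConf =
    if (cloneConf) {
      cloneLock.synchronized { new JobConf(shared) }
    } else {
      shared
    }
}
```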
* SPARK-3926 [CORE] Result of JavaRDD.collectAsMap() is not Serializable
  Sean Owen, 2014-10-18, 3 files changed, -8/+21 lines
  Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #2805 from srowen/SPARK-3926 and squashes the following commits:
    ecb78ee [Sean Owen] Fix conflict between java.io.Serializable and use of Scala's Serializable
    f4717f9 [Sean Owen] Oops, fix compile problem
    ae1b36f [Sean Owen] Expand to cover Maps returned from other Java API methods as well
    51c26c2 [Sean Owen] Make JavaPairRDD.collectAsMap result Serializable since Java Maps generally are
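A sketch of the idea behind the fix; the helper name below is made up, and JavaPairRDD's real implementation may differ in detail.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// The default Scala-to-Java map wrapper is not java.io.Serializable, so copy the
// results into a plain java.util.HashMap, which is, before handing the map to Java callers.
object JavaMapSketch {
  def toSerializableJavaMap[K, V](m: scala.collection.Map[K, V]): JMap[K, V] = {
    val out = new JHashMap[K, V](m.size)
    m.foreach { case (k, v) => out.put(k, v) }
    out
  }
}
```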
* [SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA (1.1 version)
  Marcelo Vanzin, 2014-10-17, 3 files changed, -12/+17 lines
  This is a backport of SPARK-3606 to branch-1.1. Some of the code had to be duplicated since branch-1.1 doesn't have the cleanup work that was done to the Yarn codebase. I don't know whether the version issue in yarn/alpha/pom.xml was intentional, but I couldn't compile the code without fixing it.
  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #2497 from vanzin/SPARK-3606-1.1 and squashes the following commits:
    4fd3c27 [Marcelo Vanzin] Remove unused imports.
    75cde8c [Marcelo Vanzin] Scala is weird.
    b27ebda [Marcelo Vanzin] Review feedback.
    72ceafb [Marcelo Vanzin] Undelete needed import.
    61162a6 [Marcelo Vanzin] Use separate config for each param instead of json.
    3b7205f [Marcelo Vanzin] Review feedback.
    b3b3e50 [Marcelo Vanzin] [SPARK-3606] [yarn] Correctly configure AmIpFilter for Yarn HA (1.1 version).
* [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
  yantangzhai, 2014-10-16, 1 file changed, -1/+4 lines
  JobProgressPage could not show the Fair Scheduler Pools section sometimes. SparkContext starts the web UI and then calls postEnvironmentUpdate. Sometimes JobProgressPage is accessed between the web UI starting and postEnvironmentUpdate; then the lazy val isFairScheduler will be false and the Fair Scheduler Pools section will not be displayed any more.
  Author: yantangzhai <tyz0303@163.com>
  Author: YanTangZhai <hakeemzhai@tencent.com>
  Closes #1966 from YanTangZhai/SPARK-3067 and squashes the following commits:
    d4323f8 [yantangzhai] update [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
    8a00106 [YanTangZhai] Merge pull request #6 from apache/master
    b6391cc [yantangzhai] revert [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
    d2226cd [yantangzhai] [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
    cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
    aac7f7b [yantangzhai] [SPARK-3067] JobProgressPage could not show Fair Scheduler Pools section sometimes
    cdef539 [YanTangZhai] Merge pull request #1 from apache/master
  (cherry picked from commit dedace83f35cba0f833d962acbd75572318948c4)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3905][Web UI] The keys for sorting the columns of the Executor page, Stage page, and Storage page are incorrect
  GuoQiang Li, 2014-10-12, 3 files changed, -12/+12 lines
  Author: GuoQiang Li <witgo@qq.com>
  Closes #2763 from witgo/SPARK-3905 and squashes the following commits:
    17d7990 [GuoQiang Li] The keys for sorting the columns of Executor page ,Stage page Storage page are incorrect
  (cherry picked from commit b4a7fa7a663c462bf537ca9d63af0dba6b4a8033)
  Signed-off-by: Josh Rosen <joshrosen@apache.org>
* [SPARK-3121] Wrong implementation of implicit bytesWritableConverter
  Jakub Dubovský, 2014-10-12, 2 files changed, -1/+45 lines
  val path = ... // path to seq file with BytesWritable as type of both key and value
  val file = sc.sequenceFile[Array[Byte],Array[Byte]](path)
  file.take(1)(0)._1
  This prints incorrect content of the byte array. The actual content starts with the correct bytes, and some "random" bytes and zeros are appended. BytesWritable has two methods:
  getBytes() - returns the content of the whole internal array, which is often longer than the actual value stored; it usually contains the rest of previous longer values.
  copyBytes() - returns just the beginning of the internal array, determined by the internal length property.
  It looks like the implicit conversion between BytesWritable and Array[Byte] uses getBytes instead of the correct copyBytes. dbtsai
  Author: Jakub Dubovský <james64@inMail.sk>
  Author: Dubovsky Jakub <dubovsky@avast.com>
  Closes #2712 from james64/3121-bugfix and squashes the following commits:
    f85d24c [Jakub Dubovský] Test name changed, comments added
    1b20d51 [Jakub Dubovský] Import placed correctly
    406e26c [Jakub Dubovský] Scala style fixed
    f92ffa6 [Dubovsky Jakub] performance tuning
    480f9cd [Dubovsky Jakub] Bug 3121 fixed
  (cherry picked from commit fc616d51a510f82627b5be949a5941419834cf70)
  Signed-off-by: Josh Rosen <joshrosen@apache.org>
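A sketch of the corrected conversion the commit describes; the object name is made up, and the example assumes Hadoop's BytesWritable is on the classpath.

```scala
import org.apache.hadoop.io.BytesWritable

// getBytes() exposes the whole backing array, often padded with stale bytes from earlier,
// longer values, so only the first getLength() bytes must be copied. copyBytes() does this
// on newer Hadoop versions; the explicit copy below works everywhere.
object BytesWritableSketch {
  def toByteArray(bw: BytesWritable): Array[Byte] =
    java.util.Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
}
```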
* [SPARK-3844][UI] Truncate appName in WebUI if it is too long
  Xiangrui Meng, 2014-10-09, 1 file changed, -1/+5 lines
  Truncate appName in WebUI if it is too long.
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #2707 from mengxr/truncate-app-name and squashes the following commits:
    87834ce [Xiangrui Meng] move scala import below java
    c7111dc [Xiangrui Meng] truncate appName in WebUI if it is too long
  (cherry picked from commit 86b392942daf61fed2ff7490178b128107a0e856)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3829] Make Spark logo image on the header of HistoryPage as a link to HistoryPage's page #1
  Kousuke Saruta, 2014-10-07, 1 file changed, -2/+4 lines
  There is a Spark logo on the header of HistoryPage. We can have too many HistoryPages if we run 20+ applications. So I think it's useful if the logo is a link to the HistoryPage's page number 1.
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #2690 from sarutak/SPARK-3829 and squashes the following commits:
    908c109 [Kousuke Saruta] Removed extra space.
    00bfbd7 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-3829
    dd87480 [Kousuke Saruta] Made header Spark log image as a link to History Server's top page.
  (cherry picked from commit b69c9fb6fb048509bbd8430fb697dc3a5ca4fe59)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3777] Display "Executor ID" for Tasks in Stage page
  zsxwing, 2014-10-07, 1 file changed, -2/+2 lines
  Now the Stage page only displays "Executor" (host) for tasks. However, there may be more than one Executor running on the same host. Currently, when some task hangs, I only know the host of the faulty executor, so I have to check all executors on that host. Adding "Executor ID" to the Tasks table would be helpful for locating the faulty executor. Here is the new page:
  ![add_executor_id_for_tasks](https://cloud.githubusercontent.com/assets/1000778/4505774/acb9648c-4afa-11e4-8826-8768a0a60cc9.png)
  Author: zsxwing <zsxwing@gmail.com>
  Closes #2642 from zsxwing/SPARK-3777 and squashes the following commits:
    37945af [zsxwing] Put Executor ID and Host into one cell
    4bbe2c7 [zsxwing] [SPARK-3777] Display "Executor ID" for Tasks in Stage page
  (cherry picked from commit 446063eca98ae56d1ac61415f4c6e89699b8db02)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3731] [PySpark] fix memory leak in PythonRDD
  Davies Liu, 2014-10-07, 1 file changed, -0/+4 lines
  The parent.getOrCompute() of PythonRDD is executed in a separate thread, so it must release the memory reserved for shuffle and unrolling when it finishes.
  Author: Davies Liu <davies.liu@gmail.com>
  Closes #2668 from davies/leak and squashes the following commits:
    ae98be2 [Davies Liu] fix memory leak in PythonRDD
  (cherry picked from commit bc87cc410fae59660c13b6ae1c14204df77237b8)
  Signed-off-by: Josh Rosen <joshrosen@apache.org>
  Conflicts:
    core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
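A generic illustration, not the actual PythonRDD code, of the pattern assumed here: work handed off to a separate thread releases its task-scoped resources in a finally block, since that thread outlives the normal task-completion cleanup path. The object and parameter names are made up.

```scala
object WriterThreadSketch {
  def runOnWriterThread(compute: () => Unit, releaseMemory: () => Unit): Thread = {
    val t = new Thread("stdin writer sketch") {
      override def run(): Unit =
        try compute()
        finally releaseMemory()   // always give back reserved shuffle/unroll memory
    }
    t.setDaemon(true)
    t.start()
    t
  }
}
```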
* [SPARK-3825] Log more detail when unrolling a block fails
  Andrew Or, 2014-10-07, 2 files changed, -8/+39 lines
  Before:
  ```
  14/10/06 16:45:42 WARN CacheManager: Not enough space to cache partition rdd_0_2 in memory! Free memory is 481861527 bytes.
  ```
  After:
  ```
  14/10/07 11:08:24 WARN MemoryStore: Not enough space to cache rdd_2_0 in memory! (computed 68.8 MB so far)
  14/10/07 11:08:24 INFO MemoryStore: Memory use = 1088.0 B (blocks) + 445.1 MB (scratch space shared across 8 thread(s)) = 445.1 MB. Storage limit = 459.5 MB.
  ```
  Author: Andrew Or <andrewor14@gmail.com>
  Closes #2688 from andrewor14/cache-log-message and squashes the following commits:
    28e33d6 [Andrew Or] Shy away from "unrolling"
    5638c49 [Andrew Or] Grammar
    39a0c28 [Andrew Or] Log more detail when unrolling a block fails
  (cherry picked from commit 553737c6e6d5ffa3b52a9888444f4beece5c5b1a)
  Signed-off-by: Andrew Or <andrewor14@gmail.com>
* [SPARK-3827] Very long RDD names are not rendered properly in web UI
  Hossein, 2014-10-07, 1 file changed, -0/+5 lines
  With Spark SQL we generate very long RDD names. These names are not properly rendered in the web UI. This PR fixes the rendering issue.
  [SPARK-3827] #comment Linking PR with JIRA
  Author: Hossein <hossein@databricks.com>
  Closes #2687 from falaki/sparkTableUI and squashes the following commits:
    fd06409 [Hossein] Limit width of cell when RDD name is too long
  (cherry picked from commit d65fd554b4de1dbd8db3090b0e50994010d30e78)
  Signed-off-by: Josh Rosen <joshrosen@apache.org>