path: root/core
Commit message    Author    Age    Files    Lines
* Merge pull request #74 from rxin/kill    Matei Zaharia    2013-10-18    4 files    -7/+75
|\
| |     Job cancellation via job group id.
| |
| |     This PR adds a simple API to group together a set of jobs belonging to a thread
| |     and threads spawned from it. It also allows the cancellation of all jobs in this
| |     group. An example:
| |
| |         sc.setJobDescription("this_is_the_group_id", "some job description")
| |         sc.parallelize(1 to 10000, 2).map { i => Thread.sleep(10); i }.count()
| |
| |     In a separate thread:
| |
| |         sc.cancelJobGroup("this_is_the_group_id")
| |
| * Job cancellation via job group id.    Reynold Xin    2013-10-18    4 files    -7/+75
| |
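A runnable end-to-end sketch of the API above, using only the calls shown in the pull-request description (setJobDescription tags the calling thread's jobs with a group id; cancelJobGroup cancels every job in that group from any other thread). The object name and setup are illustrative, not the PR's code:

    import org.apache.spark.SparkContext

    object CancelGroupDemo {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[2]", "cancel-group-demo")

        // Worker thread: tag its jobs with a group id, then run a slow job.
        val worker = new Thread {
          override def run() {
            sc.setJobDescription("this_is_the_group_id", "some job description")
            // A cancelled job surfaces as an exception inside this thread.
            sc.parallelize(1 to 10000, 2).map { i => Thread.sleep(10); i }.count()
          }
        }
        worker.start()

        // Main thread: let the job start, then cancel the whole group.
        Thread.sleep(2000)
        sc.cancelJobGroup("this_is_the_group_id")
        sc.stop()
      }
    }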
* | Merge pull request #68 from mosharaf/master    Matei Zaharia    2013-10-18    5 files    -9/+308
|\ \
| | |     Faster and more stable/reliable broadcast.
| | |
| | |     HttpBroadcast is noticeably slow, but the alternatives (TreeBroadcast or
| | |     BitTorrentBroadcast) are notoriously unreliable. The main problem with them is
| | |     that they try to manage the memory for the pieces of a broadcast themselves.
| | |     Right now, the BroadcastManager does not know which machines the tasks reading
| | |     from a broadcast variable are running on, or when they have finished.
| | |     Consequently, we try to guess, and we often guess wrong, which blows up the
| | |     memory usage and kills/hangs jobs.
| | |
| | |     This very simple implementation solves the problem by not trying to manage the
| | |     intermediate pieces; instead, it offloads that duty to the BlockManager, which
| | |     is quite good at juggling blocks. Otherwise, it is very similar to the
| | |     BitTorrentBroadcast implementation (without the fancy optimizations), and it
| | |     runs much faster than the HttpBroadcast we have right now.
| | |
| | |     I've been using this in another project for the last couple of weeks, and just
| | |     today did some benchmarking against the Http one. The following shows the
| | |     improvements for increasing broadcast size on cold runs. Each line represents
| | |     the number of receivers.
| | |
| | |     ![fix-bc-first](https://f.cloud.github.com/assets/232966/1349342/ffa149e4-36e7-11e3-9fa6-c74555829356.png)
| | |
| | |     After the first broadcast is over, i.e., after the JVM is warmed up and (I
| | |     think) the HttpBroadcast server is already running, the following are the
| | |     improvements for warm runs.
| | |
| | |     ![fix-bc-succ](https://f.cloud.github.com/assets/232966/1349352/5a948bae-36e8-11e3-98ce-34f19ebd33e0.jpg)
| | |
| | |     The curves are not as nice as the cold runs, but the improvements are obvious,
| | |     especially for larger broadcasts and more receivers. Depending on how this
| | |     goes, we should deprecate and/or remove the old TreeBroadcast and
| | |     BitTorrentBroadcast implementations, and hopefully SPARK-889 will not be
| | |     necessary anymore.
| | |
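As a hedged usage sketch: the spark.broadcast.factory system property selects the broadcast implementation (the property name is confirmed by the test-reset commit below; the factory class name is an assumption following this PR's TorrentBroadcast naming):

    import org.apache.spark.SparkContext

    object TorrentBroadcastDemo {
      def main(args: Array[String]) {
        // Choose the BlockManager-backed broadcast before creating the context.
        // (Factory class name assumed from the TorrentBroadcast naming.)
        System.setProperty("spark.broadcast.factory",
          "org.apache.spark.broadcast.TorrentBroadcastFactory")

        val sc = new SparkContext("local[4]", "torrent-broadcast-demo")
        val lookup = sc.broadcast((1 to 1000000).toArray)
        val total = sc.parallelize(0 until 100, 4).map(i => lookup.value(i)).reduce(_ + _)
        println(total)
        sc.stop()
      }
    }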
| * | Should compile now.    Mosharaf Chowdhury    2013-10-17    1 file    -1/+2
| | |
| * | Added an after block to reset spark.broadcast.factory    Mosharaf Chowdhury    2013-10-17    1 file    -0/+4
| | |
| * | BroadcastSuite updated to test both HttpBroadcast and TorrentBroadcast in local, local[N], local-cluster settings.    Mosharaf Chowdhury    2013-10-17    1 file    -3/+44
| | |
| * | Merge remote-tracking branch 'upstream/master'    Mosharaf Chowdhury    2013-10-17    9 files    -105/+63
| |\|
| * | Code styling. Updated doc.    Mosharaf Chowdhury    2013-10-17    1 file    -4/+4
| | |
| * | Removed unused code.    Mosharaf Chowdhury    2013-10-17    2 files    -14/+11
| | |
| | |     Changes to match Spark coding style.
| | |
| * | Fixes for the new BlockId naming convention.    Mosharaf Chowdhury    2013-10-16    2 files    -7/+14
| | |
| * | Default blockSize is 4MB.    Mosharaf Chowdhury    2013-10-16    1 file    -1/+1
| | |
| | |     BroadcastTest2 example added for testing broadcasts.
| | |
| * | Removed unnecessary code, and added a comment on the memory-latency tradeoff.    Mosharaf Chowdhury    2013-10-16    1 file    -4/+6
| | |
| * | Torrent-ish broadcast based on BlockManager.    Mosharaf Chowdhury    2013-10-16    3 files    -4/+251
| | |
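The core idea named in the subject line - chunk a serialized value into fixed-size pieces that a block store can manage individually - can be sketched as follows; all names here are illustrative, not the PR's code:

    import java.nio.ByteBuffer

    object BlockChunker {
      // Split a serialized broadcast value into fixed-size pieces so that a
      // block store (like Spark's BlockManager) can fetch and evict them one by one.
      def blockify(bytes: Array[Byte], blockSize: Int = 4 * 1024 * 1024): Array[ByteBuffer] = {
        val numBlocks = math.ceil(bytes.length.toDouble / blockSize).toInt
        (0 until numBlocks).map { i =>
          val offset = i * blockSize
          val length = math.min(blockSize, bytes.length - offset)
          ByteBuffer.wrap(bytes, offset, length)
        }.toArray
      }

      // Reassemble the pieces on the receiving side.
      def unBlockify(blocks: Array[ByteBuffer]): Array[Byte] = {
        val out = new Array[Byte](blocks.map(_.remaining).sum)
        var pos = 0
        blocks.foreach { b =>
          val n = b.remaining
          b.get(out, pos, n)
          pos += n
        }
        out
      }
    }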
* | | Spark shell exits if it cannot create SparkContext    Aaron Davidson    2013-10-17    1 file    -7/+6
| |/
|/|
| |     Mainly, this occurs if you provide a malformed MASTER url (one that doesn't
| |     match any of our regexes). Previously, we would default to Mesos, fail, and
| |     then start the shell anyway, except that any Spark command would fail.
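The fail-fast behavior hinges on recognizing the MASTER url up front; a rough, illustrative sketch of that kind of regex validation (the real patterns live inside SparkContext and may differ):

    // Illustrative only: validate a MASTER url instead of silently falling back.
    object MasterUrlCheck {
      private val LOCAL_N   = """local\[([0-9]+)\]""".r
      private val SPARK_URL = """spark://(.+)""".r

      def describe(master: String): String = master match {
        case "local"       => "local mode, one thread"
        case LOCAL_N(n)    => "local mode, " + n + " threads"
        case SPARK_URL(hp) => "standalone cluster at " + hp
        case _             => sys.error("Malformed master URL: " + master) // fail fast
      }
    }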
* | Fixed unit tests    Kay Ousterhout    2013-10-16    2 files    -25/+26
| |
* | Removed TaskSchedulerListener interface.    Kay Ousterhout    2013-10-16    7 files    -80/+37
|/
|       The interface was used only by the DAG scheduler (so it wasn't necessary to
|       define the additional interface), and the naming made the code very confusing
|       to read: "listener" described the DAG scheduler, rather than SparkListeners,
|       which implement a nearly identical interface but serve a different function.
* Merge pull request #62 from harveyfeng/master    Matei Zaharia    2013-10-15    2 files    -2/+5
|\
| |     Make TaskContext's stageId publicly accessible.
| |
| * Proper formatting for SparkHadoopWriter class extensions.    Harvey Feng    2013-10-15    1 file    -1/+3
| |
| * Fix line length > 100 chars in SparkHadoopWriter    Harvey Feng    2013-10-15    1 file    -1/+2
| |
| * Make TaskContext's stageId publicly accessible.    Harvey Feng    2013-10-15    1 file    -1/+1
| |
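A small, hypothetical illustration of what the now-public field enables: any code holding a TaskContext can tag its output with the stage it belongs to. Only ctx.stageId comes from the change itself; the helper is invented for the example:

    import org.apache.spark.TaskContext

    object StageTagging {
      // E.g., name per-stage output files or log lines by stage and attempt.
      def partName(ctx: TaskContext, attempt: Int): String =
        "part-stage%05d-attempt%d".format(ctx.stageId, attempt)
    }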
* | Merge pull request #34 from kayousterhout/rename    Matei Zaharia    2013-10-15    6 files    -36/+42
|\ \
| | |     Renamed StandaloneX to CoarseGrainedX (as suggested by @rxin here:
| | |     https://github.com/apache/incubator-spark/pull/14).
| | |
| | |     The previous names were confusing because the components weren't just used in
| | |     Standalone mode. The scheduler used for Standalone mode is called
| | |     SparkDeploySchedulerBackend, so referring to the base class as
| | |     StandaloneSchedulerBackend was misleading.
| | |
| * | Fixed build error after merging in master    Kay Ousterhout    2013-10-15    1 file    -1/+1
| | |
| * | Merge remote branch 'upstream/master' into rename    Kay Ousterhout    2013-10-15    132 files    -1205/+4531
| |\|
| * | Added back fully qualified class name    Kay Ousterhout    2013-10-06    1 file    -1/+1
| | |
| * | Renamed StandaloneX to CoarseGrainedX.    Kay Ousterhout    2013-10-04    6 files    -35/+41
| | |
| | |     The previous names were confusing because the components weren't just used in
| | |     Standalone mode -- in fact, the scheduler used for Standalone mode is called
| | |     SparkDeploySchedulerBackend. So, the previous names were misleading.
| | |
* | | Unified daemon thread pools    Kay Ousterhout    2013-10-15    7 files    -38/+29
| |/
|/|
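The usual shape of such a unified utility - one factory for named daemon thread pools, so worker pools never keep the JVM alive - sketched here with assumed names, not the patch's actual code:

    import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}
    import java.util.concurrent.atomic.AtomicInteger

    object DaemonThreadPools {
      private def factory(prefix: String): ThreadFactory = new ThreadFactory {
        private val count = new AtomicInteger(0)
        def newThread(r: Runnable): Thread = {
          val t = new Thread(r, prefix + "-" + count.incrementAndGet())
          t.setDaemon(true) // daemon threads die with the application
          t
        }
      }

      def newDaemonCachedThreadPool(prefix: String): ExecutorService =
        Executors.newCachedThreadPool(factory(prefix))
    }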
* | Bump up logging level to warning for failed tasks.    Reynold Xin    2013-10-14    1 file    -5/+5
| |
* | Merge branch 'master' of github.com:apache/incubator-spark into kill    Reynold Xin    2013-10-14    48 files    -439/+635
|\ \
| | |     Conflicts:
| | |         core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
| | |
| * \ Merge pull request #57 from aarondav/bid    Reynold Xin    2013-10-14    40 files    -369/+527
| |\ \
| | | |     Refactor BlockId into an actual type.
| | | |
| | | |     Converts all of our BlockId strings into actual BlockId types. Here are some
| | | |     advantages of doing this now:
| | | |
| | | |     + Type safety.
| | | |     + Code clarity - it's now obvious what the key of a shuffle or rdd block is,
| | | |       for instance. Additionally, appearing in tuple/map type signatures is a big
| | | |       readability bonus: a Seq[(String, BlockStatus)] is not very clear. Further,
| | | |       we can now use more Scala features, like matching on BlockId types.
| | | |     + Explicit usage - we can now formally tell where various BlockIds are being
| | | |       used (without doing string searches); this makes updating current BlockIds
| | | |       a much clearer, compiler-supported process. (I'm looking at you, shuffle
| | | |       file consolidation.)
| | | |     + It will only get harder to make this change as time goes on.
| | | |
| | | |     The downside, of course, is that this is a very invasive change touching a
| | | |     lot of different files, which will inevitably lead to merge conflicts for
| | | |     many.
| | | |
| | * | Address Matei's comments    Aaron Davidson    2013-10-14    8 files    -34/+28
| | | |
| | * | Change BlockId filename to name + rest of Patrick's comments    Aaron Davidson    2013-10-13    11 files    -36/+39
| | | |
| | * | Add unit test and address rest of Reynold's comments    Aaron Davidson    2013-10-12    10 files    -20/+144
| | | |
| | * | Refactor BlockId into an actual type    Aaron Davidson    2013-10-12    39 files    -369/+406
| | | |
| | | |     This is an unfortunately invasive change which converts all of our BlockId
| | | |     strings into actual BlockId types, for the reasons listed in the pull-request
| | | |     description above. Since this touches a lot of files, it'd be best either to
| | | |     get this patch in quickly or to throw it on the ground, to avoid too many
| | | |     secondary merge conflicts.
| | | |
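A minimal sketch of the refactoring's shape - string keys replaced by a small sealed hierarchy that supports pattern matching. Member names below follow the obvious Spark block naming but are illustrative rather than copied from the patch:

    // Illustrative only: the "actual type" behind what used to be plain strings.
    sealed abstract class BlockId {
      def name: String                // the old string form, still useful for filenames
      override def toString = name
    }

    case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
      def name = "rdd_" + rddId + "_" + splitIndex
    }

    case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId {
      def name = "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId
    }

    object BlockIdDemo {
      def main(args: Array[String]) {
        val id: BlockId = RDDBlockId(3, 7)

        // Matching on types replaces error-prone string parsing, and a
        // Seq[(BlockId, Long)] is far clearer than a Seq[(String, Long)].
        id match {
          case RDDBlockId(rdd, split)  => println("rdd " + rdd + " partition " + split)
          case ShuffleBlockId(s, m, r) => println("shuffle " + s + " map " + m)
        }
      }
    }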
| * | | Merge pull request #52 from harveyfeng/hadoop-closure    Reynold Xin    2013-10-12    2 files    -55/+26
| |\ \ \
| | |/ /
| |/| |
| | | |     Add an optional closure parameter to HadoopRDD instantiation, to use when
| | | |     creating local JobConfs.
| | | |
| | | |     Having HadoopRDD accept this optional closure eliminates the need for the
| | | |     HadoopFileRDD added earlier. It makes HadoopRDD more general, in that the
| | | |     caller can specify any JobConf initialization flow.
| | | |
| | * | Remove the new HadoopRDD constructor from SparkContext API, plus some minor style changes.    Harvey Feng    2013-10-12    2 files    -27/+3
| | | |
| | * | Add an optional closure parameter to HadoopRDD instantiation, to use when creating any local JobConfs.    Harvey Feng    2013-10-10    2 files    -53/+48
| | | |
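A hedged sketch of the merged pattern: the RDD carries an optional JobConf => Unit closure and applies it to every locally created JobConf. The class and parameter names below are assumptions for illustration:

    import org.apache.hadoop.mapred.JobConf

    // Illustrative shape of the change: one general RDD with an optional
    // initialization closure, instead of a dedicated HadoopFileRDD subclass.
    class HadoopStyleRDD(initLocalJobConfFuncOpt: Option[JobConf => Unit]) {
      // Called on each node the first time it needs a JobConf.
      protected def createLocalJobConf(): JobConf = {
        val conf = new JobConf() // real code would start from a broadcast conf
        initLocalJobConfFuncOpt.foreach(f => f(conf)) // caller-defined init flow
        conf
      }
    }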
| * | | Merge pull request #53 from witgo/master    Matei Zaharia    2013-10-11    1 file    -0/+4
| |\ \ \
| | | | |     Add a zookeeper compile dependency to fix the build in Maven.
| | | | |
| | * | | Add a zookeeper compile dependency to fix build in maven    LiGuoqiang    2013-10-11    1 file    -0/+4
| | | | |
| * | | | Merge pull request #32 from mridulm/master    Matei Zaharia    2013-10-11    12 files    -29/+93
| |\ \ \ \
| |/ / / /
|/| | | |
| | | | |     Address review comments, move to incubator spark.
| | | | |
| | | | |     Also includes a small fix to speculative execution. <edit> Continued from
| | | | |     https://github.com/mesos/spark/pull/914 </edit>
| | | | |
| | * | | - Allow for finer control of cleaner    Mridul Muralidharan    2013-10-06    12 files    -29/+93
| | | |/
| | |/|
| | | |     - Address review comments, move to incubator spark
| | | |     - Also includes a change to speculation, including preventing exceptions in
| | | |       rare cases.
* | | | Fixed PairRDDFunctionsSuite after removing InterruptibleRDD.    Reynold Xin    2013-10-12    1 file    -1/+1
| | | |
* | | | Job cancellation: address Matei's code review feedback.    Reynold Xin    2013-10-12    17 files    -216/+248
| | | |
* | | | Job cancellation: addressed code review feedback round 2 from Kay.    Reynold Xin    2013-10-11    3 files    -44/+47
| | | |
* | | | Fixed DAGScheduler suite because of a logging message change.    Reynold Xin    2013-10-11    1 file    -1/+1
| | | |
* | | | Job cancellation: addressed code review feedback from Kay.    Reynold Xin    2013-10-11    12 files    -80/+86
| | | |
* | | | Making takeAsync and collectAsync deterministic.    Reynold Xin    2013-10-11    3 files    -19/+15
| | | |
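These async actions return a FutureAction handle instead of blocking; a hedged usage sketch, assuming collectAsync is made available by implicits in SparkContext._ (as in later Spark releases):

    import scala.concurrent.Await
    import scala.concurrent.duration.Duration
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // assumed home of the async-action implicits

    object AsyncDemo {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[2]", "async-demo")

        // Returns a FutureAction immediately; the job runs in the background.
        val future = sc.parallelize(1 to 1000, 4).collectAsync()

        // The handle can be cancelled (tying into the job-cancellation work
        // above) or awaited like an ordinary future.
        val result = Await.result(future, Duration.Inf)
        println(result.length)
        sc.stop()
      }
    }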
* | | | Properly handle interrupted exception in FutureAction.    Reynold Xin    2013-10-11    1 file    -7/+5
| | | |
* | | | Merge branch 'master' of github.com:apache/incubator-spark into kill    Reynold Xin    2013-10-10    35 files    -212/+2106
|\| | |
| | | |     Conflicts:
| | | |         core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala
| | | |
| * | | Merge remote-tracking branch 'tgravescs/sparkYarnDistCache'    Matei Zaharia    2013-10-10    1 file    -3/+14
| |\ \ \
| | | | |     Closes #11
| | | | |
| | | | |     Conflicts:
| | | | |         docs/running-on-yarn.md
| | | | |         yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
| | | | |
| | * | | Adding the --addJars option to make SparkContext.addJar work on YARN, and cleaning up the classpaths    tgravescs    2013-10-03    1 file    -3/+14
| | | | |