spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	De-duplication in getRemote() and getRemoteBytes().	Josh Rosen	2013-10-19	1	-32/+18
\|
*	De-duplication in getLocal() and getLocalBytes().	Josh Rosen	2013-10-19	1	-100/+59
\|
*	Merge pull request #78 from mosharaf/master	Reynold Xin	2013-10-19	4	-2125/+0
\|\	Removed BitTorrentBroadcast and TreeBroadcast. TorrentBroadcast replaces both.
\| *	Removed BitTorrentBroadcast and TreeBroadcast. TorrentBroadcast is replacing both.	Mosharaf Chowdhury	2013-10-18	4	-2125/+0
\| \|
* \|	Merge pull request #76 from pwendell/master	Reynold Xin	2013-10-18	1	-1/+1
\|\ \	Clarify compression property. Clarifies that this governs compression of internal data, not input data or output data.
\| * \|	Clarify compression property.	Patrick Wendell	2013-10-18	1	-1/+1
\| \| \|	Clarifies that this governs compression of internal data, not input data or output data.
* \| \|	Merge pull request #74 from rxin/kill	Matei Zaharia	2013-10-18	4	-7/+75
\|\ \ \	Job cancellation via job group id. This PR adds a simple API to group together a set of jobs belonging to a thread and threads spawned from it. It also allows the cancellation of all jobs in this group. An example:
		sc.setJobDescription("this_is_the_group_id", "some job description")
		sc.parallelize(1 to 10000, 2).map { i => Thread.sleep(10); i }.count()
	In a separate thread:
		sc.cancelJobGroup("this_is_the_group_id")
\| * \|	Job cancellation via job group id.	Reynold Xin	2013-10-18	4	-7/+75
\| \| \|
* \| \|	Merge pull request #66 from shivaram/sbt-assembly-deps	Matei Zaharia	2013-10-18	2	-7/+28
\|\ \ \	Add SBT target to assemble dependencies
	This pull request is an attempt to address the long assembly build times during development. Instead of rebuilding the assembly jar for every Spark change, this pull request adds a new SBT target `spark` that packages all the Spark modules and builds an assembly of the dependencies. So the workflow that should work now would be something like
	```
	./sbt/sbt spark # Doing this once should suffice
	## Make changes
	./sbt/sbt compile
	./sbt/sbt test or ./spark-shell
	```
\| * \| \|	Rename SBT target to assemble-deps.	Shivaram Venkataraman	2013-10-16	1	-5/+5
\| \| \| \|
\| * \| \|	Exclude assembly jar from classpath if using deps	Shivaram Venkataraman	2013-10-16	1	-10/+18
\| \| \| \|
\| * \| \|	Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps	Shivaram Venkataraman	2013-10-15	248	-2259/+8454
\| \|\ \ \
\| * \| \| \|	Add a comment and exclude tools	Shivaram Venkataraman	2013-10-11	1	-1/+2
\| \| \| \| \|
\| * \| \| \|	Add new SBT target for dependency assembly	Shivaram Venkataraman	2013-10-09	2	-1/+13
\| \| \| \| \|
* \| \| \| \|	Merge pull request #68 from mosharaf/master	Matei Zaharia	2013-10-18	7	-12/+328
\|\ \ \ \ \	Faster and stable/reliable broadcast
	HttpBroadcast is noticeably slow, but the alternatives (TreeBroadcast or BitTorrentBroadcast) are notoriously unreliable. The main problem with them is that they try to manage the memory for the pieces of a broadcast themselves. Right now, the BroadcastManager does not know on which machines the tasks reading from a broadcast variable are running and when they have finished. Consequently, we try to guess and often guess wrong, which blows up the memory usage and kills/hangs jobs.
	This very simple implementation solves the problem by not trying to manage the intermediate pieces; instead, it offloads that duty to the BlockManager, which is quite good at juggling blocks. Otherwise, it is very similar to the BitTorrentBroadcast implementation (without fancy optimizations). And it runs much faster than the HttpBroadcast we have right now.
	I've been using this for another project for the last couple of weeks, and just today did some benchmarking against the Http one. The following shows the improvements for increasing broadcast size for cold runs. Each line represents the number of receivers.
	![fix-bc-first](https://f.cloud.github.com/assets/232966/1349342/ffa149e4-36e7-11e3-9fa6-c74555829356.png)
	After the first broadcast is over, i.e., after the JVM is warmed up and, for HttpBroadcast, the server is already running (I think), the following are the improvements for warm runs.
	![fix-bc-succ](https://f.cloud.github.com/assets/232966/1349352/5a948bae-36e8-11e3-98ce-34f19ebd33e0.jpg)
	The curves are not as nice as the cold runs, but the improvements are obvious, especially for larger broadcasts and more receivers.
	Depending on how it goes, we should deprecate and/or remove the old TreeBroadcast and BitTorrentBroadcast implementations, and hopefully SPARK-889 will not be necessary any more.
\| * \| \| \| \|	Should compile now.	Mosharaf Chowdhury	2013-10-17	1	-1/+2
\| \| \| \| \| \|
\| * \| \| \| \|	Added an after block to reset spark.broadcast.factory	Mosharaf Chowdhury	2013-10-17	1	-0/+4
\| \| \| \| \| \|
\| * \| \| \| \|	Merge remote-tracking branch 'upstream/master'	Mosharaf Chowdhury	2013-10-17	3	-3/+39
\| \|\ \ \ \ \
\| \| \| \|_\|/ /
\| \| \|/\| \| \|
\| * \| \| \| \|	BroadcastSuite updated to test both HttpBroadcast and TorrentBroadcast in local, local[N], local-cluster settings.	Mosharaf Chowdhury	2013-10-17	1	-3/+44
\| \| \| \| \| \|
\| * \| \| \| \|	Merge remote-tracking branch 'upstream/master'	Mosharaf Chowdhury	2013-10-17	9	-105/+63
\| \|\ \ \ \ \
\| * \| \| \| \| \|	Code styling. Updated doc.	Mosharaf Chowdhury	2013-10-17	2	-4/+12
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Removed unused code.	Mosharaf Chowdhury	2013-10-17	2	-14/+11
\| \| \| \| \| \| \|	Changes to match Spark coding style.
\| * \| \| \| \| \|	BroadcastTest2 --> BroadcastTest	Mosharaf Chowdhury	2013-10-16	2	-62/+12
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Fixes for the new BlockId naming convention.	Mosharaf Chowdhury	2013-10-16	2	-7/+14
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Default blockSize is 4MB.	Mosharaf Chowdhury	2013-10-16	2	-1/+60
\| \| \| \| \| \| \|	BroadcastTest2 example added for testing broadcasts.
\| * \| \| \| \| \|	Removed unnecessary code, and added comment of memory-latency tradeoff.	Mosharaf Chowdhury	2013-10-16	1	-4/+6
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Torrent-ish broadcast based on BlockManager.	Mosharaf Chowdhury	2013-10-16	3	-4/+251
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Merge pull request #71 from aarondav/scdefaults	Matei Zaharia	2013-10-18	2	-8/+14
\|\ \ \ \ \ \ \	Spark shell exits if it cannot create SparkContext
	Mainly, this occurs if you provide a messed up MASTER url (one that doesn't match one of our regexes). Previously, we would default to Mesos, fail, and then start the shell anyway, except that any Spark command would fail. Simply exiting seems clearer.
\| * \| \| \| \| \|	Spark shell exits if it cannot create SparkContext	Aaron Davidson	2013-10-17	2	-8/+14
\|/ / / / / /	Mainly, this occurs if you provide a messed up MASTER url (one that doesn't match one of our regexes). Previously, we would default to Mesos, fail, and then start the shell anyway, except that any Spark command would fail.
* \| \| \| \| \|	Merge pull request #69 from KarthikTunga/master	Matei Zaharia	2013-10-17	3	-3/+39
\|\ \ \ \ \ \	Fix for issue SPARK-627. Implementing --config argument in the scripts. This code fix is for issue SPARK-627. I added code to consider --config arguments in the scripts. In case the <conf-dir> is not a directory, the scripts exit. I removed the --hosts argument. It can be achieved by giving a different config directory. Let me know if an explicit --hosts argument is required.
\| * \| \| \| \|	SPARK-627, Implementing --config arguments in the scripts	KarthikTunga	2013-10-16	1	-1/+1
\| \| \| \| \| \|
\| * \| \| \| \|	SPARK-627, Implementing --config arguments in the scripts	KarthikTunga	2013-10-16	2	-2/+2
\| \| \| \| \| \|
\| * \| \| \| \|	Implementing --config argument in the scripts	KarthikTunga	2013-10-16	2	-7/+10
\| \| \| \| \| \|
\| * \| \| \| \|	Merge branch 'master' of https://github.com/apache/incubator-spark	KarthikTunga	2013-10-15	159	-1367/+5322
\| \|\ \ \ \ \	Updating local branch
\| * \| \| \| \| \|	SPARK-627 - reading --config argument	KarthikTunga	2013-10-15	2	-0/+33
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Merge pull request #67 from kayousterhout/remove_tsl	Matei Zaharia	2013-10-17	9	-105/+63
\|\ \ \ \ \ \ \	Removed TaskSchedulerListener interface. The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function). @mateiz - is there a reason for this interface that I'm missing?
\| * \| \| \| \| \|	Fixed unit tests	Kay Ousterhout	2013-10-16	2	-25/+26
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Removed TaskSchedulerListener interface.	Kay Ousterhout	2013-10-16	7	-80/+37
\| \| \| \| \| \| \|	The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function).
* \| \| \| \| \| \|	Merge pull request #65 from tgravescs/fixYarn	Matei Zaharia	2013-10-16	1	-2/+2
\|\ \ \ \ \ \ \	Fix yarn build
	Fix the yarn build after renaming StandAloneX to CoarseGrainedX from pull request 34.
\| * \| \| \| \| \|	Fix yarn build	tgravescs	2013-10-16	1	-2/+2
\|/ / / / / /
* \| \| \| \| \|	Merge pull request #63 from pwendell/master	Matei Zaharia	2013-10-15	2	-4/+10
\|\ \ \ \ \ \	Fixing spark streaming example and a bug in examples build.
	- Examples assembly included a log4j.properties which clobbered Spark's
	- Example had an error where some classes weren't serializable
	- Did some other clean-up in this example
\| * \| \| \| \|	Fixing spark streaming example and a bug in examples build.	Patrick Wendell	2013-10-15	2	-4/+10
\| \| \| \| \| \|	- Examples assembly included a log4j.properties which clobbered Spark's
	- Example had an error where some classes weren't serializable
	- Did some other clean-up in this example
* \| \| \| \| \|	Merge pull request #62 from harveyfeng/master	Matei Zaharia	2013-10-15	2	-2/+5
\|\ \ \ \ \ \	Make TaskContext's stageId publicly accessible.
\| * \| \| \| \|	Proper formatting for SparkHadoopWriter class extensions.	Harvey Feng	2013-10-15	1	-1/+3
\| \| \| \| \| \|
\| * \| \| \| \|	Fix line length > 100 chars in SparkHadoopWriter	Harvey Feng	2013-10-15	1	-1/+2
\| \| \| \| \| \|
\| * \| \| \| \|	Make TaskContext's stageId publicly accessible.	Harvey Feng	2013-10-15	1	-1/+1
\| \| \| \| \| \|
* \| \| \| \| \|	Merge pull request #8 from vchekan/checkpoint-ttl-restore	Matei Zaharia	2013-10-15	2	-0/+6
\|\ \ \ \ \ \	Serialize and restore spark.cleaner.ttl to savepoint
	In accordance with the conversation on the spark-dev mailing list, preserve the spark.cleaner.ttl parameter when serializing a checkpoint.
\| * \| \| \| \| \|	Serialize and restore spark.cleaner.ttl to savepoint	Vadim Chekan	2013-09-20	2	-0/+6
\| \| \|_\|_\|/ /
\| \|/\| \| \| \|
* \| \| \| \| \|	Merge pull request #34 from kayousterhout/rename	Matei Zaharia	2013-10-15	6	-36/+42
\|\ \ \ \ \ \	Renamed StandaloneX to CoarseGrainedX. (as suggested by @rxin here https://github.com/apache/incubator-spark/pull/14) The previous names were confusing because the components weren't just used in Standalone mode. The scheduler used for Standalone mode is called SparkDeploySchedulerBackend, so referring to the base class as StandaloneSchedulerBackend was misleading.
\| * \| \| \| \| \|	Fixed build error after merging in master	Kay Ousterhout	2013-10-15	1	-1/+1
\| \| \| \| \| \| \|