spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge pull request #65 from tgravescs/fixYarn	Matei Zaharia	2013-10-16	1	-2/+2
\|\ \| \| \| \| \| \| \| \| \| \|	Fix yarn build Fix the yarn build after renaming StandAloneX to CoarseGrainedX from pull request 34.
\| *	Fix yarn build	tgravescs	2013-10-16	1	-2/+2
\|/
*	Merge pull request #63 from pwendell/master	Matei Zaharia	2013-10-15	2	-4/+10
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixing spark streaming example and a bug in examples build. - Examples assembly included a log4j.properties which clobbered Spark's - Example had an error where some classes weren't serializable - Did some other clean-up in this example
\| *	Fixing spark streaming example and a bug in examples build.	Patrick Wendell	2013-10-15	2	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \|	- Examples assembly included a log4j.properties which clobbered Spark's - Example had an error where some classes weren't serializable - Did some other clean-up in this example
* \|	Merge pull request #62 from harveyfeng/master	Matei Zaharia	2013-10-15	2	-2/+5
\|\ \ \| \|/ \|/\| \| \|	Make TaskContext's stageId publicly accessible.
\| *	Proper formatting for SparkHadoopWriter class extensions.	Harvey Feng	2013-10-15	1	-1/+3
\| \|
\| *	Fix line length > 100 chars in SparkHadoopWriter	Harvey Feng	2013-10-15	1	-1/+2
\| \|
\| *	Make TaskContext's stageId publicly accessible.	Harvey Feng	2013-10-15	1	-1/+1
\| \|
* \|	Merge pull request #8 from vchekan/checkpoint-ttl-restore	Matei Zaharia	2013-10-15	2	-0/+6
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Serialize and restore spark.cleaner.ttl to savepoint In accordance to conversation in spark-dev maillist, preserve spark.cleaner.ttl parameter when serializing checkpoint.
\| * \|	Serialize and restore spark.cleaner.ttl to savepoint	Vadim Chekan	2013-09-20	2	-0/+6
\| \| \|
* \| \|	Merge pull request #34 from kayousterhout/rename	Matei Zaharia	2013-10-15	6	-36/+42
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Renamed StandaloneX to CoarseGrainedX. (as suggested by @rxin here https://github.com/apache/incubator-spark/pull/14) The previous names were confusing because the components weren't just used in Standalone mode. The scheduler used for Standalone mode is called SparkDeploySchedulerBackend, so referring to the base class as StandaloneSchedulerBackend was misleading.
\| * \| \|	Fixed build error after merging in master	Kay Ousterhout	2013-10-15	1	-1/+1
\| \| \| \|
\| * \| \|	Merge remote branch 'upstream/master' into rename	Kay Ousterhout	2013-10-15	175	-1414/+5573
\| \|\ \ \ \| \| \| \|/ \| \| \|/\|
\| * \| \|	Added back fully qualified class name	Kay Ousterhout	2013-10-06	1	-1/+1
\| \| \| \|
\| * \| \|	Renamed StandaloneX to CoarseGrainedX.	Kay Ousterhout	2013-10-04	6	-35/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous names were confusing because the components weren't just used in Standalone mode -- in fact, the scheduler used for Standalone mode is called SparkDeploySchedulerBackend. So, the previous names were misleading.
* \| \| \|	Merge pull request #61 from kayousterhout/daemon_thread	Matei Zaharia	2013-10-15	7	-38/+29
\|\ \ \ \ \| \|_\|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unified daemon thread pools As requested by @mateiz in an earlier pull request, this refactors various daemon thread pools to use a set of methods in utils.scala, and also changes the thread-pool-creation methods in utils.scala to use named thread pools for improved debugging.
\| * \| \|	Unified daemon thread pools	Kay Ousterhout	2013-10-15	7	-38/+29
\|/ / /
* \| \|	Merge pull request #59 from rxin/warning	Matei Zaharia	2013-10-15	1	-5/+5
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \|	Bump up logging level to warning for failed tasks.
\| * \| \|	Bump up logging level to warning for failed tasks.	Reynold Xin	2013-10-14	1	-5/+5
\| \| \| \|
* \| \| \|	Merge pull request #58 from hsaputra/update-pom-asf	Reynold Xin	2013-10-15	1	-1/+24
\|\ \ \ \ \| \|/ / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update pom.xml to use version 13 of the ASF parent pom Update pom.xml to use version 13 of the ASF parent pom. Add mailingList element to pom.xml.
\| * \| \|	Update pom.xml to use version 13 of the ASF parent pom and add mailingLists ↵	Henry Saputra	2013-10-14	1	-1/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	element.
* \| \| \|	Merge pull request #29 from rxin/kill	Patrick Wendell	2013-10-14	50	-515/+1528
\|\ \ \ \ \| \|/ / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Job killing Moving https://github.com/mesos/spark/pull/935 here The high level idea is to have an "interrupted" field in TaskContext, and a task should check that flag to determine if its execution should continue. For convenience, I provide an InterruptibleIterator which wraps around a normal iterator but checks for the interrupted flag. I also provide an InterruptibleRDD that wraps around an existing RDD. As part of this pull request, I added an AsyncRDDActions class that provides a number of RDD actions that return a FutureJob (extending scala.concurrent.Future). The FutureJob can be used to kill the job execution, or waits until the job finishes. This is NOT ready for merging yet. Remaining TODOs: 1. Add unit tests 2. Add job killing functionality for local scheduler (current job killing functionality only works in cluster scheduler) ------------- Update on Oct 10, 2013: This is ready! Related future work: - Figure out how to handle the job triggered by RangePartitioner (this one is tough; might become future work) - Java API - Python API
\| * \| \|	Merge branch 'master' of github.com:apache/incubator-spark into kill	Reynold Xin	2013-10-14	53	-457/+652
\| \|\ \ \ \| \|/ / / \|/\| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
* \| \| \|	Merge pull request #57 from aarondav/bid	Reynold Xin	2013-10-14	44	-385/+544
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactor BlockId into an actual type Converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now: + Type safety + Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types. + Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported. (I'm looking at you, shuffle file consolidation.) + It will only get harder to make this change as time goes on. Downside is, of course, that this is a very invasive change touching a lot of different files, which will inevitably lead to merge conflicts for many.
\| * \| \| \|	Address Matei's comments	Aaron Davidson	2013-10-14	8	-34/+28
\| \| \| \| \|
\| * \| \| \|	Change BlockId filename to name + rest of Patrick's comments	Aaron Davidson	2013-10-13	11	-36/+39
\| \| \| \| \|
\| * \| \| \|	Add unit test and address rest of Reynold's comments	Aaron Davidson	2013-10-12	10	-20/+144
\| \| \| \| \|
\| * \| \| \|	Refactor BlockId into an actual type	Aaron Davidson	2013-10-12	43	-385/+423
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is an unfortunately invasive change which converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now: + Type safety + Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types. + Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer process, and compiler-supported. (I'm looking at you, shuffle file consolidation.) + It will only get harder to make this change as time goes on. Since this touches a lot of files, it'd be best to either get this patch in quickly or throw it on the ground to avoid too many secondary merge conflicts.
* \| \| \| \|	Merge pull request #52 from harveyfeng/hadoop-closure	Reynold Xin	2013-10-12	2	-55/+26
\|\ \ \ \ \ \| \|/ / / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add an optional closure parameter to HadoopRDD instantiation to use when creating local JobConfs. Having HadoopRDD accept this optional closure eliminates the need for the HadoopFileRDD added earlier. It makes the HadoopRDD more general, in that the caller can specify any JobConf initialization flow.
\| * \| \| \|	Remove the new HadoopRDD constructor from SparkContext API, plus some minor ↵	Harvey Feng	2013-10-12	2	-27/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	style changes.
\| * \| \| \|	Add an optional closure parameter to HadoopRDD instantiation to used when ↵	Harvey Feng	2013-10-10	2	-53/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	creating any local JobConfs.
* \| \| \| \|	Merge pull request #54 from aoiwelle/remove_unused_imports	Reynold Xin	2013-10-11	1	-2/+0
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove unnecessary mutable imports It appears that the imports aren't necessary here.
\| * \| \| \| \|	Remove unnecessary mutable imports	Neal Wiggins	2013-10-11	1	-2/+0
\| \| \| \| \| \|
* \| \| \| \| \|	Merge pull request #53 from witgo/master	Matei Zaharia	2013-10-11	1	-0/+4
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a zookeeper compile dependency to fix build in maven Add a zookeeper compile dependency to fix build in maven
\| * \| \| \| \| \|	Add a zookeeper compile dependency to fix build in maven	LiGuoqiang	2013-10-11	1	-0/+4
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Merge pull request #32 from mridulm/master	Matei Zaharia	2013-10-11	12	-29/+93
\|\ \ \ \ \ \ \ \| \|/ / / / / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Address review comments, move to incubator spark Also includes a small fix to speculative execution. <edit> Continued from https://github.com/mesos/spark/pull/914 </edit>
\| * \| \| \| \| \|	- Allow for finer control of cleaner	Mridul Muralidharan	2013-10-06	12	-29/+93
\| \| \|_\|_\|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Address review comments, move to incubator spark - Also includes a change to speculation - including preventing exceptions in rare cases.
\| \| \| \| * \|	Fixed PairRDDFunctionsSuite after removing InterruptibleRDD.	Reynold Xin	2013-10-12	1	-1/+1
\| \| \| \| \| \|
\| \| \| \| * \|	Job cancellation: address Matei's code review feedback.	Reynold Xin	2013-10-12	17	-216/+248
\| \| \| \| \| \|
\| \| \| \| * \|	Job cancellation: addressed code review feedback round 2 from Kay.	Reynold Xin	2013-10-11	3	-44/+47
\| \| \| \| \| \|
\| \| \| \| * \|	Fixed dagscheduler suite because of a logging message change.	Reynold Xin	2013-10-11	1	-1/+1
\| \| \| \| \| \|
\| \| \| \| * \|	Job cancellation: addressed code review feedback from Kay.	Reynold Xin	2013-10-11	12	-80/+86
\| \| \| \| \| \|
\| \| \| \| * \|	Making takeAsync and collectAsync deterministic.	Reynold Xin	2013-10-11	3	-19/+15
\| \| \| \| \| \|
\| \| \| \| * \|	Properly handle interrupted exception in FutureAction.	Reynold Xin	2013-10-11	1	-7/+5
\| \| \| \| \| \|
\| \| \| \| * \|	Merge branch 'master' of github.com:apache/incubator-spark into kill	Reynold Xin	2013-10-10	55	-264/+2641
\| \| \| \| \|\ \ \| \|_\|_\|_\|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala
* \| \| \| \| \|	Merge remote-tracking branch 'tgravescs/sparkYarnDistCache'	Matei Zaharia	2013-10-10	6	-49/+275
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Closes #11 Conflicts: docs/running-on-yarn.md yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
\| * \| \| \| \| \|	Adding in the --addJars option to make SparkContext.addJar work on yarn and ↵	tgravescs	2013-10-03	5	-23/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cleanup the classpaths
\| * \| \| \| \| \|	Support distributed cache files and archives on spark on yarn and attempt to ↵	Y.CORP.YAHOO.COM\tgraves	2013-09-23	5	-30/+232
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cleanup the staging directory on exit
* \| \| \| \| \| \|	Merge pull request #19 from aarondav/master-zk	Matei Zaharia	2013-10-10	45	-175/+1947
\|\ \ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior of from d5a96fe. Additionally, places where a Master could be specificied by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.
\| * \| \| \| \| \| \|	Minor clarification and cleanup to spark-standalone.md	Aaron Davidson	2013-10-10	1	-10/+33
\| \| \| \| \| \| \| \|