spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Merge pull request #240 from pwendell/master	Patrick Wendell	2013-12-07	1	-4/+4
|\	SPARK-917 Improve API links in nav bar
| *	SPARK-917 Improve API links in nav bar	Patrick Wendell	2013-12-07	1	-4/+4
| |
* |	Merge pull request #239 from aarondav/nit	Patrick Wendell	2013-12-07	1	-1/+1
|\ \	Correct spelling error in configuration.md
| |/
|/|
| *	Correct spelling error in configuration.md	Aaron Davidson	2013-12-07	1	-1/+1
|/
*	Merge pull request #237 from pwendell/formatting-fix	Patrick Wendell	2013-12-06	1	-1/+0
|\	Formatting fix. This is a single-line change. The diff appears larger here due to GitHub being out of sync.
| *	Minor formatting fix in config file	Patrick Wendell	2013-12-06	1	-1/+0
|/
*	Merge pull request #236 from pwendell/shuffle-docs	Patrick Wendell	2013-12-06	1	-1/+1
|\	Adding disclaimer for shuffle file consolidation
| *	Adding disclaimer for shuffle file consolidation	Patrick Wendell	2013-12-06	1	-1/+1
| |
* |	Merge pull request #235 from pwendell/master	Patrick Wendell	2013-12-06	3	-3/+10
|\ \	Minor doc fixes and updating README
| |/
|/|
| *	Minor doc fixes and updating README	Patrick Wendell	2013-12-06	3	-3/+10
|/
*	Merge pull request #234 from alig/master	Patrick Wendell	2013-12-06	4	-3/+17
|\	Updated documentation about the YARN v2.2 build process
| *	more docs	Ali Ghodsi	2013-12-06	3	-3/+5
| |
| *	Updated documentation about the YARN v2.2 build process	Ali Ghodsi	2013-12-06	3	-1/+13
| |
* |	Merge pull request #190 from markhamstra/Stages4Jobs	Matei Zaharia	2013-12-06	9	-91/+280
|\ \	stageId <--> jobId mapping in DAGScheduler
Okay, I think this one is ready to go -- or at least it's ready for review and discussion. It's a carry-over of https://github.com/mesos/spark/pull/842 with updates for the newer job cancellation functionality. The prior discussion still applies.
I've actually changed the job cancellation flow a bit: Instead of ``cancelTasks`` going to the TaskScheduler and then ``taskSetFailed`` coming back to the DAGScheduler (resulting in ``abortStage`` there), the DAGScheduler now takes care of figuring out which stages should be cancelled, tells the TaskScheduler to cancel tasks for those stages, then does the cleanup within the DAGScheduler directly without the need for any further prompting by the TaskScheduler.
I know of three outstanding issues, each of which can and should, I believe, be handled in follow-up pull requests:
1) https://spark-project.atlassian.net/browse/SPARK-960
2) JobLogger should be re-factored to eliminate duplication.
3) Related to 2), the WebUI should also become a consumer of the DAGScheduler's new understanding of the relationship between jobs and stages so that it can display progress indication and the like grouped by job. Right now, some of this information is just being sent out as part of ``SparkListenerJobStart`` messages, but more or different job <--> stage information may need to be exported from the DAGScheduler to meet listeners' needs.
Except for the eventQueue -> Actor commit, the rest can be cherry-picked almost cleanly into branch-0.8. A little merging is needed in MapOutputTracker and the DAGScheduler. Merged versions of those files are in https://github.com/markhamstra/incubator-spark/tree/aba2b40ce04ee9b7b9ea260abb6f09e050142d43
Note that between the recent Actor change in the DAGScheduler and the cleaning up of DAGScheduler data structures on job completion in this PR, some races have been introduced into the DAGSchedulerSuite. Those tests usually pass, and I don't think that better-behaved code that doesn't directly inspect DAGScheduler data structures should be seeing any problems, but I'll work on fixing DAGSchedulerSuite as either an addition to this PR or as a separate request.
UPDATE: Fixed the race that I introduced. Created a JIRA issue (SPARK-965) for the one that was introduced with the switch to eventProcessorActor in the DAGScheduler.
| * |	SparkListenerJobStart posted from local jobs	Mark Hamstra	2013-12-03	1	-0/+1
| | |
| * |	Synchronous, inline cleanup after runLocally	Mark Hamstra	2013-12-03	3	-13/+6
| | |
| * |	Local jobs post SparkListenerJobEnd, and DAGScheduler data structure	Mark Hamstra	2013-12-03	2	-8/+11
| | |	cleanup always occurs before any posting of SparkListenerJobEnd.
| * |	Tightly couple stageIdToJobIds and jobIdToStageIds	Mark Hamstra	2013-12-03	1	-17/+12
| | |
| * |	Cleaned up job cancellation handling	Mark Hamstra	2013-12-03	1	-7/+5
| | |
| * |	Refactoring to make job removal, stage removal, task cancellation clearer	Mark Hamstra	2013-12-03	1	-37/+39
| | |
| * |	Improved comment	Mark Hamstra	2013-12-03	1	-4/+3
| | |
| * |	Removed redundant residual re: reverted refactoring.	Mark Hamstra	2013-12-03	1	-1/+1
| | |
| * |	Fixed intended side-effects	Mark Hamstra	2013-12-03	1	-2/+2
| | |
| * |	Actor instead of eventQueue for LocalJobCompleted	Mark Hamstra	2013-12-03	1	-1/+1
| | |
| * |	Added stageId <--> jobId mapping in DAGScheduler	Mark Hamstra	2013-12-03	9	-88/+286
| | |	...and make sure that DAGScheduler data structures are cleaned up on job completion. Initial effort and discussion at https://github.com/mesos/spark/pull/842
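Pull request #190 keeps `stageIdToJobIds` and `jobIdToStageIds` tightly coupled and relies on them both to clean up DAGScheduler state when a job completes and to work out which stages a cancelled job should take down. The Java sketch below illustrates that bookkeeping pattern (Spark's scheduler itself is written in Scala). The class `StageJobIndex` and its methods are hypothetical, not the DAGScheduler's code. The sketch also assumes one policy: a stage becomes removable only once no remaining job depends on it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (not Spark's DAGScheduler): two maps that are only ever updated together,
// so stageId -> jobIds and jobId -> stageIds cannot drift out of sync.
class StageJobIndex {
    private final Map<Integer, Set<Integer>> stageIdToJobIds = new HashMap<>();
    private final Map<Integer, Set<Integer>> jobIdToStageIds = new HashMap<>();

    // Record that a job depends on a stage; both directions are updated in the same call.
    void link(int jobId, int stageId) {
        stageIdToJobIds.computeIfAbsent(stageId, k -> new HashSet<>()).add(jobId);
        jobIdToStageIds.computeIfAbsent(jobId, k -> new HashSet<>()).add(stageId);
    }

    // Remove a finished or cancelled job and return the stages that no other job still needs;
    // only those stages' data structures (or running tasks) should be cleaned up.
    Set<Integer> removeJob(int jobId) {
        Set<Integer> orphanedStages = new TreeSet<>();
        Set<Integer> stages = jobIdToStageIds.remove(jobId);
        if (stages == null) {
            return orphanedStages; // unknown or already-removed job
        }
        for (int stageId : stages) {
            Set<Integer> jobs = stageIdToJobIds.get(stageId);
            jobs.remove(jobId);
            if (jobs.isEmpty()) {
                stageIdToJobIds.remove(stageId);
                orphanedStages.add(stageId);
            }
        }
        return orphanedStages;
    }

    Set<Integer> jobsForStage(int stageId) {
        return Collections.unmodifiableSet(
            stageIdToJobIds.getOrDefault(stageId, Collections.emptySet()));
    }

    public static void main(String[] args) {
        StageJobIndex index = new StageJobIndex();
        index.link(1, 10);
        index.link(1, 11);
        index.link(2, 11); // stage 11 is shared by jobs 1 and 2
        System.out.println(index.removeJob(1));     // [10]
        System.out.println(index.jobsForStage(11)); // [2]
    }
}
```

Routing every update through `link` and `removeJob` is what makes the two maps "tightly coupled". Removing job 1 frees stage 10 but leaves stage 11 alone because job 2 still needs it. That is the distinction a cancellation path needs when it decides which stages' tasks to cancel.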
* | |	Merge pull request #233 from hsaputra/changecontexttobackend	Matei Zaharia	2013-12-06	1	-2/+2
|\ \ \	Change the name of input argument in ClusterScheduler#initialize from context to backend. The SchedulerBackend used to be called ClusterSchedulerContext, so I just want to make a small change to the input param in ClusterScheduler#initialize to reflect this.
| * | |	Change the name of input argument in ClusterScheduler#initialize from context to backend.	Henry Saputra	2013-12-05	1	-2/+2
| | |/	The SchedulerBackend used to be called ClusterSchedulerContext, so I just want to make a small change to the input param in ClusterScheduler#initialize to reflect this.
| |/|
* | |	Merge pull request #205 from kayousterhout/logging	Matei Zaharia	2013-12-06	1	-2/+34
|\ \ \	Added logging of scheduler delays to UI. This commit adds two metrics to the UI: 1) the time to get task results, if they're fetched remotely, and 2) the scheduler delay. When the scheduler starts getting overwhelmed (because it can't keep up with the rate at which tasks are being submitted), the result is that tasks get delayed on the tail end: the message from the worker saying that the task has completed ends up in a long queue and takes a while to be processed by the scheduler. This commit records that delay in the UI so that users can tell when the scheduler is becoming the bottleneck.
| * | |	Fixed problem with scheduler delay	Kay Ousterhout	2013-12-02	1	-4/+7
| | | |
| * | |	Added logging of scheduler delays to UI	Kay Ousterhout	2013-11-21	1	-2/+31
| | | |
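Pull request #205 describes scheduler delay as time a task spends waiting on the scheduler rather than running. The sketch below shows one way to account for it. It assumes the delay is the part of a task's wall-clock time, from launch to completion as seen by the scheduler, that is not explained by executor run time or remote result fetching. The `TaskTiming` class and its field names are hypothetical. They are not Spark's `TaskInfo` or `TaskMetrics`, and the exact formula used in the commit may differ.

```java
// Hypothetical per-task timing record; field names and the delay formula are assumptions.
class TaskTiming {
    final long launchTimeMs;      // when the scheduler launched the task
    final long finishTimeMs;      // when the scheduler processed the completion message
    final long executorRunTimeMs; // time the executor spent running the task
    final long resultFetchTimeMs; // time spent fetching a remote result (0 if sent directly)

    TaskTiming(long launchTimeMs, long finishTimeMs, long executorRunTimeMs, long resultFetchTimeMs) {
        this.launchTimeMs = launchTimeMs;
        this.finishTimeMs = finishTimeMs;
        this.executorRunTimeMs = executorRunTimeMs;
        this.resultFetchTimeMs = resultFetchTimeMs;
    }

    // Wall-clock time seen by the scheduler minus the time accounted for by running the task
    // and fetching its result; clamped at zero in case clocks on different machines disagree.
    long schedulerDelayMs() {
        long totalMs = finishTimeMs - launchTimeMs;
        return Math.max(0L, totalMs - executorRunTimeMs - resultFetchTimeMs);
    }

    public static void main(String[] args) {
        TaskTiming timing = new TaskTiming(1_000L, 1_500L, 300L, 50L);
        System.out.println(timing.schedulerDelayMs() + " ms"); // 150 ms
    }
}
```

The commit message describes the signal to look for: the delay grows across many tasks while run times stay flat. That pattern points at the scheduler, not the executors, as the bottleneck.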
* | | |	Merge pull request #220 from rxin/zippart	Matei Zaharia	2013-12-06	1	-16/+11
|\ \ \ \	Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion. This was a problem in GraphX, where we have a whole chain of RDDs that are ZippedPartitionsRDDs, and the preferred locations were taking an eternity to compute. (cherry picked from commit e36fe55a031d2c01c9d7c5d85965951c681a0c74) Signed-off-by: Reynold Xin <rxin@apache.org>
| * | | |	Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion.	Reynold Xin	2013-11-30	1	-16/+11
| | | | |	(cherry picked from commit e36fe55a031d2c01c9d7c5d85965951c681a0c74) Signed-off-by: Reynold Xin <rxin@apache.org>
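Pull request #220 traces the slowdown to repeated recomputation. In a chain of zipped RDDs, every path back to an ancestor recomputes that ancestor's preferred locations, so the work can double at each level. The Java model below is a hypothetical illustration, not Spark's `ZippedPartitionsBaseRDD`. In it, level `i` zips two parents that are both level `i - 1`. The rule for combining parent hosts (shared hosts, else the union) is an assumption made for the example. The PR's fix is to memoize the result, and a per-level cache plays that role here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Spark code: level 0 is a base partition with fixed preferred hosts,
// and level i zips two parents that are both level i - 1 (a chain of "diamonds").
class ZippedChain {
    private final List<String> baseHosts;
    private final Map<Integer, List<String>> cache = new HashMap<>();
    int computations = 0; // how many times some level's locations were actually computed

    ZippedChain(List<String> baseHosts) {
        this.baseHosts = baseHosts;
    }

    // Assumed combine rule: hosts preferred by both parents, else the union of their hosts.
    private static List<String> combine(List<String> a, List<String> b) {
        List<String> shared = new ArrayList<>(a);
        shared.retainAll(b);
        if (!shared.isEmpty()) {
            return shared;
        }
        LinkedHashSet<String> union = new LinkedHashSet<>(a);
        union.addAll(b);
        return new ArrayList<>(union);
    }

    // Without memoization, every path to an ancestor recomputes it: 2^(level+1) - 1 computations.
    List<String> naive(int level) {
        computations++;
        if (level == 0) {
            return baseHosts;
        }
        return combine(naive(level - 1), naive(level - 1));
    }

    // With memoization, each level is computed once: level + 1 computations.
    List<String> memoized(int level) {
        List<String> cached = cache.get(level);
        if (cached != null) {
            return cached;
        }
        computations++;
        List<String> result = (level == 0)
            ? baseHosts
            : combine(memoized(level - 1), memoized(level - 1));
        cache.put(level, result);
        return result;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2");
        ZippedChain naiveChain = new ZippedChain(hosts);
        naiveChain.naive(20);
        ZippedChain memoChain = new ZippedChain(hosts);
        memoChain.memoized(20);
        System.out.println(naiveChain.computations + " vs " + memoChain.computations); // 2097151 vs 21
    }
}
```

With 20 levels, the naive recursion performs 2,097,151 location computations and the memoized version performs 21. The exponential-versus-linear gap matches what the commit message describes for long GraphX lineages.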
* | | | |	Merge pull request #232 from markhamstra/FiniteWait	Reynold Xin	2013-12-05	3	-1/+28
|\ \ \ \ \	jobWaiter.synchronized before jobWaiter.wait ...else ``IllegalMonitorStateException`` in ``SimpleFutureAction#ready``.
| |_|_|/ /
|/| | | |
| * | | |	FutureAction result tests	Mark Hamstra	2013-12-05	1	-0/+26
| | | | |
| * | | |	jobWaiter.synchronized before jobWaiter.wait	Mark Hamstra	2013-12-05	2	-1/+2
|/ / / /
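Pull request #232 fixes an `IllegalMonitorStateException` in `SimpleFutureAction#ready` by synchronizing on `jobWaiter` before calling `jobWaiter.wait`. The underlying rule is standard JVM monitor semantics: a thread may call `Object.wait()` only while it holds that object's monitor. The Java sketch below shows both the failure and the usual guarded-wait pattern. `SimpleWaiter` is a hypothetical stand-in, not Spark's `JobWaiter`.

```java
// Hypothetical stand-in for a job waiter (not Spark's JobWaiter): one thread blocks
// until another thread marks the job as finished.
class SimpleWaiter {
    private boolean finished = false; // guarded by this object's monitor

    synchronized void markFinished() {
        finished = true;
        notifyAll(); // wake every thread blocked in awaitResult()
    }

    synchronized boolean isFinished() {
        return finished;
    }

    // Correct: wait() is called while holding this object's monitor, inside a loop
    // so that a spurious wakeup re-checks the condition instead of returning early.
    void awaitResult() throws InterruptedException {
        synchronized (this) {
            while (!finished) {
                wait();
            }
        }
    }

    // Broken: the calling thread does not own the monitor, so wait() throws
    // IllegalMonitorStateException (the class of bug fixed by pull request #232).
    void awaitResultUnsynchronized() throws InterruptedException {
        while (!isFinished()) {
            wait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleWaiter waiter = new SimpleWaiter();
        try {
            waiter.awaitResultUnsynchronized();
        } catch (IllegalMonitorStateException e) {
            System.out.println("unsynchronized wait failed: " + e.getClass().getSimpleName());
        }
        Thread worker = new Thread(waiter::markFinished);
        worker.start();
        waiter.awaitResult(); // returns once markFinished() has run
        worker.join();
        System.out.println("finished = " + waiter.isFinished());
    }
}
```

The `while` loop matters as much as the `synchronized` block, because `Object.wait()` is allowed to return spuriously. The loop also returns immediately if the worker finishes before the waiter starts waiting.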
* | | |	Merge pull request #228 from pwendell/master	Patrick Wendell	2013-12-05	3	-4/+49
|\ \ \ \	Document missing configs and set shuffle consolidation to false.
| * | | |	Forcing shuffle consolidation in DiskBlockManagerSuite	Patrick Wendell	2013-12-05	1	-2/+12
| | | | |
| * | | |	Small changes from Matei review	Patrick Wendell	2013-12-04	1	-2/+2
| | | | |
| * | | |	Document missing configs and set shuffle consolidation to false.	Patrick Wendell	2013-12-04	2	-2/+37
| | | | |
* | | | |	Merge pull request #199 from harveyfeng/yarn-2.2	Matei Zaharia	2013-12-04	23	-343/+3716
|\ \ \ \ \	Hadoop 2.2 migration
Includes support for the YARN API stabilized in the Hadoop 2.2 release, and a few style patches. Short description for each set of commits:
a98f5a0 - "Misc style changes in the 'yarn' package"
a67ebf4 - "A few more style fixes in the 'yarn' package"
Both of these are some minor style changes, such as fixing lines over 100 chars, to the existing YARN code.
ab8652f - "Add a 'new-yarn' directory ... " Copies everything from `SPARK_HOME/yarn` to `SPARK_HOME/new-yarn`. No actual code changes here.
4f1c3fa - "Hadoop 2.2 YARN API migration ..." API patches to code in the `SPARK_HOME/new-yarn` directory. There are a few more small style changes mixed in, too. Based on @colorant's Hadoop 2.2 support for the scala-2.10 branch in #141.
a1a1c62 - "Add optional Hadoop 2.2 settings in sbt build ... " If Spark should be built against Hadoop 2.2, then:
a) the `org.apache.spark.deploy.yarn` package will be compiled from the `new-yarn` directory.
b) Protobuf v2.5 will be used as a Spark dependency, since Hadoop 2.2 depends on it. Also, Spark will be built against a version of Akka v2.0.5 that's built against Protobuf 2.5, named `akka-2.0.5-protobuf-2.5`. The patched Akka is here: https://github.com/harveyfeng/akka/tree/2.0.5-protobuf-2.5, and was published to local Ivy during testing.
There's also a new boolean environment variable, `SPARK_IS_NEW_HADOOP`, that users can manually set if their `SPARK_HADOOP_VERSION` specification does not start with `2.2`, which is how the build file tries to detect a 2.2 version. Not sure if this is necessary or done in the best way, though...
| * \ \ \ \	Merge pull request #2 from colorant/yarn-client-2.2	Harvey Feng	2013-12-03	3	-24/+56
| |\ \ \ \ \	Fix pom.xml for maven build
| | * | | | |	Fix pom.xml for maven build	Raymond Liu	2013-12-03	3	-24/+56
| |/ / / / /
| * | | | |	Use published "org.spark-project.akka-*" in sbt build for Hadoop-2.2 dependencies.	Harvey Feng	2013-12-03	1	-13/+15
| | | | | |	This also includes: -Change `isNewYarn` to `isNewHadoop`, since the protobuf-2.5 dependency is from Hadoop-2.2 itself. -Regexp bugfix. Credits to @alig for this patch.
| * | | | |	Merge pull request #1 from colorant/yarn-client-2.2	Harvey Feng	2013-11-27	5	-17/+405
| |\ \ \ \ \	Port yarn-client mode for new-yarn
| | * | | | |	Port yarn-client mode for new-yarn	Raymond Liu	2013-11-27	5	-17/+405
| |/ / / / /
| * | | | |	Merge remote-tracking branch 'origin/master' into yarn-2.2	Harvey Feng	2013-11-26	35	-218/+1466
| |\ \ \ \ \	Conflicts: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
| * | | | | |	Add optional Hadoop 2.2 settings in sbt build.	Harvey Feng	2013-11-26	1	-9/+23
| | | | | | |	If the Hadoop used is version 2.2 or derived from it, then Spark will be compiled against protobuf-2.5 and a protobuf-2.5 version of Akka 2.0.5.
| * | | | | |	Hadoop 2.2 YARN API migration for `SPARK_HOME/new-yarn`	Harvey Feng	2013-11-23	6	-489/+468
| | | | | | |
| * | | | | |	Add a "new-yarn" directory in SPARK_HOME, intended to contain Hadoop-2.2 API changes.	Harvey Feng	2013-11-23	11	-0/+2822
| * | | | | |	A few more style fixes in `yarn` package.	Harvey Feng	2013-11-23	3	-45/+71
| | | | | | |
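Pull request #199 describes how the build decides whether to compile against Hadoop 2.2. A `SPARK_HADOOP_VERSION` that starts with `2.2` is detected automatically, and `SPARK_IS_NEW_HADOOP` lets users opt in when their version string does not follow that pattern. Below is a minimal Java rendering of that rule; the real check lives in Spark's sbt build, not in a class like this. The class and method names are hypothetical. The sketch also assumes `SPARK_IS_NEW_HADOOP` is purely an opt-in flag, so it cannot switch a detected 2.2 version off, because the PR description does not say how a conflicting value is resolved.

```java
// Hypothetical helper; the actual check is part of Spark's sbt build definition.
class HadoopVersionCheck {
    // True if the Hadoop version starts with "2.2", or if SPARK_IS_NEW_HADOOP is set to "true".
    static boolean isNewHadoop(String sparkHadoopVersion, String sparkIsNewHadoop) {
        boolean detected = sparkHadoopVersion != null && sparkHadoopVersion.startsWith("2.2");
        // Boolean.parseBoolean(null) returns false, so an unset variable simply means "not forced".
        return detected || Boolean.parseBoolean(sparkIsNewHadoop);
    }

    public static void main(String[] args) {
        boolean newHadoop = isNewHadoop(
            System.getenv("SPARK_HADOOP_VERSION"), System.getenv("SPARK_IS_NEW_HADOOP"));
        // Per the PR description, "new" means: new-yarn sources, Protobuf 2.5, and the matching Akka build.
        System.out.println("Build against Hadoop 2.2 APIs: " + newHadoop);
    }
}
```

A prefix match is brittle because a distribution derived from Hadoop 2.2 need not report a version string beginning with `2.2`. That gap is the case the PR's manual `SPARK_IS_NEW_HADOOP` override is meant to cover.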