path: root/core/src/main/scala/org
Commit message
  Author | Date | Files changed | Lines deleted/added
(Indented entries are the commits brought in by the merge above them.)
...
  * spark-898, changes according to review comments
    wangda.tan | 2013-12-17 | 6 files | -42/+79
  * Merge branch 'master' of git://github.com/apache/incubator-spark
    wangda.tan | 2013-12-16 | 91 files | -522/+661
  * Merge branch 'master' of https://github.com/leftnoteasy/incubator-spark-1
    wangda.tan | 2013-12-09 | 15 files | -240/+434
  * SPARK-968, In stage UI, add an overview section that shows task stats
    grouped by executor id
    wangda.tan | 2013-12-09 | 4 files | -0/+145
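A minimal sketch of the kind of per-executor aggregation SPARK-968 describes. The `TaskSummary` and `ExecutorStats` records and their fields are hypothetical stand-ins; Spark's stage page derives these figures from its own TaskInfo/TaskMetrics structures.

```scala
// Hypothetical per-task record; field names are illustrative only.
case class TaskSummary(executorId: String, durationMs: Long, succeeded: Boolean)

// Hypothetical per-executor rollup shown in the stage overview.
case class ExecutorStats(taskCount: Int, succeededCount: Int, totalDurationMs: Long)

object StageOverview {
  // Group a stage's tasks by executor id and compute simple summary stats.
  def statsByExecutor(tasks: Seq[TaskSummary]): Map[String, ExecutorStats] =
    tasks.groupBy(_.executorId).map { case (exec, ts) =>
      exec -> ExecutorStats(ts.size, ts.count(_.succeeded), ts.map(_.durationMs).sum)
    }
}
```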
* Merge pull request #280 from aarondav/minor
  Patrick Wendell | 2013-12-20 | 3 files | -17/+8
  Minor cleanup for standalone scheduler. See commit messages.
  * Fix compiler warning in SparkZooKeeperSession
    Aaron Davidson | 2013-12-19 | 1 file | -0/+1
  * Remove firstApp from the standalone scheduler Master
    Aaron Davidson | 2013-12-19 | 1 file | -10/+0
    As a lonely child with no one to care for it... we had to put it down.
  * Extraordinarily minor code/comment cleanup
    Aaron Davidson | 2013-12-19 | 2 files | -7/+7
* Merge pull request #272 from tmyklebu/master
  Patrick Wendell | 2013-12-19 | 4 files | -15/+34
  Track and report task result serialisation time (a sketch follows the
  commits below):
  - DirectTaskResult now has a ByteBuffer valueBytes instead of a T value.
  - DirectTaskResult now has a member function T value() that deserialises
    valueBytes.
  - Executor serialises the value into a ByteBuffer and passes it to
    DirectTaskResult's constructor.
  - Executor tracks the time taken to do so and puts it in a new field in
    TaskMetrics.
  - StagePage now reports serialisation time from TaskMetrics along with the
    other things it reported.
  * Add a serialisation time column to the StagePage.
    Tor Myklebust | 2013-12-18 | 1 file | -1/+5
  * Missed a spot; had an objectSer here too.
    Tor Myklebust | 2013-12-17 | 1 file | -2/+2
  * Merge branch 'master' of git://github.com/apache/incubator-spark
    Tor Myklebust | 2013-12-16 | 1 file | -3/+3
  * Incorporate pwendell's code review suggestions.
    Tor Myklebust | 2013-12-16 | 3 files | -7/+7
  * UI to display serialisation time of a stage.
    Tor Myklebust | 2013-12-16 | 1 file | -0/+6
  * Track task value serialisation time in TaskMetrics.
    Tor Myklebust | 2013-12-16 | 3 files | -14/+23
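A minimal sketch of the DirectTaskResult change described in the merge entry above. The `SerializerInstance` trait is a hypothetical stand-in for Spark's serializer abstraction, and `packageResult` is an invented helper; the real executor stores the timing in a TaskMetrics field rather than returning it.

```scala
import java.nio.ByteBuffer

// Hypothetical stand-in for Spark's serializer abstraction.
trait SerializerInstance {
  def serialize[T](t: T): ByteBuffer
  def deserialize[T](bytes: ByteBuffer): T
}

// The result now carries serialised bytes instead of the value itself.
class DirectTaskResult[T](val valueBytes: ByteBuffer, ser: SerializerInstance) {
  // Deserialise only when the caller actually asks for the value.
  def value(): T = ser.deserialize[T](valueBytes.duplicate())
}

object ResultPackaging {
  // Executor side: serialise the result and record how long it took, so the
  // duration can be reported (via TaskMetrics) and shown on the StagePage.
  def packageResult[T](result: T, ser: SerializerInstance): (DirectTaskResult[T], Long) = {
    val start = System.nanoTime()
    val bytes = ser.serialize(result)
    val serialisationTimeMs = (System.nanoTime() - start) / 1000000L
    (new DirectTaskResult[T](bytes, ser), serialisationTimeMs)
  }
}
```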
* Merge pull request #276 from shivaram/collectPartition
  Reynold Xin | 2013-12-19 | 2 files | -4/+11
  Add collectPartition to JavaRDD interface. This interface is useful for
  implementing `take` from other language frontends where the data is
  serialized. Also remove `takePartition` from PythonRDD and use
  `collectPartition` in rdd.py. Thanks @concretevitamin for the original
  change and tests. (A sketch of the runJob-based implementation follows the
  commits below.)
  * Add comment explaining collectPartitions's use
    Shivaram Venkataraman | 2013-12-19 | 1 file | -0/+2
  * Make collectPartitions take an array of partitions
    Shivaram Venkataraman | 2013-12-19 | 1 file | -4/+4
    Change the implementation to use runJob instead of PartitionPruningRDD.
    Also update the unit tests and the python take implementation to use the
    new interface.
  * Add collectPartition to JavaRDD interface.
    Shivaram Venkataraman | 2013-12-18 | 2 files | -5/+10
    Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
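A minimal sketch of the runJob-based approach, assuming the `SparkContext.runJob` overload that takes an RDD, a per-partition function, and a sequence of partition ids; the real JavaRDD method additionally converts each partition to a Java list, and `collectPartitions` here is a free-standing helper for illustration.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object CollectPartitions {
  // Run a job over only the requested partitions and return their contents,
  // instead of constructing a PartitionPruningRDD and collecting it.
  def collectPartitions[T: ClassTag](rdd: RDD[T], partitionIds: Array[Int]): Array[Array[T]] =
    rdd.context.runJob(rdd, (iter: Iterator[T]) => iter.toArray, partitionIds.toSeq)
}
```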
* Add toString to Java RDD, and __repr__ to Python RDD
  Nick Pentreath | 2013-12-19 | 1 file | -0/+2
* In experimental clusters we've observed that a 10 second timeout was
  insufficient, despite having a low number of nodes and relatively small
  workload (16 nodes, <1.5 TB data)
  Aaron Davidson | 2013-12-18 | 13 files | -71/+48
  This would cause an entire job to fail at the beginning of the reduce
  phase. There is no particular reason for this value to be small, as a
  timeout should only occur in an exceptional situation. Also centralized the
  reading of spark.akka.askTimeout to AkkaUtils (surely this can later be
  cleaned up to use Typesafe). Finally, deleted some lurking implicits. If
  anyone can think of a reason they should still be there, please let me know.
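A minimal sketch of centralising that read. The property name comes from the commit; the helper name and the 30-second default are assumptions, in the spirit of Spark 0.8's System.getProperty-based configuration.

```scala
import scala.concurrent.duration._

object AkkaUtils {
  // One place reads spark.akka.askTimeout; callers no longer parse the
  // property themselves. Default value assumed, not taken from the commit.
  def askTimeout: FiniteDuration =
    System.getProperty("spark.akka.askTimeout", "30").toLong.seconds
}
```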
* Fixed a performance problem in RDD.top and BoundedPriorityQueue (size in
  BoundedPriorityQueue was actually traversing the entire queue to calculate
  the size, resulting in bad performance in insertion)
  Reynold Xin | 2013-12-17 | 1 file | -0/+2
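A minimal sketch of the failure mode and fix, using a simplified bounded queue rather than Spark's actual class: inheriting `size` from `Iterable` walks every element, and it was consulted on each insertion; delegating to the backing Java queue makes it O(1).

```scala
import java.util.{PriorityQueue => JPriorityQueue}
import scala.collection.JavaConverters._

// Simplified bounded min-queue that keeps the maxSize largest elements.
class BoundedQueueSketch[A](maxSize: Int)(implicit ord: Ordering[A]) extends Iterable[A] {
  private val underlying = new JPriorityQueue[A](maxSize, ord)

  // The fix: without this override, Iterable#size iterates the whole queue,
  // and size is checked on every insertion.
  override def size: Int = underlying.size

  def +=(elem: A): this.type = {
    if (size < maxSize) {
      underlying.offer(elem)
    } else if (ord.gt(elem, underlying.peek())) {
      underlying.poll() // evict the current minimum
      underlying.offer(elem)
    }
    this
  }

  override def iterator: Iterator[A] = underlying.iterator.asScala
}
```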
* Merge pull request #245 from gregakespret/task-maxfailures-fix
  Reynold Xin | 2013-12-16 | 1 file | -3/+3
  Fix for spark.task.maxFailures not enforced correctly. Docs at
  http://spark.incubator.apache.org/docs/latest/configuration.html say:
  ```
  spark.task.maxFailures  Number of individual task failures before giving up
  on the job. Should be greater than or equal to 1. Number of allowed
  retries = this value - 1.
  ```
  The previous implementation worked incorrectly: when, for example,
  spark.task.maxFailures was set to 1, the job was aborted only after the
  second task failure, not after the first one. (A sketch of the off-by-one
  follows the commit below.)
  * Fix for spark.task.maxFailures not enforced correctly.
    Grega Kespret | 2013-12-09 | 1 file | -3/+3
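A minimal sketch of the off-by-one described above (names are illustrative, not the scheduler's actual fields):

```scala
object TaskFailureCheck {
  val maxTaskFailures = 1 // example value of spark.task.maxFailures

  // Abort once the failure count reaches the configured maximum.
  // The buggy version used `>`, so maxFailures = 1 tolerated one extra failure.
  def shouldAbort(numFailures: Int): Boolean = numFailures >= maxTaskFailures
}
```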
* Merge pull request #249 from ngbinh/partitionInJavaSortByKey
  Josh Rosen | 2013-12-14 | 1 file | -0/+14
  Expose numPartitions parameter in JavaPairRDD.sortByKey(). This change
  makes the Java and Scala APIs on sortByKey() the same. (A sketch of the
  overloads follows the commits below.)
  * Hook directly to Scala API
    Binh Nguyen | 2013-12-10 | 1 file | -8/+6
  * Leave default value of numPartitions to Scala code.
    Binh Nguyen | 2013-12-10 | 1 file | -2/+8
  * Use braces to shorten the line.
    Binh Nguyen | 2013-12-10 | 1 file | -1/+3
  * Expose numPartitions parameter in JavaPairRDD.sortByKey()
    Binh Nguyen | 2013-12-10 | 1 file | -2/+10
    This change makes the Java and Scala APIs on sortByKey() the same.
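A minimal sketch of the overload pattern in a simplified wrapper (the real JavaPairRDD also converts between Java and Scala collection types; on pre-1.3 Spark the Scala sortByKey additionally needed `import org.apache.spark.SparkContext._`):

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Java-facing overloads forward to the Scala sortByKey; the overload without
// numPartitions lets the Scala side supply its default partition count.
class PairRDDWrapper[K: Ordering: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) {
  def sortByKey(ascending: Boolean): RDD[(K, V)] =
    rdd.sortByKey(ascending)

  def sortByKey(ascending: Boolean, numPartitions: Int): RDD[(K, V)] =
    rdd.sortByKey(ascending, numPartitions)
}
```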
* Added a comment about ActorRef and ActorSelection difference.
  Prashant Sharma | 2013-12-14 | 1 file | -0/+7
* Review comments on the PR for scala 2.10 migration.
  Prashant Sharma | 2013-12-13 | 13 files | -45/+26
* Merge branch 'master' into akka-bug-fix
  Prashant Sharma | 2013-12-11 | 16 files | -269/+529
  Conflicts:
    core/pom.xml
    core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
    pom.xml
    project/SparkBuild.scala
    streaming/pom.xml
    yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
  * License headers
    Patrick Wendell | 2013-12-09 | 1 file | -0/+17
  * Merge pull request #190 from markhamstra/Stages4Jobs
    Matei Zaharia | 2013-12-06 | 7 files | -80/+244
    stageId <--> jobId mapping in DAGScheduler. Okay, I think this one is
    ready to go -- or at least it's ready for review and discussion. It's a
    carry-over of https://github.com/mesos/spark/pull/842 with updates for
    the newer job cancellation functionality. The prior discussion still
    applies. I've actually changed the job cancellation flow a bit: instead
    of `cancelTasks` going to the TaskScheduler and then `taskSetFailed`
    coming back to the DAGScheduler (resulting in `abortStage` there), the
    DAGScheduler now takes care of figuring out which stages should be
    cancelled, tells the TaskScheduler to cancel tasks for those stages, then
    does the cleanup within the DAGScheduler directly, without the need for
    any further prompting by the TaskScheduler.
    I know of three outstanding issues, each of which can and should, I
    believe, be handled in follow-up pull requests:
    1) https://spark-project.atlassian.net/browse/SPARK-960
    2) JobLogger should be re-factored to eliminate duplication
    3) Related to 2), the WebUI should also become a consumer of the
       DAGScheduler's new understanding of the relationship between jobs and
       stages, so that it can display progress indication and the like
       grouped by job. Right now, some of this information is just being sent
       out as part of `SparkListenerJobStart` messages, but more or different
       job <--> stage information may need to be exported from the
       DAGScheduler to meet listeners' needs.
    Except for the eventQueue -> Actor commit, the rest can be cherry-picked
    almost cleanly into branch-0.8. A little merging is needed in
    MapOutputTracker and the DAGScheduler. Merged versions of those files are
    in https://github.com/markhamstra/incubator-spark/tree/aba2b40ce04ee9b7b9ea260abb6f09e050142d43
    Note that between the recent Actor change in the DAGScheduler and the
    cleaning up of DAGScheduler data structures on job completion in this PR,
    some races have been introduced into the DAGSchedulerSuite. Those tests
    usually pass, and I don't think that better-behaved code that doesn't
    directly inspect DAGScheduler data structures should be seeing any
    problems, but I'll work on fixing DAGSchedulerSuite as either an addition
    to this PR or as a separate request.
    UPDATE: Fixed the race that I introduced. Created a JIRA issue
    (SPARK-965) for the one that was introduced with the switch to
    eventProcessorActor in the DAGScheduler.
    (A sketch of the stageId <--> jobId bookkeeping follows the commits below.)
    * SparkListenerJobStart posted from local jobs
      Mark Hamstra | 2013-12-03 | 1 file | -0/+1
    * Synchronous, inline cleanup after runLocally
      Mark Hamstra | 2013-12-03 | 2 files | -11/+6
    * Local jobs post SparkListenerJobEnd, and DAGScheduler data structure
      cleanup always occurs before any posting of SparkListenerJobEnd.
      Mark Hamstra | 2013-12-03 | 2 files | -8/+11
    * Tightly couple stageIdToJobIds and jobIdToStageIds
      Mark Hamstra | 2013-12-03 | 1 file | -17/+12
    * Cleaned up job cancellation handling
      Mark Hamstra | 2013-12-03 | 1 file | -7/+5
    * Refactoring to make job removal, stage removal, task cancellation clearer
      Mark Hamstra | 2013-12-03 | 1 file | -37/+39
    * Improved comment
      Mark Hamstra | 2013-12-03 | 1 file | -4/+3
    * Removed redundant residual re: reverted refactoring.
      Mark Hamstra | 2013-12-03 | 1 file | -1/+1
    * Fixed intended side-effects
      Mark Hamstra | 2013-12-03 | 1 file | -2/+2
    * Actor instead of eventQueue for LocalJobCompleted
      Mark Hamstra | 2013-12-03 | 1 file | -1/+1
    * Added stageId <--> jobId mapping in DAGScheduler
      Mark Hamstra | 2013-12-03 | 7 files | -77/+248
      ...and make sure that DAGScheduler data structures are cleaned up on
      job completion. Initial effort and discussion at
      https://github.com/mesos/spark/pull/842
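A minimal sketch of the bidirectional bookkeeping the commits above introduce. The map names come from the commit titles; the registration and cleanup logic is a simplification of what the DAGScheduler actually does.

```scala
import scala.collection.mutable.{HashMap, HashSet}

object JobStageBookkeeping {
  // A stage can be shared by several jobs; a job spans several stages.
  val stageIdToJobIds = new HashMap[Int, HashSet[Int]]
  val jobIdToStageIds = new HashMap[Int, HashSet[Int]]

  def registerStage(stageId: Int, jobId: Int): Unit = {
    stageIdToJobIds.getOrElseUpdate(stageId, new HashSet[Int]) += jobId
    jobIdToStageIds.getOrElseUpdate(jobId, new HashSet[Int]) += stageId
  }

  // On job completion or cancellation, remove the job from both maps and
  // forget any stage that no remaining job still needs.
  def cleanupJob(jobId: Int): Unit = {
    for (stageId <- jobIdToStageIds.getOrElse(jobId, HashSet.empty[Int])) {
      stageIdToJobIds.get(stageId).foreach { jobs =>
        jobs -= jobId
        if (jobs.isEmpty) stageIdToJobIds -= stageId
      }
    }
    jobIdToStageIds -= jobId
  }
}
```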
  * Merge pull request #233 from hsaputra/changecontexttobackend
    Matei Zaharia | 2013-12-06 | 1 file | -2/+2
    Change the name of the input argument in ClusterScheduler#initialize from
    context to backend. The SchedulerBackend used to be called
    ClusterSchedulerContext, so this is just a small rename of the input
    param in ClusterScheduler#initialize to reflect that.
    * Change the name of the input argument in ClusterScheduler#initialize
      from context to backend.
      Henry Saputra | 2013-12-05 | 1 file | -2/+2
      The SchedulerBackend used to be called ClusterSchedulerContext, so this
      is just a small rename of the input param in ClusterScheduler#initialize
      to reflect that.
  * Merge pull request #205 from kayousterhout/logging
    Matei Zaharia | 2013-12-06 | 1 file | -2/+34
    Added logging of scheduler delays to UI. This commit adds two metrics to
    the UI:
    1) The time to get task results, if they're fetched remotely
    2) The scheduler delay
    When the scheduler starts getting overwhelmed (because it can't keep up
    with the rate at which tasks are being submitted), the result is that
    tasks get delayed on the tail-end: the message from the worker saying
    that the task has completed ends up in a long queue and takes a while to
    be processed by the scheduler. This commit records that delay in the UI
    so that users can tell when the scheduler is becoming the bottleneck.
    (A sketch of the delay computation follows the commits below.)
    * Fixed problem with scheduler delay
      Kay Ousterhout | 2013-12-02 | 1 file | -4/+7
    * Added logging of scheduler delays to UI
      Kay Ousterhout | 2013-11-21 | 1 file | -2/+31
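A minimal sketch of how such a delay can be attributed. The `TaskTiming` record and the formula are illustrative; the real StagePage derives this from TaskInfo and TaskMetrics.

```scala
// Illustrative per-task timing record; field names are assumptions.
case class TaskTiming(launchTimeMs: Long, finishTimeMs: Long,
                      executorRunTimeMs: Long, resultFetchTimeMs: Long)

object SchedulerDelay {
  // Wall-clock time not spent running the task or fetching its result is
  // attributed to the scheduler: queued launch and completion messages.
  def schedulerDelayMs(t: TaskTiming): Long =
    (t.finishTimeMs - t.launchTimeMs) - t.executorRunTimeMs - t.resultFetchTimeMs
}
```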