spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Renamed 'priority' to 'jobId' and assorted minor changes	Mark Hamstra	2013-08-20	5	-59/+60
\|
*	Merge pull request #828 from mateiz/sched-improvements	Matei Zaharia	2013-08-19	41	-965/+1034
\|\ \| \| \| \|	Scheduler fixes and improvements
\| *	Added unit tests for ClusterTaskSetManager, and fix a bug found with	Matei Zaharia	2013-08-18	11	-28/+396
\| \| \| \| \| \| \| \|	resetting locality level after a non-local launch
\| *	Added some comments on threading in scheduler code	Matei Zaharia	2013-08-18	3	-6/+35
\| \|
\| *	Address some review comments:	Matei Zaharia	2013-08-18	6	-21/+40
\| \|	- When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers
\| \|	- Simplify ClusterScheduler.prioritizeContainers
\| \|	- Add docs on the new configuration options
\| *	Comment cleanup (via Kay) and some debug messages	Matei Zaharia	2013-08-18	4	-23/+16
\| \|
\| *	More scheduling fixes:	Matei Zaharia	2013-08-18	11	-190/+117
\| \|	- Added periodic revival of offers in StandaloneSchedulerBackend
\| \|	- Replaced task scheduling aggression with multi-level delay scheduling in ClusterTaskSetManager
\| \|	- Fixed ZippedRDD preferred locations because they can't currently be process-local
\| \|	- Fixed some uses of hostPort
\| *	Initial work towards scheduler refactoring:	Matei Zaharia	2013-08-18	27	-751/+484
\| \|	- Replace use of hostPort vs host in Task.preferredLocations with a TaskLocation class that contains either an executorId and a host or just a host. This is part of a bigger effort to eliminate hostPort based data structures and just use executorID, since the hostPort vs host stuff is confusing (and not checkable with static typing, leading to ugly debug code), and hostPorts are not provided by Mesos.
\| \|	- Replaced most hostPort-based data structures and fields as above.
\| \|	- Simplified ClusterTaskSetManager to deal with preferred locations in a more concise way and generally be more concise.
\| \|	- Updated the way ClusterTaskSetManager handles racks: instead of enqueueing a task to a separate queue for all the hosts in the rack, which would create lots of large queues, have one queue per rack name.
\| \|	- Removed non-local fallback stuff in ClusterScheduler that tried to launch less-local tasks on a node once the local ones were all assigned. This change didn't work because many cluster schedulers send offers for just one node at a time (even the standalone and YARN ones do so as nodes join the cluster one by one). Thus, lots of non-local tasks would be assigned even though a node with locality for them would be able to receive tasks just a short time later.
\| \|	- Renamed MapOutputTracker "generations" to "epochs".
* \|	Merge pull request #849 from mateiz/web-fixes	Matei Zaharia	2013-08-19	2	-8/+9
\|\ \ \| \| \| \| \| \|	Small fixes to web UI
\| * \|	Allow some wiggle room in UISuite port test and in EC2 ports	Matei Zaharia	2013-08-19	1	-2/+3
\| \| \|
\| * \|	Small fixes to web UI:	Matei Zaharia	2013-08-19	2	-6/+6
\| \|/ \| \| \| \| \| \| \| \| \| \|	- Use SPARK_PUBLIC_DNS environment variable if set (for EC2) - Use a non-ephemeral port (3030 instead of 33000) by default - Updated test to use non-ephemeral port too
* \|	Merge pull request #847 from rxin/rdd	Matei Zaharia	2013-08-19	21	-189/+349
\|\ \ \| \|/ \|/\|	Allow subclasses of Product2 in all key-value related classes
\| *	Code review feedback. (added tests for cogroup and subtract; added more ↵	Reynold Xin	2013-08-19	3	-11/+51
\| \| \| \| \| \| \| \|	documentation on MutablePair)
\| *	Added a test for sorting using MutablePair's.	Reynold Xin	2013-08-19	1	-2/+18
\| \|
\| *	Made PairRDDFunctions take only Tuple2, but made the rest of the shuffle ↵	Reynold Xin	2013-08-19	19	-91/+132
\| \| \| \| \| \| \| \|	code path work with general Product2.
\| *	Added the missing RDD files and cleaned up SparkContext.	Reynold Xin	2013-08-18	4	-12/+126
\| \|
\| *	Allow subclasses of Product2 in all key-value related classes ↵	Reynold Xin	2013-08-18	10	-107/+56
\| \| \| \| \| \| \| \|	(ShuffleDependency, PairRDDFunctions, etc).
* \|	Merge pull request #840 from AndreSchumacher/zipegg	Matei Zaharia	2013-08-18	1	-1/+8
\|\ \ \| \|/ \|/\|	Implementing SPARK-878 for PySpark: adding zip and egg files to context ...
\| *	Implementing SPARK-878 for PySpark: adding zip and egg files to context and ↵	Andre Schumacher	2013-08-16	1	-1/+8
\| \| \| \| \| \| \| \|	passing it down to workers which add these to their sys.path
* \|	Moved shuffle serializer setting from a constructor parameter to a ↵	Reynold Xin	2013-08-17	5	-32/+51
\| \| \| \| \| \| \| \|	setSerializer method in various RDDs that involve shuffle operations.
* \|	Removed the mapSideCombine option in partitionBy.	Reynold Xin	2013-08-17	2	-28/+6
\| \|
* \|	Removed the mapSideCombine option in CoGroupedRDD.	Reynold Xin	2013-08-17	1	-33/+5
\| \|
* \|	Removed the unused shuffleId in ShuffleDependency's constructor.	Reynold Xin	2013-08-16	1	-1/+0
\| \|
* \|	Merge pull request #839 from jegonzal/zip_partitions	Matei Zaharia	2013-08-16	4	-17/+14
\|\ \ \| \| \| \| \| \|	Currying RDD.zipPartitions
\| * \|	Reversing the argument order in zipPartitions to enable stronger type inference.	Joseph E. Gonzalez	2013-08-16	4	-17/+14
\| \| \|
* \| \|	Use the JSON formatter from Scala library and removed dependency on lift-json.	Reynold Xin	2013-08-15	6	-70/+64
\| \| \| \| \| \| \| \| \| \| \| \|	It made the JSON creation slightly more complicated, but removes one external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).
* \| \|	Revert "Merge pull request #834 from Daemoen/master"	Reynold Xin	2013-08-15	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 230ab2722ebd399afcf64c1a131f4929f602177d, reversing changes made to 659553b21ddd7504889ce113a816c1db4a73f167.
* \| \|	Merge pull request #834 from Daemoen/master	Reynold Xin	2013-08-15	1	-1/+2
\|\ \ \ \| \|_\|/ \|/\| \|	Updated json output to allow for display of worker state
\| * \|	Updated json output to allow for display of worker state	Daemoen	2013-08-15	1	-1/+2
\| \| \| \| \| \| \| \| \|	Ops teams need to ensure that the cluster is functional and performant. Having to scrape the html source for worker state won't work reliably, and will be slow. By exposing the state in the json output, ops teams are able to ensure a fully functional environment by querying for the json output and parsing for dead nodes.
* \| \|	Merge pull request #836 from pwendell/rename	Patrick Wendell	2013-08-15	19	-64/+64
\|\ \ \ \| \|_\|/ \|/\| \|	Rename `memoryBytesToString` and `memoryMegabytesToString`
\| * \|	Rename `memoryBytesToString` and `memoryMegabytesToString`	Patrick Wendell	2013-08-15	19	-64/+64
\| \| \|	These are used all over the place now and they are not specific to memory at all.
\| \| \|	memoryBytesToString --> bytesToString
\| \| \|	memoryMegabytesToString --> megabytesToString
* \| \|	More minor UI changes including code review feedback.	Reynold Xin	2013-08-15	6	-16/+39
\| \| \|
* \| \|	Various UI improvements.	Reynold Xin	2013-08-14	12	-88/+83
\| \|/ \|/\|
* \|	Renamed setCurrentJobDescription to setJobDescription.	Reynold Xin	2013-08-14	1	-1/+1
\| \|
* \|	A few small scheduler / job description changes.	Reynold Xin	2013-08-14	4	-70/+74
\|/ \| \| \| \| \| \| \|	1. Renamed SparkContext.addLocalProperty to setLocalProperty, and allowed this function to unset a property. 2. Renamed SparkContext.setDescription to setCurrentJobDescription. 3. Throw an exception if the fair scheduler allocation file is invalid.
*	Merge pull request #822 from pwendell/ui-features	Matei Zaharia	2013-08-14	6	-27/+54
\|\ \| \| \| \|	Adding GC Stats to TaskMetrics (and three small fixes)
\| *	Style cleanup based on Matei feedback	Patrick Wendell	2013-08-14	3	-5/+4
\| \|
\| *	Small style clean-up	Patrick Wendell	2013-08-13	2	-2/+2
\| \|
\| *	Correcting terminology in RDD page	Patrick Wendell	2013-08-13	1	-1/+1
\| \|
\| *	Correct sorting order for stages	Patrick Wendell	2013-08-13	2	-10/+6
\| \|
\| *	Capturing GC details in TaskMetrics	Patrick Wendell	2013-08-13	4	-10/+37
\| \|
\| *	Bug fix for display of shuffle read/write metrics.	Patrick Wendell	2013-08-13	1	-6/+11
\| \| \| \| \| \| \| \| \| \|	This fixes an error where empty cells are missing if a given task has no shuffle read/write.
* \|	Fixed 2 bugs in executor UI.	Kay Ousterhout	2013-08-13	1	-12/+10
\| \|	1) UI crashed if the executor UI was loaded before any tasks started.
\| \|	2) The total number of tasks was incorrectly reported due to using string (rather than int) arithmetic.
* \|	Merge pull request #821 from pwendell/print-launch-command	Matei Zaharia	2013-08-13	1	-1/+1
\|\ \ \| \| \| \| \| \|	Print run command to stderr rather than stdout
\| * \|	Print run command to stderr rather than stdout	Patrick Wendell	2013-08-13	1	-1/+1
\| \| \|
* \| \|	Reuse the set of failed states rather than creating a new object each time	Kay Ousterhout	2013-08-13	1	-1/+3
\| \| \|
* \| \|	Properly account for killed tasks.	Kay Ousterhout	2013-08-13	1	-1/+1
\| \|/ \|/\| \| \| \| \| \| \| \| \| \| \|	The TaskState class's isFinished() method didn't return true for KILLED tasks, which means some resources are never reclaimed for tasks that are killed. This also made it inconsistent with the isFinished() method used by CoarseMesosSchedulerBackend.
* \|	Slight change to pr-784	Patrick Wendell	2013-08-13	5	-9/+10
\| \|
* \|	Merge pull request #784 from jerryshao/dev-metrics-servlet	Patrick Wendell	2013-08-13	14	-35/+157
\|\ \ \| \| \| \| \| \|	Add MetricsServlet for Spark metrics system
\| * \|	MetricsServlet code refactor according to comments	jerryshao	2013-08-12	11	-43/+35
\| \| \|