spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Removed meaningless types	Mark Hamstra	2013-08-20	1	-1/+1
\|
*	Merge remote-tracking branch 'jey/hadoop-agnostic'	Matei Zaharia	2013-08-20	29	-2255/+178
\|\	Conflicts: core/src/main/scala/spark/PairRDDFunctions.scala
\| *	Fix Maven build with Hadoop 0.23.9	Jey Kottalam	2013-08-18	1	-0/+8
\| \|
\| *	Maven build now also works with YARN	Jey Kottalam	2013-08-16	1	-70/+0
\| \|
\| *	Don't mark hadoop-client as 'provided'	Jey Kottalam	2013-08-16	1	-1/+0
\| \|
\| *	Maven build now works with CDH hadoop-2.0.0-mr1	Jey Kottalam	2013-08-16	1	-52/+0
\| \|
\| *	Initial changes to make Maven build agnostic of hadoop version	Jey Kottalam	2013-08-16	1	-58/+5
\| \|
\| *	Rename HadoopWriter to SparkHadoopWriter since it's outside of our package	Jey Kottalam	2013-08-15	2	-6/+6
\| \|
\| *	Fix newTaskAttemptID to work under YARN	Jey Kottalam	2013-08-15	1	-1/+19
\| \|
\| *	re-enable YARN support	Jey Kottalam	2013-08-15	1	-1/+13
\| \|
\| *	SparkEnv isn't available this early, and not needed anyway	Jey Kottalam	2013-08-15	2	-25/+0
\| \|
\| *	make SparkHadoopUtil a member of SparkEnv	Jey Kottalam	2013-08-15	8	-26/+31
\| \|
\| *	rename HadoopMapRedUtil => SparkHadoopMapRedUtil, HadoopMapReduceUtil => SparkHadoopMapReduceUtil	Jey Kottalam	2013-08-15	5	-6/+7
\| \|
\| *	add comment	Jey Kottalam	2013-08-15	1	-4/+4
\| \|
\| *	dynamically detect hadoop version	Jey Kottalam	2013-08-15	2	-8/+48
\| \|
\| *	remove core/src/hadoop{1,2} dirs	Jey Kottalam	2013-08-15	6	-104/+0
\| \|
\| *	move yarn to its own directory	Jey Kottalam	2013-08-15	10	-1864/+0
\| \|
* \|	changeGeneration --> changeEpoch renaming	Mark Hamstra	2013-08-20	1	-2/+2
\| \|
* \|	Renamed 'priority' to 'jobId' and assorted minor changes	Mark Hamstra	2013-08-20	5	-59/+60
\| \|
* \|	Merge pull request #828 from mateiz/sched-improvements	Matei Zaharia	2013-08-19	41	-965/+1034
\|\ \	Scheduler fixes and improvements
\| * \|	Added unit tests for ClusterTaskSetManager, and fix a bug found with	Matei Zaharia	2013-08-18	11	-28/+396
\| \| \|	resetting locality level after a non-local launch
\| * \|	Added some comments on threading in scheduler code	Matei Zaharia	2013-08-18	3	-6/+35
\| \| \|
\| * \|	Address some review comments:	Matei Zaharia	2013-08-18	6	-21/+40
\| \| \|	- When a resourceOffers() call has multiple offers, force the TaskSets to consider them in increasing order of locality levels so that they get a chance to launch stuff locally across all offers
\| \| \|	- Simplify ClusterScheduler.prioritizeContainers
\| \| \|	- Add docs on the new configuration options
\| * \|	Comment cleanup (via Kay) and some debug messages	Matei Zaharia	2013-08-18	4	-23/+16
\| \| \|
\| * \|	More scheduling fixes:	Matei Zaharia	2013-08-18	11	-190/+117
\| \| \|	- Added periodic revival of offers in StandaloneSchedulerBackend
\| \| \|	- Replaced task scheduling aggression with multi-level delay scheduling in ClusterTaskSetManager
\| \| \|	- Fixed ZippedRDD preferred locations because they can't currently be process-local
\| \| \|	- Fixed some uses of hostPort
\| * \|	Initial work towards scheduler refactoring:	Matei Zaharia	2013-08-18	27	-751/+484
\| \| \|	- Replace use of hostPort vs host in Task.preferredLocations with a TaskLocation class that contains either an executorId and a host or just a host. This is part of a bigger effort to eliminate hostPort-based data structures and just use executorId, since the hostPort vs host distinction is confusing (and not checkable with static typing, leading to ugly debug code), and hostPorts are not provided by Mesos.
\| \| \|	- Replaced most hostPort-based data structures and fields as above.
\| \| \|	- Simplified ClusterTaskSetManager to deal with preferred locations in a more concise way and generally be more concise.
\| \| \|	- Updated the way ClusterTaskSetManager handles racks: instead of enqueueing a task to a separate queue for all the hosts in the rack, which would create lots of large queues, have one queue per rack name.
\| \| \|	- Removed non-local fallback stuff in ClusterScheduler that tried to launch less-local tasks on a node once the local ones were all assigned. This fallback didn't work because many cluster schedulers send offers for just one node at a time (even the standalone and YARN ones do so as nodes join the cluster one by one). Thus, lots of non-local tasks would be assigned even though a node with locality for them would be able to receive tasks just a short time later.
\| \| \|	- Renamed MapOutputTracker "generations" to "epochs".
* \| \|	Merge pull request #849 from mateiz/web-fixes	Matei Zaharia	2013-08-19	2	-8/+9
\|\ \ \	Small fixes to web UI
\| * \| \|	Allow some wiggle room in UISuite port test and in EC2 ports	Matei Zaharia	2013-08-19	1	-2/+3
\| \| \| \|
\| * \| \|	Small fixes to web UI:	Matei Zaharia	2013-08-19	2	-6/+6
\| \|/ /	- Use SPARK_PUBLIC_DNS environment variable if set (for EC2)
\| \| \|	- Use a non-ephemeral port (3030 instead of 33000) by default
\| \| \|	- Updated test to use non-ephemeral port too
* \| \|	Merge pull request #847 from rxin/rdd	Matei Zaharia	2013-08-19	21	-189/+349
\|\ \ \	Allow subclasses of Product2 in all key-value related classes
\| \|/ /
\|/\| \|
\| * \|	Code review feedback. (added tests for cogroup and subtract; added more documentation on MutablePair)	Reynold Xin	2013-08-19	3	-11/+51
\| \| \|
\| * \|	Added a test for sorting using MutablePairs.	Reynold Xin	2013-08-19	1	-2/+18
\| \| \|
\| * \|	Made PairRDDFunctions take only Tuple2, but made the rest of the shuffle code path work with general Product2.	Reynold Xin	2013-08-19	19	-91/+132
\| \| \|
\| * \|	Added the missing RDD files and cleaned up SparkContext.	Reynold Xin	2013-08-18	4	-12/+126
\| \| \|
\| * \|	Allow subclasses of Product2 in all key-value related classes (ShuffleDependency, PairRDDFunctions, etc).	Reynold Xin	2013-08-18	10	-107/+56
\| \| \|
* \| \|	Merge pull request #840 from AndreSchumacher/zipegg	Matei Zaharia	2013-08-18	1	-1/+8
\|\ \ \	Implementing SPARK-878 for PySpark: adding zip and egg files to context ...
\| \|/ /
\|/\| \|
\| * \|	Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path	Andre Schumacher	2013-08-16	1	-1/+8
\| \| \|
* \| \|	Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations.	Reynold Xin	2013-08-17	5	-32/+51
\| \| \|
* \| \|	Removed the mapSideCombine option in partitionBy.	Reynold Xin	2013-08-17	2	-28/+6
\| \| \|
* \| \|	Removed the mapSideCombine option in CoGroupedRDD.	Reynold Xin	2013-08-17	1	-33/+5
\| \| \|
* \| \|	Removed the unused shuffleId in ShuffleDependency's constructor.	Reynold Xin	2013-08-16	1	-1/+0
\| \| \|
* \| \|	Merge pull request #839 from jegonzal/zip_partitions	Matei Zaharia	2013-08-16	4	-17/+14
\|\ \ \	Currying RDD.zipPartitions
\| * \| \|	Reversing the argument order in zipPartitions to enable stronger type inference.	Joseph E. Gonzalez	2013-08-16	4	-17/+14
\| \| \|/
\| \|/\|
* \| \|	Use the JSON formatter from Scala library and removed dependency on lift-json.	Reynold Xin	2013-08-15	6	-70/+64
\| \| \|	It made the JSON creation slightly more complicated, but removes an external dependency. The Scala library also properly escapes "/" (which lift-json doesn't).
* \| \|	Revert "Merge pull request #834 from Daemoen/master"	Reynold Xin	2013-08-15	1	-2/+1
\| \| \|	This reverts commit 230ab2722ebd399afcf64c1a131f4929f602177d, reversing changes made to 659553b21ddd7504889ce113a816c1db4a73f167.
* \| \|	Merge pull request #834 from Daemoen/master	Reynold Xin	2013-08-15	1	-1/+2
\|\ \ \	Updated json output to allow for display of worker state
\| \|_\|/
\|/\| \|
\| * \|	Updated json output to allow for display of worker state	Daemoen	2013-08-15	1	-1/+2
\| \| \|	Ops teams need to ensure that the cluster is functional and performant. Scraping the HTML source for worker state won't work reliably and will be slow. By exposing the state in the JSON output, ops teams can verify a fully functional environment by querying the JSON output and parsing it for dead nodes.
* \| \|	Merge pull request #836 from pwendell/rename	Patrick Wendell	2013-08-15	19	-64/+64
\|\ \ \	Rename `memoryBytesToString` and `memoryMegabytesToString`
\| \|_\|/
\|/\| \|
\| * \|	Rename `memoryBytesToString` and `memoryMegabytesToString`	Patrick Wendell	2013-08-15	19	-64/+64
\| \| \|	These are used all over the place now and they are not specific to memory at all.
\| \| \|	memoryBytesToString --> bytesToString
\| \| \|	memoryMegabytesToString --> megabytesToString
* \| \|	More minor UI changes including code review feedback.	Reynold Xin	2013-08-15	6	-16/+39
\| \| \|