path: root/core
Commit message | Author | Age | Files | Lines
* Add end-to-end test for standalone scheduler fault tolerance (Aaron Davidson, 2013-10-05, 2 files, -1/+414)
| Docker files drawn mostly from Matt Massie. Some updates from Andre Schumacher.
* Address Matei's comments (Aaron Davidson, 2013-10-05, 10 files, -40/+41)
|
* Fix race conditions during recovery (Aaron Davidson, 2013-10-04, 8 files, -52/+122)
| One major change was the use of messages instead of raw functions as the parameters of Akka scheduled timers. Since messages are serialized, unlike raw functions, their behavior is easier to reason about and does not cause race conditions when exceptions are thrown.
|
| Another change is to avoid using global pointers that might change without a lock.
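For context, a minimal sketch of the timer pattern this commit moves toward, assuming the Akka 2.x scheduler API of this era; the actor and message names are illustrative:

    import akka.actor.Actor
    import scala.concurrent.duration._

    // Timers deliver a serialized message rather than invoking a closure.
    case object CheckForWorkerTimeOut

    class MasterLike extends Actor {
      import context.dispatcher  // ExecutionContext for the scheduler

      // Schedule a *message* to self: the handler runs inside receive,
      // single-threaded with all other message processing, so a thrown
      // exception follows normal actor semantics instead of racing.
      context.system.scheduler.schedule(0.seconds, 60.seconds, self, CheckForWorkerTimeOut)

      def receive = {
        case CheckForWorkerTimeOut =>
          // ... timeout logic, safely serialized with other handlers ...
      }

      // The pattern being replaced: scheduling a raw function, which runs
      // on a scheduler thread concurrently with receive:
      // context.system.scheduler.schedule(0.seconds, 60.seconds)(timeOutDeadWorkers())
    }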
* Add license notices (Aaron Davidson, 2013-09-26, 6 files, -3/+86)
|
* Standalone Scheduler fault tolerance using ZooKeeper (Aaron Davidson, 2013-09-26, 23 files, -170/+708)
| This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests.
|
| Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch 194ba4b8), except that the Master state is stored in ZooKeeper instead.
|
| Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). Setting spark.deploy.recoveryMode to ZOOKEEPER and spark.deploy.zookeeper.url to an appropriate ZooKeeper URL enables ZooKeeper recovery mode. Setting spark.deploy.recoveryMode to FILESYSTEM and spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master keeps the behavior from 194ba4b8.
|
| Additionally, places where a Master could be specified by a spark:// URL can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients: once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.
|
| Forthcoming: documentation and tests (only ad hoc testing has been performed so far). I do not intend for this commit to be merged until tests are added, but this patch should still be mostly reviewable until then.
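A sketch of how the recovery modes described above would be configured, using the Java system properties that Spark of this era reads at daemon startup; the hostnames and paths are illustrative:

    // ZooKeeper recovery: one elected Leader, state kept in ZooKeeper.
    System.setProperty("spark.deploy.recoveryMode", "ZOOKEEPER")
    System.setProperty("spark.deploy.zookeeper.url", "zk1:2181,zk2:2181,zk3:2181")

    // Or: single-node filesystem recovery, preserving the 194ba4b8 behavior.
    // System.setProperty("spark.deploy.recoveryMode", "FILESYSTEM")
    // System.setProperty("spark.deploy.recoveryDirectory", "/var/spark/recovery")

Workers and application Clients can then name every Master candidate with a comma-delimited URL such as spark://host1:7077,host2:7077, which is consulted only for initial registration.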
* Standalone Scheduler fault recovery (Aaron Davidson, 2013-09-26, 15 files, -75/+458)
| Implements a basic form of Standalone Scheduler fault recovery. In particular, this allows faults to be manually recovered from by restarting the Master process on the same machine. This is the majority of the code necessary for general fault tolerance, which will first elect a leader and then recover the Master state.
|
| In order to enable fault recovery, the Master will persist a small amount of state related to the registration of Workers and Applications to disk. If the Master is started and sees that this state is still around, it will enter Recovery mode, during which time it will not schedule any new Executors on Workers (but it does accept the registration of new Clients and Workers).
|
| At this point, the Master attempts to reconnect to all Workers and Client applications that were registered at the time of failure. After confirming either the existence or nonexistence of all such nodes (within a certain timeout), the Master will exit Recovery mode and resume normal scheduling.
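A minimal sketch of the persistence hook this kind of recovery implies; the trait and method names are hypothetical, not necessarily what the patch uses:

    // The Master persists just enough registration state to rebuild its
    // view of the cluster after a restart.
    trait RecoveryStateStore {
      def persistWorker(workerId: String, serialized: Array[Byte]): Unit
      def persistApp(appId: String, serialized: Array[Byte]): Unit
      def remove(id: String): Unit
      // Non-empty on startup means: enter Recovery mode, re-contact every
      // listed Worker and Client, then resume normal scheduling.
      def readAll(): Seq[Array[Byte]]
    }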
* Merge pull request #14 from kayousterhout/untangle_scheduler (Reynold Xin, 2013-09-26, 34 files, -71/+62)
|\
| Improved organization of scheduling packages. This commit does not change any code -- only file organization. Please let me know if there was some masterminded strategy behind the existing organization that I failed to understand!
|
| There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it seems to be within the cluster package.
|
| The one thing about the scheduling code that seems a little funny to me is the naming of the SchedulerBackends. The StandaloneSchedulerBackend is not just for Standalone mode, but instead is used by Mesos coarse-grained mode and YARN, and the backend that *is* just for Standalone mode is instead called SparkDeploySchedulerBackend. I didn't change this because I wasn't sure if there was a reason for this naming that I'm just not aware of.
| * Improved organization of scheduling packages. (Kay Ousterhout, 2013-09-25, 34 files, -71/+62)
| | This commit does not change any code -- only file organization.
| |
| | There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it is within the cluster package.
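Illustratively, in terms of package declarations; the placements are inferred from the description above, not an exhaustive listing of what moved:

    // (1) Classes shared by the local and cluster schedulers move up a level:
    //   before: package org.apache.spark.scheduler.cluster
    //   after:  package org.apache.spark.scheduler

    // (2) The mesos package moves inside the cluster package:
    //   before: package org.apache.spark.scheduler.mesos
    //   after:  package org.apache.spark.scheduler.cluster.mesos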
* | Merge pull request #930 from holdenk/master (Reynold Xin, 2013-09-26, 1 file, -0/+10)
|\ \
| | | Add mapPartitionsWithIndex
| * | Fix formatting :) (Holden Karau, 2013-09-23, 1 file, -4/+5)
| | |
| * | Switch indent from 2 to 4 spaces (Holden Karau, 2013-09-22, 1 file, -2/+2)
| | |
| * | Make mapPartitionsWithIndex work with JavaRDDs (Holden Karau, 2013-09-14, 1 file, -2/+3)
| | |
| * | Start of work on SPARK-615 (Holden Karau, 2013-09-11, 1 file, -0/+8)
| | |
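For reference, a small usage sketch of mapPartitionsWithIndex (the operation SPARK-615 exposes to JavaRDDs), shown on the Scala side with illustrative data:

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local", "example")
    val rdd = sc.parallelize(1 to 10, 3)  // 3 partitions

    // The function receives the partition index along with that partition's
    // iterator, so elements can be tagged with where they live.
    val tagged = rdd.mapPartitionsWithIndex { (index, iter) =>
      iter.map(x => (index, x))
    }
    tagged.collect().foreach(println)  // (0,1), (0,2), ..., (2,10)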
* | | Merge pull request #7 from wannabeast/memorystore-fixes (Reynold Xin, 2013-09-26, 1 file, -6/+8)
|\ \ \
| | | | Some minor fixes to MemoryStore. This is a repeat of #5, moved to its own branch in my repo. This makes all updates to "currentMemory" synchronized on "entries"; it skips synchronizing the reads where it can get away with it.
| * | | Synchronize on "entries" the remaining update to "currentMemory". (Mike, 2013-09-19, 1 file, -3/+5)
| | | | Make "currentMemory" @volatile, so that its reads in ensureFreeSpace() are atomic and up-to-date--i.e., currentMemory can't increase while putLock is held (though it could decrease, which would only help ensureFreeSpace()).
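A condensed sketch of the locking discipline these MemoryStore commits describe, with simplified types and method bodies (the real store keeps richer entries in its map, and a separate putLock):

    import java.util.LinkedHashMap

    class MemoryStoreSketch(maxMemory: Long) {
      private val entries = new LinkedHashMap[String, Array[Byte]](32, 0.75f, true)

      // @volatile: unsynchronized reads in ensureFreeSpace() still see an
      // up-to-date value; all *writes* happen inside entries.synchronized.
      @volatile private var currentMemory = 0L

      def putBytes(blockId: String, bytes: Array[Byte]): Unit =
        entries.synchronized {
          entries.put(blockId, bytes)
          currentMemory += bytes.length
        }

      def remove(blockId: String): Unit = entries.synchronized {
        val old = entries.remove(blockId)
        if (old != null) currentMemory -= old.length
      }

      def clear(): Unit = entries.synchronized {
        entries.clear()
        currentMemory = 0L  // the fix in the commit above
      }

      // Read without holding the lock; in the real store a putLock ensures
      // currentMemory can only decrease concurrently, which is safe for a
      // free-space check.
      def ensureFreeSpace(space: Long): Boolean = maxMemory - currentMemory >= space
    }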
| * | | Set currentMemory to 0 in clear(). (Mike, 2013-09-11, 1 file, -2/+2)
| | | | Remove unnecessary entries.get() call.
| * | | Remove MemoryStore$Entry.dropPending, unused as of 42e0a68082. (Mike, 2013-09-10, 1 file, -1/+1)
| | | |
* | | | Merge pull request #9 from rxin/limit (Patrick Wendell, 2013-09-26, 2 files, -10/+66)
|\ \ \ \
| | | | | Smarter take/limit implementation.
| * | | | Smarter take/limit implementation. (Reynold Xin, 2013-09-20, 2 files, -10/+66)
| | |/ / | |/| |
* | | | Update build version in master (Patrick Wendell, 2013-09-24, 1 file, -1/+1)
| |_|/ |/| |
* | | Merge branch 'master' of github.com:markhamstra/incubator-spark (Reynold Xin, 2013-09-23, 1 file, -1/+0)
|\ \ \
| * | | Removed repetitive import; fixes hidden definition compiler warning. (Mark Hamstra, 2013-09-03, 1 file, -1/+0)
| | |/ | |/|
* | | Change Exception to NoSuchElementException and minor style fix (jerryshao, 2013-09-22, 1 file, -6/+7)
| | |
* | | Remove infix style and others (jerryshao, 2013-09-22, 1 file, -10/+8)
| | |
* | | Refactor FairSchedulableBuilder: (jerryshao, 2013-09-22, 1 file, -39/+53)
| | | 1. Configuration can be read from the classpath if not set explicitly.
| | | 2. Add missing close handler.
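A sketch of both points, with hypothetical file and method names: fall back to a classpath resource when no explicit allocation file is configured, and close the stream in a finally block:

    import java.io.{FileInputStream, InputStream}

    val DEFAULT_SCHEDULER_FILE = "fairscheduler.xml"  // illustrative name

    def openSchedulerConf(explicitPath: Option[String]): Option[InputStream] =
      explicitPath match {
        case Some(path) => Some(new FileInputStream(path))
        // Fall back to the classpath if no path was set explicitly.
        case None =>
          Option(getClass.getClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE))
      }

    def buildPools(explicitPath: Option[String]): Unit =
      openSchedulerConf(explicitPath).foreach { is =>
        try {
          // ... parse pool definitions from the XML stream ...
        } finally {
          is.close()  // the "missing close handler"
        }
      }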
* | | Merge pull request #937 from jerryshao/localProperties-fix (Reynold Xin, 2013-09-21, 2 files, -2/+50)
|\ \ \
| | | | Fix PR926 local properties issues in Spark Streaming-like scenarios
| * | | Add barrier for local properties unit test and fix some styles (jerryshao, 2013-09-22, 2 files, -3/+11)
| | | |
| * | | Fix issue when local properties pass from parent to child thread (jerryshao, 2013-09-18, 2 files, -2/+42)
| | |/ | |/|
* / | After unit tests, clear port properties unconditionally (Ankur Dave, 2013-09-19, 2 files, -9/+7)
|/ /
| | In MapOutputTrackerSuite, the "remote fetch" test sets spark.driver.port and spark.hostPort, assuming that they will be cleared by LocalSparkContext. However, the test never sets sc, so it remains null, causing LocalSparkContext to skip clearing these properties. Subsequent tests therefore fail with java.net.BindException: "Address already in use".
| |
| | This commit makes LocalSparkContext clear the properties even if sc is null.
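A sketch of the fix in LocalSparkContext terms; the helper name is illustrative, the property names are from the description above:

    import org.apache.spark.SparkContext

    def resetSparkContext(sc: SparkContext): Unit = {
      if (sc != null) {
        sc.stop()
      }
      // Clear unconditionally: the "remote fetch" test sets these without
      // ever assigning sc, so cleanup must not hide behind sc != null.
      System.clearProperty("spark.driver.port")
      System.clearProperty("spark.hostPort")
    }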
* | Changed localProperties to use ThreadLocal (not DynamicVariable). (Kay Ousterhout, 2013-09-11, 1 file, -9/+9)
| | The fact that DynamicVariable uses an InheritableThreadLocal can cause problems where the properties end up being shared across threads in certain circumstances.
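The distinction, sketched: scala.util.DynamicVariable is backed by an InheritableThreadLocal, which child threads copy from their parent at creation time, whereas a plain ThreadLocal keeps each thread's value isolated. A minimal sketch of the ThreadLocal side (the property key is illustrative):

    import java.util.Properties

    // Each thread gets its own Properties; a thread spawned while a parent
    // has properties set does NOT inherit them, unlike InheritableThreadLocal.
    val localProperties = new ThreadLocal[Properties] {
      override def initialValue(): Properties = new Properties()
    }

    localProperties.get().setProperty("spark.job.description", "my job")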
* | Merge pull request #919 from mateiz/jets3t (Patrick Wendell, 2013-09-11, 1 file, -0/+5)
|\ \
| | | Add explicit jets3t dependency, which is excluded in hadoop-client
| * | Add explicit jets3t dependency, which is excluded in hadoop-client (Matei Zaharia, 2013-09-10, 1 file, -0/+5)
| | |
* | | Merge pull request #922 from pwendell/port-change (Patrick Wendell, 2013-09-11, 2 files, -2/+2)
|\ \ \
| | | | Change default port number from 3030 to 4040.
| * | | Change port from 3030 to 4040 (Patrick Wendell, 2013-09-11, 2 files, -2/+2)
| |/ /
* / / SPARK-894: Not all WebUI fields delivered via JSON (David McCauley, 2013-09-11, 1 file, -1/+3)
|/ /
* | Merge pull request #915 from ooyala/master (Matei Zaharia, 2013-09-09, 1 file, -1/+9)
|\ \
| | | Get rid of / improve ugly NPE when Utils.deleteRecursively() fails
| * | Style fix: put body of if within curly braces (Evan Chan, 2013-09-09, 1 file, -1/+3)
| | |
| * | Print out more friendly error if listFiles() fails (Evan Chan, 2013-09-09, 1 file, -1/+7)
| | | listFiles() could return null if the I/O fails, and this currently results in an ugly NPE which is hard to diagnose.
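The Java API contract behind this: File.listFiles() returns null, not an empty array, when the directory cannot be read. A sketch of a recursive delete with the explicit check; the method shape is illustrative, not necessarily the patched Utils.deleteRecursively verbatim:

    import java.io.{File, IOException}

    def deleteRecursively(file: File): Unit = {
      if (file.isDirectory) {
        val children = file.listFiles()
        if (children == null) {
          // listFiles() signals I/O failure with null -- raise a clear
          // error instead of letting a NullPointerException surface later.
          throw new IOException("Failed to list files for dir: " + file)
        }
        children.foreach(deleteRecursively)
      }
      if (!file.delete()) {
        throw new IOException("Failed to delete: " + file)
      }
    }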
* | | Merge pull request #907 from stephenh/document_coalesce_shuffle (Matei Zaharia, 2013-09-09, 2 files, -4/+27)
|\ \ \
| | | | Add better docs for coalesce.
| * | | Use a set since shuffle could change order. (Stephen Haberman, 2013-09-09, 1 file, -1/+1)
| | | |
| * | | Reword 'evenly distributed' to 'distributed with a hash partitioner.' (Stephen Haberman, 2013-09-09, 1 file, -2/+2)
| | | |
| * | | Add better docs for coalesce. (Stephen Haberman, 2013-09-08, 2 files, -4/+27)
| | | | Include the useful tip that if shuffle=true, coalesce can actually increase the number of partitions. This makes coalesce more like a generic `RDD.repartition` operation. (Ideally this `RDD.repartition` could automatically choose either a coalesce or a shuffle if numPartitions was either less than or greater than, respectively, the current number of partitions.)
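Usage, per the tip above; sc is assumed to be an existing SparkContext, and repartitionLike is the hypothetical helper the commit muses about, not an existing API:

    val rdd = sc.parallelize(1 to 1000, 100)

    val narrowed = rdd.coalesce(10)                  // no shuffle: 100 -> 10 partitions
    val widened  = rdd.coalesce(400, shuffle = true) // shuffle:    100 -> 400 partitions

    // The generic operation imagined above: choose shuffle automatically
    // based on whether the partition count is growing or shrinking.
    def repartitionLike[T](rdd: org.apache.spark.rdd.RDD[T], numPartitions: Int) =
      rdd.coalesce(numPartitions, shuffle = numPartitions > rdd.partitions.length)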
* | | | Add metrics-ganglia to core pom file (Y.CORP.YAHOO.COM\tgraves, 2013-09-09, 1 file, -0/+4)
| | | |
* | | | Merge pull request #890 from mridulm/master (Matei Zaharia, 2013-09-08, 3 files, -2/+17)
|\ \ \ \
| | | | | Fix hash bug
| * | | | Address review comments - rename toHash to nonNegativeHash (Mridul Muralidharan, 2013-09-04, 3 files, -3/+3)
| | | | |
| * | | | Fix hash bug - caused failure after 35k stages, sigh (Mridul Muralidharan, 2013-09-04, 3 files, -2/+17)
| | | | |
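The bug class, sketched: hashCode can be negative, and % in Scala keeps the dividend's sign, so a bucket computed as hashCode % n eventually goes negative once enough stage or shuffle IDs accumulate. A nonNegativeHash-style helper (name taken from the review comment above; the body is a plausible sketch, and note that math.abs alone is not enough):

    def nonNegativeHash(obj: AnyRef): Int = {
      if (obj == null) return 0
      val hash = obj.hashCode
      // math.abs(Int.MinValue) is still Int.MinValue, so special-case it.
      if (hash == Int.MinValue) 0 else math.abs(hash)
    }

    // Typical use when picking a bucket or file index:
    // val bucket = nonNegativeHash(key) % numBuckets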
* | | | | Merge pull request #909 from mateiz/exec-id-fix (Reynold Xin, 2013-09-08, 2 files, -7/+7)
|\ \ \ \ \
| | | | | | Fix an instance where full standalone mode executor IDs were passed to StandaloneSchedulerBackend.
| * | | | | Fix an instance where full standalone mode executor IDs were passed to StandaloneSchedulerBackend (Matei Zaharia, 2013-09-08, 2 files, -7/+7)
| |/ / / /
| | | | | The backend should instead receive the smaller IDs used within Spark (that lack the application name). This was reported by ClearStory in https://github.com/clearstorydata/spark/pull/9. Also fixed some messages that said slave instead of executor.
* | | | | Merge pull request #905 from mateiz/docs2 (Matei Zaharia, 2013-09-08, 8 files, -18/+19)
|\ \ \ \ \
| | | | | | Job scheduling and cluster mode docs
| * | | | | Fix unit test failure due to changed default (Matei Zaharia, 2013-09-08, 1 file, -1/+1)
| | | | | |