| Commit message | Author | Age | Files | Lines |
spark.cleaner.delay is insufficient.
IDs in CacheTracker.
cached object while deserializing.
modules use CleanupTask to periodically clean up metadata.
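A rough sketch of the periodic-cleanup idea described in the commits above; the class name, callback, and period calculation here are illustrative assumptions, not the actual CleanupTask internals (only the spark.cleaner.delay property comes from the log itself):

    import java.util.{Timer, TimerTask}

    // Illustrative sketch only: run a cleanup callback on a timer, passing it a
    // threshold so the caller can drop metadata older than the configured delay.
    class PeriodicCleaner(name: String, cleanup: Long => Unit) {
      private val delaySeconds =
        System.getProperty("spark.cleaner.delay", "3600").toLong
      private val periodMillis = delaySeconds * 1000 / 10  // arbitrary fraction of the delay
      private val timer = new Timer(name + " cleanup timer", true)

      timer.schedule(new TimerTask {
        def run(): Unit = {
          val threshold = System.currentTimeMillis() - delaySeconds * 1000
          cleanup(threshold) // e.g. remove map entries with timestamp < threshold
        }
      }, periodMillis, periodMillis)

      def stop(): Unit = timer.cancel()
    }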
SPARK-624: make the default local IP customizable
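A minimal sketch of what a customizable local IP can look like: an environment-variable override with a hostname-lookup fallback. The SPARK_LOCAL_IP name is assumed from the ticket; the helper below is illustrative, not the actual change.

    import java.net.InetAddress

    object LocalIp {
      // Illustrative: let an environment variable override the default lookup.
      def localIpAddress: String =
        Option(System.getenv("SPARK_LOCAL_IP"))
          .getOrElse(InetAddress.getLocalHost.getHostAddress)
    }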
Among other things, should prevent OutOfMemoryErrors in some daemon threads
(such as the network manager) from causing a Spark executor to enter a state
where it cannot make progress but does not report an error.
matching with data locality hints from storage systems.
SPARK-617 #resolve
each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks.
1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks to use BlockInfo objects as per-block fine-grained locks.
2. Changed the MemoryStore lock structure by making the block putting threads lock on a different object (not the memory store), thus making sure putting threads minimally block the getting threads.
3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.
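A simplified sketch of the per-block locking described in item 1, assuming a stripped-down BlockInfo that the writing thread marks ready and other threads wait on; names and structure are illustrative, not the real BlockManager:

    import java.util.concurrent.ConcurrentHashMap

    // Illustrative sketch: one BlockInfo per block serves as a fine-grained lock,
    // so threads touching different blocks never contend on a shared coarse lock.
    class BlockInfo {
      private var ready = false
      def markReady(): Unit = synchronized { ready = true; notifyAll() }
      def waitUntilReady(): Unit = synchronized { while (!ready) wait() }
    }

    class SimpleBlockStore {
      private val blocks = new ConcurrentHashMap[String, BlockInfo]()

      def put(blockId: String, write: () => Unit): Unit = {
        val info = new BlockInfo
        val existing = blocks.putIfAbsent(blockId, info)
        if (existing == null) { // this thread owns the write for the block
          write()
          info.markReady()
        } else {
          existing.waitUntilReady() // another thread is writing it; just wait
        }
      }

      def get(blockId: String): Boolean = {
        val info = blocks.get(blockId)
        if (info == null) false else { info.waitUntilReady(); true }
      }
    }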
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
master due to a locally computed operation
Conflicts:
core/src/main/scala/spark/storage/BlockManagerMaster.scala
reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.
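The two formulations mentioned above, sketched against the DStream API of this era; the package names, the implicit import, and the durations are assumptions for illustration:

    import spark.streaming.{DStream, Seconds}
    import spark.streaming.StreamingContext._ // pair-DStream operations such as reduceByKey

    object WindowedCounts {
      // Naive: materialize every window, then reduce all of its elements from scratch.
      def naive(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
        pairs.window(Seconds(30), Seconds(10)).reduceByKey(_ + _)

      // Optimized: pre-reduce each batch, window the partial sums, then reduce again.
      def optimized(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
        pairs.reduceByKey(_ + _).window(Seconds(30), Seconds(10)).reduceByKey(_ + _)
    }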
splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.
updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.
code and fix a livelock problem in unlimited attempts to contact the master. Also added test cases in BlockManagerSuite to test the BlockManagerMaster methods getPeers and getLocations.
streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.
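For reference, the same interval can also be set explicitly on a stream; a one-line sketch assuming this era's API, with an illustrative wrapper name:

    import spark.streaming.{DStream, Seconds}

    object CheckpointExample {
      // Illustrative: checkpoint a stateful stream every 10 seconds, the same
      // value as the default interval mentioned above.
      def enable[T](stream: DStream[T]): Unit = stream.checkpoint(Seconds(10))
    }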
changes are reflected atomically in the task closure. Added tests to ensure that checkpointing in progress on an RDD does not hurt the result of jobs running on that RDD.
checkpointing in them.
checkpointed Hadoop RDD) and other references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more).
Conflicts:
core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
core/src/main/scala/spark/KryoSerializer.scala
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/executor/Executor.scala
core/src/main/scala/spark/network/Connection.scala
core/src/main/scala/spark/network/ConnectionManagerTest.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/NewHadoopRDD.scala
core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockMessage.scala
core/src/main/scala/spark/storage/BlockStore.scala
core/src/main/scala/spark/storage/StorageLevel.scala
core/src/main/scala/spark/util/AkkaUtils.scala
project/SparkBuild.scala
run
Added a method to report slave memory status; force serialize accumulator update in local mode.
Conflicts:
core/src/main/scala/spark/Dependency.scala
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/ShuffledRDD.scala
This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces.
Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD.
I don't anticipate this having much of a performance impact: in both approaches, each tuple
is hashed twice (once in the bucket partitioning and once in the combiner's hashtable); the
same steps are being performed, just in a different order and through one extra Iterator.
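A sketch of the mapPartitions()-based map-side combine the message describes, using plain pair-RDD operations; the package names assume this era's layout and the HashMap combiner is illustrative, not the ShuffledRDD implementation:

    import scala.collection.mutable
    import spark.RDD
    import spark.SparkContext._ // pair-RDD operations such as reduceByKey

    object MapSideCombine {
      // Combine values per key inside each partition before the shuffle, so only
      // one (key, partial sum) pair per key and partition crosses the network;
      // the reduce-side combine then finishes the aggregation.
      def combineThenShuffle(pairs: RDD[(String, Int)]): RDD[(String, Int)] = {
        val partials = pairs.mapPartitions { iter =>
          val sums = mutable.HashMap.empty[String, Int]
          for ((k, v) <- iter) sums(k) = sums.getOrElse(k, 0) + v
          sums.iterator
        }
        partials.reduceByKey(_ + _)
      }
    }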
Instead, the presence or absence of a ShuffleDependency's aggregator
will control whether map-side combining is performed.