spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	fall back to filter-map-collect when calling lookup() on an RDD without a partitioner	Mark Hamstra	2012-12-24	1	-0/+11
\|
*	Allow distinct() to be called without parentheses when using the default number of splits.	Mark Hamstra	2012-12-24	1	-4/+8
\|
*	Let the slave notify the master of block removal.	Reynold Xin	2012-12-20	1	-23/+36
\|
*	Fixed conflicts from merging Charles' and TD's block manager changes.	Reynold Xin	2012-12-14	1	-27/+4
\|
*	Merge branch 'master' of github.com:mesos/spark into spark-633	Reynold Xin	2012-12-13	1	-7/+9
\|\
\| *	Fixed the broken Java unit test from SPARK-635.	Reynold Xin	2012-12-13	1	-7/+9
\| \|
* \|	Merged TD's block manager refactoring.	Reynold Xin	2012-12-13	1	-25/+66
\| \|
* \|	Added the ability in block manager to remove blocks.	Reynold Xin	2012-12-13	1	-10/+49
\|/
*	Use Akka scheduler for BlockManager heart beats.	Charles Reiss	2012-12-10	1	-25/+25
\|	Adds required ActorSystem argument to BlockManager constructors.
*	Tests for block manager heartbeats.	Charles Reiss	2012-12-05	1	-0/+68
\|
*	Added zip to Java API	Matei Zaharia	2012-11-27	1	-0/+15
\|
*	Added a zip() operation for RDDs with the same shape (number of partitions and number of elements in each partition)	Matei Zaharia	2012-11-27	1	-0/+12
\|
*	Merge pull request #311 from woggling/map-output-npe	Matei Zaharia	2012-11-27	1	-0/+51
\|\	Fix NullPointerException when map output unregistered from MapOutputTracker twice
\| *	Tests for MapOutputTracker.	Charles Reiss	2012-11-27	1	-0/+51
\| \|
* \|	For size compression, compress non zero values into non zero values.	Reynold Xin	2012-11-27	1	-2/+2
\|/
*	Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev	Matei Zaharia	2012-10-23	1	-37/+2
\|\	Conflicts: core/src/main/scala/spark/Dependency.scala, core/src/main/scala/spark/rdd/CoGroupedRDD.scala, core/src/main/scala/spark/rdd/ShuffledRDD.scala
\| *	Remove map-side combining from ShuffleMapTask.	Josh Rosen	2012-10-13	1	-29/+0
\| \|	This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD. I don't anticipate this having much of a performance impact: in both approaches, each tuple is hashed twice: once in the bucket partitioning and once in the combiner's hashtable. The same steps are being performed, but in a different order and through one extra Iterator.
\| *	Remove mapSideCombine field from Aggregator.	Josh Rosen	2012-10-13	1	-15/+5
\| \|	Instead, the presence or absence of a ShuffleDependency's aggregator will control whether map-side combining is performed.
\| *	Change ShuffleFetcher to return an Iterator.	Josh Rosen	2012-10-13	1	-10/+14
\| \|
* \|	Take executor environment vars as an argument to SparkContext	Matei Zaharia	2012-10-13	1	-0/+6
\|/
*	Added a test for when an RDD only partially fits in memory	Matei Zaharia	2012-10-12	1	-2/+18
\|
*	Add test to verify if RDD is computed even if block manager has insufficient memory	Shivaram Venkataraman	2012-10-12	1	-0/+10
\|
*	Change block manager to accept an ArrayBuffer instead of an iterator to ensure that the computation can proceed even if we run out of memory to cache the block.	Shivaram Venkataraman	2012-10-11	2	-9/+9
\|	Update CacheTracker to use this new interface.
*	Made compression configurable separately for shuffle, broadcast and RDDs	Matei Zaharia	2012-10-07	1	-16/+80
\|
*	Fixed a bug in addFile where, if the file is specified as "file:///", the symlink is created incorrectly in local mode.	Reynold Xin	2012-10-07	1	-9/+22
\|
*	Removed the need to sleep in tests due to waiting for Akka to shut down	Matei Zaharia	2012-10-07	14	-19/+38
\|
*	Modified shuffle to limit the maximum outstanding data size in bytes, instead of the maximum number of outstanding fetches.	Matei Zaharia	2012-10-06	2	-4/+19
\|	This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs.
*	Pass sizes of map outputs back to MapOutputTracker	Matei Zaharia	2012-10-06	1	-0/+23
\|
*	Minor formatting fixes	Matei Zaharia	2012-10-05	1	-1/+1
\|
*	Factor subclasses of RDD out of RDD.scala into their own classes in the rdd package.	Andy Konwinski	2012-10-05	1	-2/+6
\|
*	Moves all files in core/src/main/scala/ that have RDD in them from package spark to package spark.rdd and updates all references to them.	Andy Konwinski	2012-10-05	2	-2/+5
\|
*	Fix SizeEstimator tests to work with String classes in JDK 6 and 7	Shivaram Venkataraman	2012-10-05	2	-12/+29
\|	Conflicts: core/src/test/scala/spark/BoundedMemoryCacheSuite.scala
*	change tests to show utility of localValue	Imran Rashid	2012-10-04	1	-3/+4
\|
*	make accumulator.localValue public, add tests	Imran Rashid	2012-10-04	1	-0/+15
\|	Conflicts: core/src/test/scala/spark/AccumulatorSuite.scala
*	Merge branch 'dev' of github.com:mesos/spark into dev	Matei Zaharia	2012-10-02	1	-3/+19
\|\
\| *	Merge branch 'dev' of https://github.com/mesos/spark into dev	Reynold Xin	2012-10-02	1	-3/+3
\| \|\
\| * \|	Allow whitespace in cluster URL configuration for local cluster.	Reynold Xin	2012-10-02	1	-3/+19
\| \| \|
* \| \|	Added a test for overly large blocks in memory store	Matei Zaharia	2012-10-02	1	-0/+9
\| \| \|
* \| \|	Fixed cache replacement behavior of BlockManager:	Matei Zaharia	2012-10-02	1	-6/+83
\| \|/	- Partitions that get dropped to disk will now be loaded back into RAM after they're accessed again
\|/\|	- Same-RDD rule for cache replacement is now implemented (don't drop partitions from an RDD to make room for other partitions from itself)
\| \|	- Items stored as MEMORY_AND_DISK go into memory only first, instead of being eagerly written out to disk
\| \|	- MemoryStore.ensureFreeSpace is called within a lock on the writer thread to prevent race conditions (this can still be optimized to allow multiple concurrent calls to it but it's a start)
\| \|	- MemoryStore does not accept blocks larger than its limit
* \|	Revert "Place Spray repo ahead of Cloudera in Maven search path"	Matei Zaharia	2012-10-02	1	-3/+3
\|/	This reverts commit 42e0a68082327c78dbd0fd313145124d9b8a9d98.
*	Place Spray repo ahead of Cloudera in Maven search path	Matei Zaharia	2012-10-02	1	-3/+3
\|
*	Write all unit test output to a file	Matei Zaharia	2012-10-01	1	-4/+6
\|
*	Remove some printlns in tests	Matei Zaharia	2012-10-01	2	-2/+5
\|
*	Added a (failing) test for LRU with MEMORY_AND_DISK.	Matei Zaharia	2012-09-30	1	-3/+7
\|
*	Fixed several bugs that caused weird behavior with files in spark-shell:	Matei Zaharia	2012-09-30	2	-1/+23
\|	- SizeEstimator was following through a ClassLoader field of Hadoop JobConfs, which referenced the whole interpreter, Scala compiler, etc. Chaos ensued, giving an estimated size in the tens of gigabytes.
\|	- Broadcast variables in local mode were only stored as MEMORY_ONLY and never made accessible over a server, so they fell out of the cache when they were deemed too large and couldn't be reloaded.
*	Comment	Matei Zaharia	2012-09-29	1	-1/+1
\|
*	Added a CoalescedRDD class for reducing the number of partitions in an RDD.	Matei Zaharia	2012-09-29	1	-0/+31
\|
*	Comment	Matei Zaharia	2012-09-29	1	-0/+1
\|
*	Made BlockManager unmap memory-mapped files when necessary to reduce the number of open files.	Matei Zaharia	2012-09-29	1	-2/+58
\|	Also optimized sending of disk-based blocks.
*	Added an option to compress blocks in the block store	Matei Zaharia	2012-09-27	1	-0/+16
\|
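
The most recent entry in this log describes lookup() falling back to filter-map-collect when the RDD has no partitioner. The sketch below illustrates that idea in plain Python rather than Spark's Scala: partitions are modelled as lists of (key, value) pairs, and the `lookup` function, the `partitioner` callable, and the sample data are all hypothetical stand-ins, not the code from the commit.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Pair = Tuple[Hashable, object]


def lookup(
    partitions: Sequence[Sequence[Pair]],
    key: Hashable,
    partitioner: Optional[Callable[[Hashable], int]] = None,
) -> List[object]:
    """Return every value stored under `key`.

    `partitions` is a plain-Python stand-in for an RDD of key-value pairs and
    `partitioner` maps a key to a partition index. Both are illustrative
    assumptions, not Spark's API.
    """
    if partitioner is not None:
        # Known partitioner: only one partition can hold the key,
        # so scan that partition alone.
        index = partitioner(key)
        return [v for k, v in partitions[index] if k == key]
    # No partitioner: filter on the key, map each match to its value,
    # and collect the results from every partition.
    return [v for part in partitions for k, v in part if k == key]


def by_mod(key: int) -> int:
    """Hypothetical partitioner: place integer keys by key modulo 2."""
    return key % 2


# Two partitions laid out consistently with by_mod.
data = [[(0, "a"), (2, "b")], [(1, "c"), (1, "d")]]

with_partitioner = lookup(data, 1, by_mod)
without_partitioner = lookup(data, 1)
```

Both calls return the same values; the difference is that the fallback has to scan every partition, which is why a partitioner is preferred when one is available.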