Commit message (Collapse) | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Refactored RDD checkpointing to minimize extra fields in RDD class. | Tathagata Das | 2012-12-04 | 1 | -70/+3 |
| | |||||
* | Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing. | Tathagata Das | 2012-11-28 | 1 | -0/+26 |
| | |||||
* | Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits. | Tathagata Das | 2012-11-17 | 1 | -0/+28 |
| | |||||
* | Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations. | Tathagata Das | 2012-11-11 | 1 | -6/+24 |
| | |||||
* | Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added to tests to ensure that running a job on an RDD on which checkpointing is in progress does not hurt the result of the job. | Tathagata Das | 2012-10-31 | 1 | -1/+70 |
| | |||||
* | Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them. | Tathagata Das | 2012-10-30 | 2 | -6/+135 |
| | |||||
* | Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev | Matei Zaharia | 2012-10-23 | 1 | -37/+2 |
|\ | | | | Conflicts: core/src/main/scala/spark/Dependency.scala, core/src/main/scala/spark/rdd/CoGroupedRDD.scala, core/src/main/scala/spark/rdd/ShuffledRDD.scala | ||||
| * | Remove map-side combining from ShuffleMapTask. | Josh Rosen | 2012-10-13 | 1 | -29/+0 |
| | | | | | This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD. I don't anticipate this having much of a performance impact: in both approaches, each tuple is hashed twice: once in the bucket partitioning and once in the combiner's hashtable. The same steps are being performed, but in a different order and through one extra Iterator. | ||||
| * | Remove mapSideCombine field from Aggregator. | Josh Rosen | 2012-10-13 | 1 | -15/+5 |
| | | | | | Instead, the presence or absence of a ShuffleDependency's aggregator will control whether map-side combining is performed. | ||||
| * | Change ShuffleFetcher to return an Iterator. | Josh Rosen | 2012-10-13 | 1 | -10/+14 |
| | | |||||
* | | Take executor environment vars as an argument to SparkContext | Matei Zaharia | 2012-10-13 | 1 | -0/+6 |
|/ | |||||
* | Added a test for when an RDD only partially fits in memory | Matei Zaharia | 2012-10-12 | 1 | -2/+18 |
| | |||||
* | Add test to verify if RDD is computed even if block manager has insufficient | Shivaram Venkataraman | 2012-10-12 | 1 | -0/+10 |
| | | | | memory | ||||
* | Change block manager to accept an ArrayBuffer instead of an iterator to ensure | Shivaram Venkataraman | 2012-10-11 | 2 | -9/+9 |
| | | | | that the computation can proceed even if we run out of memory to cache the block. Update CacheTracker to use this new interface | ||||
* | Made compression configurable separately for shuffle, broadcast and RDDs | Matei Zaharia | 2012-10-07 | 1 | -16/+80 |
| | |||||
* | Fixed a bug in addFile that if the file is specified as "file:///", the | Reynold Xin | 2012-10-07 | 1 | -9/+22 |
| | | | | symlink is created wrong for local mode. | ||||
* | Removed the need to sleep in tests due to waiting for Akka to shut down | Matei Zaharia | 2012-10-07 | 14 | -19/+38 |
| | |||||
* | Modified shuffle to limit the maximum outstanding data size in bytes, | Matei Zaharia | 2012-10-06 | 2 | -4/+19 |
| | | | | instead of the maximum number of outstanding fetches. This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs. | ||||
* | Pass sizes of map outputs back to MapOutputTracker | Matei Zaharia | 2012-10-06 | 1 | -0/+23 |
| | |||||
* | Minor formatting fixes | Matei Zaharia | 2012-10-05 | 1 | -1/+1 |
| | |||||
* | Factor subclasses of RDD out of RDD.scala into their own classes | Andy Konwinski | 2012-10-05 | 1 | -2/+6 |
| | | | | in the rdd package. | ||||
* | Moves all files in core/src/main/scala/ that have RDD in them from | Andy Konwinski | 2012-10-05 | 2 | -2/+5 |
| | | | | package spark to package spark.rdd and updates all references to them. | ||||
* | Fix SizeEstimator tests to work with String classes in JDK 6 and 7 | Shivaram Venkataraman | 2012-10-05 | 2 | -12/+29 |
| | | | | Conflicts: core/src/test/scala/spark/BoundedMemoryCacheSuite.scala | ||||
* | change tests to show utility of localValue | Imran Rashid | 2012-10-04 | 1 | -3/+4 |
| | |||||
* | make accumulator.localValue public, add tests | Imran Rashid | 2012-10-04 | 1 | -0/+15 |
| | | | | Conflicts: core/src/test/scala/spark/AccumulatorSuite.scala | ||||
* | Merge branch 'dev' of github.com:mesos/spark into dev | Matei Zaharia | 2012-10-02 | 1 | -3/+19 |
|\ | |||||
| * | Merge branch 'dev' of https://github.com/mesos/spark into dev | Reynold Xin | 2012-10-02 | 1 | -3/+3 |
| |\ | |||||
| * | | Allow whitespaces in cluster URL configuration for local cluster. | Reynold Xin | 2012-10-02 | 1 | -3/+19 |
| | | | |||||
* | | | Added a test for overly large blocks in memory store | Matei Zaharia | 2012-10-02 | 1 | -0/+9 |
| | | | |||||
* | | | Fixed cache replacement behavior of BlockManager: | Matei Zaharia | 2012-10-02 | 1 | -6/+83 |
| |/ | |||||
|/| | |||||
| | | | | | - Partitions that get dropped to disk will now be loaded back into RAM after they're accessed again | ||||
| | | | | | - Same-RDD rule for cache replacement is now implemented (don't drop partitions from an RDD to make room for other partitions from itself) | ||||
| | | | | | - Items stored as MEMORY_AND_DISK go into memory only first, instead of being eagerly written out to disk | ||||
| | | | | | - MemoryStore.ensureFreeSpace is called within a lock on the writer thread to prevent race conditions (this can still be optimized to allow multiple concurrent calls to it but it's a start) | ||||
| | | | | | - MemoryStore does not accept blocks larger than its limit | ||||
* | | Revert "Place Spray repo ahead of Cloudera in Maven search path" | Matei Zaharia | 2012-10-02 | 1 | -3/+3 |
|/ | | | | This reverts commit 42e0a68082327c78dbd0fd313145124d9b8a9d98. | ||||
* | Place Spray repo ahead of Cloudera in Maven search path | Matei Zaharia | 2012-10-02 | 1 | -3/+3 |
| | |||||
* | Remove some printlns in tests | Matei Zaharia | 2012-10-01 | 2 | -2/+5 |
| | |||||
* | Added a (failing) test for LRU with MEMORY_AND_DISK. | Matei Zaharia | 2012-09-30 | 1 | -3/+7 |
| | |||||
* | Fixed several bugs that caused weird behavior with files in spark-shell: | Matei Zaharia | 2012-09-30 | 2 | -1/+23 |
| | | | | - SizeEstimator was following through a ClassLoader field of Hadoop JobConfs, which referenced the whole interpreter, Scala compiler, etc. Chaos ensued, giving an estimated size in the tens of gigabytes. | ||||
| | | | | - Broadcast variables in local mode were only stored as MEMORY_ONLY and never made accessible over a server, so they fell out of the cache when they were deemed too large and couldn't be reloaded. | ||||
* | Comment | Matei Zaharia | 2012-09-29 | 1 | -1/+1 |
| | |||||
* | Added a CoalescedRDD class for reducing the number of partitions in an RDD. | Matei Zaharia | 2012-09-29 | 1 | -0/+31 |
| | |||||
* | Comment | Matei Zaharia | 2012-09-29 | 1 | -0/+1 |
| | |||||
* | Made BlockManager unmap memory-mapped files when necessary to reduce the | Matei Zaharia | 2012-09-29 | 1 | -2/+58 |
| | | | | number of open files. Also optimized sending of disk-based blocks. | ||||
* | Added an option to compress blocks in the block store | Matei Zaharia | 2012-09-27 | 1 | -0/+16 |
| | |||||
* | Renamed storage levels to something cleaner; fixes #223. | Matei Zaharia | 2012-09-27 | 1 | -27/+27 |
| | |||||
* | Merge pull request #222 from rxin/dev | Matei Zaharia | 2012-09-26 | 1 | -0/+5 |
|\ | | | | | Added MapPartitionsWithSplitRDD. | ||||
| * | Added MapPartitionsWithSplitRDD. | Reynold Xin | 2012-09-26 | 1 | -0/+5 |
| | | |||||
* | | Allow controlling number of splits in sortByKey. | Matei Zaharia | 2012-09-26 | 1 | -5/+43 |
|/ | |||||
* | Fixed a test that was getting extremely lucky before, and increased the | Matei Zaharia | 2012-09-26 | 1 | -9/+9 |
| | | | | number of samples used for sorting | ||||
* | Fix some test issues | Matei Zaharia | 2012-09-24 | 2 | -12/+14 |
| | |||||
* | Separated ShuffledRDD into multiple classes: RepartitionShuffledRDD, | Reynold Xin | 2012-09-19 | 1 | -9/+9 |
| | | | | ShuffledSortedRDD, and ShuffledAggregatedRDD. | ||||
* | Merge branch 'dev' into feature/fileserver | Denny | 2012-09-11 | 1 | -2/+25 |
|\ | | | | Conflicts: core/src/main/scala/spark/SparkContext.scala | ||||
| * | Manually merge pull request #175 by Imran Rashid | Matei Zaharia | 2012-09-11 | 1 | -2/+25 |
| | | |||||
| * | Added a unit test for local-cluster mode and simplified some of the code involved in that | Matei Zaharia | 2012-09-07 | 1 | -0/+68 |