| Commit message | Author | Age | Files | Lines |
spark.cleaner.delay is insufficient.
IDs in CacheTracker.
cached object while deserializing.
modules use CleanupTask to periodically clean up metadata.
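A rough sketch of the periodic-cleanup idea described in the commits above; the class name, callback, and period calculation here are illustrative assumptions, not the actual CleanupTask internals (only the spark.cleaner.delay property comes from the log itself):

    import java.util.{Timer, TimerTask}

    // Illustrative sketch only: run a cleanup callback on a timer, passing it a
    // threshold so the caller can drop metadata older than the configured delay.
    class PeriodicCleaner(name: String, cleanup: Long => Unit) {
      private val delaySeconds =
        System.getProperty("spark.cleaner.delay", "3600").toLong
      private val periodMillis = delaySeconds * 1000 / 10  // arbitrary fraction of the delay
      private val timer = new Timer(name + " cleanup timer", true)

      timer.schedule(new TimerTask {
        def run(): Unit = {
          val threshold = System.currentTimeMillis() - delaySeconds * 1000
          cleanup(threshold) // e.g. remove map entries with timestamp < threshold
        }
      }, periodMillis, periodMillis)

      def stop(): Unit = timer.cancel()
    }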
SPARK-624: make the default local IP customizable
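A minimal sketch of what a customizable local IP can look like: an environment-variable override with a hostname-lookup fallback. The SPARK_LOCAL_IP name is assumed from the ticket; the helper below is illustrative, not the actual change.

    import java.net.InetAddress

    object LocalIp {
      // Illustrative: let an environment variable override the default lookup.
      def localIpAddress: String =
        Option(System.getenv("SPARK_LOCAL_IP"))
          .getOrElse(InetAddress.getLocalHost.getHostAddress)
    }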
Among other things, should prevent OutOfMemoryErrors in some daemon threads
(such as the network manager) from causing a Spark executor to enter a state
where it cannot make progress but does not report an error.
matching with data locality hints from storage systems.
SPARK-617 #resolve
each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks.
1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks to use BlockInfo objects as per-block fine-grained locks.
2. Changed the MemoryStore lock structure by making the block putting threads lock on a different object (not the memory store), thus making sure putting threads minimally block the getting threads.
3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.
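A simplified sketch of the per-block locking described in item 1, assuming a stripped-down BlockInfo that the writing thread marks ready and other threads wait on; names and structure are illustrative, not the real BlockManager:

    import java.util.concurrent.ConcurrentHashMap

    // Illustrative sketch: one BlockInfo per block serves as a fine-grained lock,
    // so threads touching different blocks never contend on a shared coarse lock.
    class BlockInfo {
      private var ready = false
      def markReady(): Unit = synchronized { ready = true; notifyAll() }
      def waitUntilReady(): Unit = synchronized { while (!ready) wait() }
    }

    class SimpleBlockStore {
      private val blocks = new ConcurrentHashMap[String, BlockInfo]()

      def put(blockId: String, write: () => Unit): Unit = {
        val info = new BlockInfo
        val existing = blocks.putIfAbsent(blockId, info)
        if (existing == null) { // this thread owns the write for the block
          write()
          info.markReady()
        } else {
          existing.waitUntilReady() // another thread is writing it; just wait
        }
      }

      def get(blockId: String): Boolean = {
        val info = blocks.get(blockId)
        if (info == null) false else { info.waitUntilReady(); true }
      }
    }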
- Don't report a job as finishing multiple times
- Don't show state of workers as LOADING when they're running
- Show start and finish times in web UI
- Sort web UI tables by ID and time by default
master due to a locally computed operation
Conflicts:
core/src/main/scala/spark/storage/BlockManagerMaster.scala
reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.
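The two formulations mentioned above, sketched against the DStream API of this era; the package names, the implicit import, and the durations are assumptions for illustration:

    import spark.streaming.{DStream, Seconds}
    import spark.streaming.StreamingContext._ // pair-DStream operations such as reduceByKey

    object WindowedCounts {
      // Naive: materialize every window, then reduce all of its elements from scratch.
      def naive(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
        pairs.window(Seconds(30), Seconds(10)).reduceByKey(_ + _)

      // Optimized: pre-reduce each batch, window the partial sums, then reduce again.
      def optimized(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
        pairs.reduceByKey(_ + _).window(Seconds(30), Seconds(10)).reduceByKey(_ + _)
    }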
splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.
updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.
code and fix a livelock problem in unlimited attempts to contact the master. Also added test cases in BlockManagerSuite to test the BlockManagerMaster methods getPeers and getLocations.
streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.
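For reference, the same interval can also be set explicitly on a stream; a one-line sketch assuming this era's API, with an illustrative wrapper name:

    import spark.streaming.{DStream, Seconds}

    object CheckpointExample {
      // Illustrative: checkpoint a stateful stream every 10 seconds, the same
      // value as the default interval mentioned above.
      def enable[T](stream: DStream[T]): Unit = stream.checkpoint(Seconds(10))
    }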
changes are reflected atomically in the task closure. Added tests to ensure that checkpointing in progress on an RDD does not hurt the result of jobs running on that RDD.
checkpointing in them.
checkpointed Hadoop RDD) and other references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more).
Conflicts:
core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
core/src/main/scala/spark/KryoSerializer.scala
core/src/main/scala/spark/MapOutputTracker.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/executor/Executor.scala
core/src/main/scala/spark/network/Connection.scala
core/src/main/scala/spark/network/ConnectionManagerTest.scala
core/src/main/scala/spark/rdd/BlockRDD.scala
core/src/main/scala/spark/rdd/NewHadoopRDD.scala
core/src/main/scala/spark/scheduler/ShuffleMapTask.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManager.scala
core/src/main/scala/spark/storage/BlockMessage.scala
core/src/main/scala/spark/storage/BlockStore.scala
core/src/main/scala/spark/storage/StorageLevel.scala
core/src/main/scala/spark/util/AkkaUtils.scala
project/SparkBuild.scala
run
Added a method to report slave memory status; force serialize accumulator update in local mode.
Conflicts:
core/src/main/scala/spark/Dependency.scala
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/ShuffledRDD.scala
This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces.
Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD.
I don't anticipate this having much of a performance impact: in both approaches, each tuple
is hashed twice (once in the bucket partitioning and once in the combiner's hashtable); the
same steps are being performed, just in a different order and through one extra Iterator.
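A sketch of the mapPartitions()-based map-side combine the message describes, using plain pair-RDD operations; the package names assume this era's layout and the HashMap combiner is illustrative, not the ShuffledRDD implementation:

    import scala.collection.mutable
    import spark.RDD
    import spark.SparkContext._ // pair-RDD operations such as reduceByKey

    object MapSideCombine {
      // Combine values per key inside each partition before the shuffle, so only
      // one (key, partial sum) pair per key and partition crosses the network;
      // the reduce-side combine then finishes the aggregation.
      def combineThenShuffle(pairs: RDD[(String, Int)]): RDD[(String, Int)] = {
        val partials = pairs.mapPartitions { iter =>
          val sums = mutable.HashMap.empty[String, Int]
          for ((k, v) <- iter) sums(k) = sums.getOrElse(k, 0) + v
          sums.iterator
        }
        partials.reduceByKey(_ + _)
      }
    }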
Instead, the presence or absence of a ShuffleDependency's aggregator
will control whether map-side combining is performed.