Commit message | Author | Age | Files | Lines
* Merge pull request #68 from mosharaf/master | Matei Zaharia | 2013-10-18 | 7 | -12/+328
|\
| | Faster and stable/reliable broadcast
| |
| | HttpBroadcast is noticeably slow, but the alternatives (TreeBroadcast or BitTorrentBroadcast) are notoriously unreliable. The main problem with them is that they try to manage the memory for the pieces of a broadcast themselves. Right now, the BroadcastManager does not know on which machines the tasks reading from a broadcast variable are running, or when they have finished. Consequently, we try to guess and often guess wrong, which blows up the memory usage and kills/hangs jobs.
| |
| | This very simple implementation solves the problem by not trying to manage the intermediate pieces; instead, it offloads that duty to the BlockManager, which is quite good at juggling blocks. Otherwise, it is very similar to the BitTorrentBroadcast implementation (without the fancy optimizations), and it runs much faster than the HttpBroadcast we have right now.
| |
| | I've been using this for another project for the last couple of weeks, and just today did some benchmarking against the Http one. The following shows the improvements with increasing broadcast size for cold runs. Each line represents a number of receivers.
| |
| | ![fix-bc-first](https://f.cloud.github.com/assets/232966/1349342/ffa149e4-36e7-11e3-9fa6-c74555829356.png)
| |
| | After the first broadcast is over, i.e., after the JVM is warmed up and (I think) the HttpBroadcast server is already running, the following are the improvements for warm runs.
| |
| | ![fix-bc-succ](https://f.cloud.github.com/assets/232966/1349352/5a948bae-36e8-11e3-98ce-34f19ebd33e0.jpg)
| |
| | The curves are not as nice as the cold runs, but the improvements are obvious, especially for larger broadcasts and more receivers.
| |
| | Depending on how it goes, we should deprecate and/or remove the old TreeBroadcast and BitTorrentBroadcast implementations, and hopefully SPARK-889 will not be necessary anymore.
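A minimal sketch of the block-splitting idea described above, with hypothetical names (the real TorrentBroadcast stores and fetches these pieces through the BlockManager, which is not shown here):

```scala
// Hypothetical sketch: cut a serialized broadcast value into fixed-size
// pieces so each piece can be handed to the BlockManager as an ordinary
// block, instead of the broadcast code tracking memory for pieces itself.
case class TorrentBlock(blockId: Int, byteArray: Array[Byte])

object TorrentSketch {
  val BlockSize = 4 * 1024 * 1024 // 4MB, matching the default set below

  // Split a serialized value into BlockSize-sized pieces.
  def blockify(bytes: Array[Byte]): Array[TorrentBlock] =
    bytes.grouped(BlockSize).zipWithIndex.map {
      case (chunk, i) => TorrentBlock(i, chunk)
    }.toArray

  // Reassemble pieces (fetched via the BlockManager) into the original bytes.
  def unblockify(blocks: Array[TorrentBlock]): Array[Byte] =
    blocks.sortBy(_.blockId).flatMap(_.byteArray)
}
```

Because every piece lives in the BlockManager, eviction and cleanup ride on existing machinery rather than ad-hoc bookkeeping.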
| * Should compile now. | Mosharaf Chowdhury | 2013-10-17 | 1 | -1/+2
| |
| * Added an after block to reset spark.broadcast.factory | Mosharaf Chowdhury | 2013-10-17 | 1 | -0/+4
| |
| * Merge remote-tracking branch 'upstream/master' | Mosharaf Chowdhury | 2013-10-17 | 3 | -3/+39
| |\
| * | BroadcastSuite updated to test both HttpBroadcast and TorrentBroadcast in local, local[N], local-cluster settings. | Mosharaf Chowdhury | 2013-10-17 | 1 | -3/+44
| | |
| * | Merge remote-tracking branch 'upstream/master' | Mosharaf Chowdhury | 2013-10-17 | 9 | -105/+63
| |\ \
| * | | Code styling. Updated doc. | Mosharaf Chowdhury | 2013-10-17 | 2 | -4/+12
| | | |
| * | | Removed unused code. | Mosharaf Chowdhury | 2013-10-17 | 2 | -14/+11
| | | | Changes to match Spark coding style.
| * | | BroadcastTest2 --> BroadcastTest | Mosharaf Chowdhury | 2013-10-16 | 2 | -62/+12
| | | |
| * | | Fixes for the new BlockId naming convention. | Mosharaf Chowdhury | 2013-10-16 | 2 | -7/+14
| | | |
| * | | Default blockSize is 4MB. | Mosharaf Chowdhury | 2013-10-16 | 2 | -1/+60
| | | | BroadcastTest2 example added for testing broadcasts.
| * | | Removed unnecessary code, and added a comment on the memory-latency tradeoff. | Mosharaf Chowdhury | 2013-10-16 | 1 | -4/+6
| | | |
| * | | Torrent-ish broadcast based on BlockManager. | Mosharaf Chowdhury | 2013-10-16 | 3 | -4/+251
| | | |
* | | | Merge pull request #71 from aarondav/scdefaults | Matei Zaharia | 2013-10-18 | 2 | -8/+14
|\ \ \ \
| |_|_|/
|/| | |
| | | | Spark shell exits if it cannot create SparkContext
| | | |
| | | | Mainly, this occurs if you provide a messed up MASTER url (one that doesn't match one of our regexes). Previously, we would default to Mesos, fail, and then start the shell anyway, except that any Spark command would fail. Simply exiting seems clearer.
| * | | Spark shell exits if it cannot create SparkContext | Aaron Davidson | 2013-10-17 | 2 | -8/+14
|/ / /
| | | Mainly, this occurs if you provide a messed up MASTER url (one that doesn't match one of our regexes). Previously, we would default to Mesos, fail, and then start the shell anyway, except that any Spark command would fail.
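A minimal sketch of the fail-fast behavior described above, with hypothetical patterns and names (not the actual repl code):

```scala
// Hypothetical sketch: validate the MASTER url up front and exit on
// failure, instead of silently falling back and starting a broken shell.
object MasterUrlCheck {
  private val LocalN = """local\[([0-9]+)\]""".r

  def validate(master: String): Unit = master match {
    case "local"                       => () // single-threaded local mode
    case LocalN(_)                     => () // local mode with N threads
    case m if m.startsWith("spark://") => () // standalone cluster
    case other =>
      // Previously this fell through to Mesos and the shell started anyway.
      System.err.println(s"Could not parse MASTER url: '$other'")
      sys.exit(1)
  }
}
```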
* | | Merge pull request #69 from KarthikTunga/master | Matei Zaharia | 2013-10-17 | 3 | -3/+39
|\ \ \
| |_|/
|/| |
| | | Fix for issue SPARK-627. Implementing --config argument in the scripts.
| | |
| | | This code fix is for issue SPARK-627. I added code to consider --config arguments in the scripts. If the <conf-dir> is not a directory, the scripts exit. I removed the --hosts argument; the same effect can be achieved by giving a different config directory. Let me know if an explicit --hosts argument is required.
| * | SPARK-627, Implementing --config arguments in the scripts | KarthikTunga | 2013-10-16 | 1 | -1/+1
| | |
| * | SPARK-627, Implementing --config arguments in the scripts | KarthikTunga | 2013-10-16 | 2 | -2/+2
| | |
| * | Implementing --config argument in the scripts | KarthikTunga | 2013-10-16 | 2 | -7/+10
| | |
| * | Merge branch 'master' of https://github.com/apache/incubator-spark | KarthikTunga | 2013-10-15 | 159 | -1367/+5322
| |\ \
| | | | Updating local branch
| * | | SPARK-627 - reading --config argument | KarthikTunga | 2013-10-15 | 2 | -0/+33
| | | |
* | | | Merge pull request #67 from kayousterhout/remove_tsl | Matei Zaharia | 2013-10-17 | 9 | -105/+63
|\ \ \ \
| |_|_|/
|/| | |
| | | | Removed TaskSchedulerListener interface.
| | | |
| | | | The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function).
| | | |
| | | | @mateiz - is there a reason for this interface that I'm missing?
| * | | Fixed unit tests | Kay Ousterhout | 2013-10-16 | 2 | -25/+26
| | | |
| * | | Removed TaskSchedulerListener interface. | Kay Ousterhout | 2013-10-16 | 7 | -80/+37
| | | | The interface was used only by the DAG scheduler (so it wasn't necessary to define the additional interface), and the naming makes it very confusing when reading the code (because "listener" was used to describe the DAG scheduler, rather than SparkListeners, which implement a nearly-identical interface but serve a different function).
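Schematically, the change folds a single-implementer callback interface into a direct reference; a hedged sketch with simplified names, not the actual Spark classes:

```scala
// Before: the task scheduler reported completions through a listener trait
// whose only implementer was the DAG scheduler.
trait TaskSchedulerListener {
  def taskEnded(taskId: Long): Unit
}

class DAGScheduler extends TaskSchedulerListener {
  override def taskEnded(taskId: Long): Unit =
    println(s"task $taskId finished")
}

// After: drop the trait and hold the DAGScheduler directly, so "listener"
// no longer misleadingly names the DAG scheduler.
class TaskScheduler(dagScheduler: DAGScheduler) {
  def handleTaskCompletion(taskId: Long): Unit =
    dagScheduler.taskEnded(taskId)
}
```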
* | | | Merge pull request #65 from tgravescs/fixYarn | Matei Zaharia | 2013-10-16 | 1 | -2/+2
|\ \ \ \
| |/ / /
|/| | |
| | | | Fix yarn build
| | | |
| | | | Fix the yarn build after renaming StandaloneX to CoarseGrainedX in pull request 34.
| * | | Fix yarn build | tgravescs | 2013-10-16 | 1 | -2/+2
|/ / /
* | | Merge pull request #63 from pwendell/master | Matei Zaharia | 2013-10-15 | 2 | -4/+10
|\ \ \
| | | | Fixing spark streaming example and a bug in examples build.
| | | |
| | | | - Examples assembly included a log4j.properties which clobbered Spark's
| | | | - Example had an error where some classes weren't serializable
| | | | - Did some other clean-up in this example
| * | | Fixing spark streaming example and a bug in examples build. | Patrick Wendell | 2013-10-15 | 2 | -4/+10
| | | | - Examples assembly included a log4j.properties which clobbered Spark's
| | | | - Example had an error where some classes weren't serializable
| | | | - Did some other clean-up in this example
* | | | Merge pull request #62 from harveyfeng/master | Matei Zaharia | 2013-10-15 | 2 | -2/+5
|\ \ \ \
| |/ / /
|/| | |
| | | | Make TaskContext's stageId publicly accessible.
| * | | Proper formatting for SparkHadoopWriter class extensions. | Harvey Feng | 2013-10-15 | 1 | -1/+3
| | | |
| * | | Fix line length > 100 chars in SparkHadoopWriter | Harvey Feng | 2013-10-15 | 1 | -1/+2
| | | |
| * | | Make TaskContext's stageId publicly accessible. | Harvey Feng | 2013-10-15 | 1 | -1/+1
| | | |
* | | | Merge pull request #8 from vchekan/checkpoint-ttl-restore | Matei Zaharia | 2013-10-15 | 2 | -0/+6
|\ \ \ \
| | | | | Serialize and restore spark.cleaner.ttl to savepoint
| | | | |
| | | | | In accordance with the conversation on the spark-dev mailing list, preserve the spark.cleaner.ttl parameter when serializing a checkpoint.
| * | | | Serialize and restore spark.cleaner.ttl to savepoint | Vadim Chekan | 2013-09-20 | 2 | -0/+6
| | | | |
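A minimal sketch of the idea, with hypothetical names (the real streaming Checkpoint class carries more state and differs in how it reads configuration):

```scala
// Hypothetical sketch: capture spark.cleaner.ttl when checkpoint data is
// written, and put it back when the job is restored from the checkpoint.
class CheckpointSketch extends Serializable {
  // Captured at construction time, i.e. when the checkpoint is written.
  val cleanerTtl: String = System.getProperty("spark.cleaner.ttl", "")

  // Called after deserialization so the restored job sees the same TTL.
  def restore(): Unit =
    if (cleanerTtl.nonEmpty) System.setProperty("spark.cleaner.ttl", cleanerTtl)
}
```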
* | | | | Merge pull request #34 from kayousterhout/rename | Matei Zaharia | 2013-10-15 | 6 | -36/+42
|\ \ \ \ \
| | | | | | Renamed StandaloneX to CoarseGrainedX.
| | | | | | (as suggested by @rxin here https://github.com/apache/incubator-spark/pull/14)
| | | | | |
| | | | | | The previous names were confusing because the components weren't just used in Standalone mode. The scheduler used for Standalone mode is called SparkDeploySchedulerBackend, so referring to the base class as StandaloneSchedulerBackend was misleading.
| * | | | | Fixed build error after merging in master | Kay Ousterhout | 2013-10-15 | 1 | -1/+1
| | | | | |
| * | | | | Merge remote branch 'upstream/master' into rename | Kay Ousterhout | 2013-10-15 | 175 | -1414/+5573
| |\ \ \ \ \
| | | |/ / /
| | |/| | |
| * | | | | Added back fully qualified class name | Kay Ousterhout | 2013-10-06 | 1 | -1/+1
| | | | | |
| * | | | | Renamed StandaloneX to CoarseGrainedX. | Kay Ousterhout | 2013-10-04 | 6 | -35/+41
| | | | | | The previous names were confusing because the components weren't just used in Standalone mode -- in fact, the scheduler used for Standalone mode is called SparkDeploySchedulerBackend. So, the previous names were misleading.
* | | | | | Merge pull request #61 from kayousterhout/daemon_thread | Matei Zaharia | 2013-10-15 | 7 | -38/+29
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | |
| | | | | | Unified daemon thread pools
| | | | | |
| | | | | | As requested by @mateiz in an earlier pull request, this refactors various daemon thread pools to use a set of methods in utils.scala, and also changes the thread-pool-creation methods in utils.scala to use named thread pools for improved debugging.
| * | | | | Unified daemon thread pools | Kay Ousterhout | 2013-10-15 | 7 | -38/+29
|/ / / / /
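A minimal sketch of a named daemon thread-pool helper of the kind being consolidated (hypothetical names; the actual methods in utils.scala may differ):

```scala
import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

object ThreadUtilsSketch {
  // Daemon threads do not block JVM shutdown, and the name prefix makes
  // thread dumps far easier to read when debugging.
  def namedDaemonThreadFactory(prefix: String): ThreadFactory = new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
      t.setDaemon(true)
      t
    }
  }

  def newDaemonFixedThreadPool(nThreads: Int, prefix: String): ExecutorService =
    Executors.newFixedThreadPool(nThreads, namedDaemonThreadFactory(prefix))
}
```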
* | | | | Merge pull request #59 from rxin/warning | Matei Zaharia | 2013-10-15 | 1 | -5/+5
|\ \ \ \ \
| | | | | | Bump up logging level to warning for failed tasks.
| * | | | | Bump up logging level to warning for failed tasks. | Reynold Xin | 2013-10-14 | 1 | -5/+5
| | |_|_|/
| |/| | |
* | | | | Merge pull request #58 from hsaputra/update-pom-asf | Reynold Xin | 2013-10-15 | 1 | -1/+24
|\ \ \ \ \
| |/ / / /
|/| | | |
| | | | | Update pom.xml to use version 13 of the ASF parent pom
| | | | |
| | | | | Update pom.xml to use version 13 of the ASF parent pom. Add a mailingList element to pom.xml.
| * | | | Update pom.xml to use version 13 of the ASF parent pom and add mailingLists element. | Henry Saputra | 2013-10-14 | 1 | -1/+24
| | | | |
* | | | | Merge pull request #29 from rxin/kill | Patrick Wendell | 2013-10-14 | 50 | -515/+1528
|\ \ \ \ \
| |/ / / /
|/| | | |
| | | | | Job killing
| | | | |
| | | | | Moving https://github.com/mesos/spark/pull/935 here.
| | | | |
| | | | | The high-level idea is to have an "interrupted" field in TaskContext, and a task should check that flag to determine whether its execution should continue. For convenience, I provide an InterruptibleIterator which wraps around a normal iterator but checks for the interrupted flag. I also provide an InterruptibleRDD that wraps around an existing RDD.
| | | | |
| | | | | As part of this pull request, I added an AsyncRDDActions class that provides a number of RDD actions returning a FutureJob (extending scala.concurrent.Future). The FutureJob can be used to kill the job execution or to wait until the job finishes.
| | | | |
| | | | | This is NOT ready for merging yet. Remaining TODOs:
| | | | | 1. Add unit tests
| | | | | 2. Add job killing functionality for the local scheduler (the current job killing functionality only works in the cluster scheduler)
| | | | |
| | | | | Update on Oct 10, 2013: This is ready!
| | | | |
| | | | | Related future work:
| | | | | - Figure out how to handle the job triggered by RangePartitioner (this one is tough; might become future work)
| | | | | - Java API
| | | | | - Python API
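A minimal sketch of the InterruptibleIterator idea described above, with simplified types (the real TaskContext carries more state, and the exception thrown may differ):

```scala
// Simplified sketch: a per-task flag plus an iterator wrapper that checks it,
// so a running task can be killed cooperatively between records.
class TaskContextSketch {
  @volatile var interrupted: Boolean = false
}

class InterruptibleIteratorSketch[T](
    context: TaskContextSketch,
    delegate: Iterator[T]) extends Iterator[T] {

  override def hasNext: Boolean = {
    // Bail out as soon as the kill flag is set, instead of finishing the
    // partition and wasting the work.
    if (context.interrupted) throw new InterruptedException("task interrupted")
    delegate.hasNext
  }

  override def next(): T = delegate.next()
}
```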
| * | | | Merge branch 'master' of github.com:apache/incubator-spark into kill | Reynold Xin | 2013-10-14 | 53 | -457/+652
| |\ \ \ \
| |/ / / /
|/| | | |
| | | | | Conflicts:
| | | | |   core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
* | | | | Merge pull request #57 from aarondav/bid | Reynold Xin | 2013-10-14 | 44 | -385/+544
|\ \ \ \ \
| | | | | | Refactor BlockId into an actual type
| | | | | |
| | | | | | Converts all of our BlockId strings into actual BlockId types. Here are some advantages of doing this now:
| | | | | |
| | | | | | + Type safety
| | | | | | + Code clarity - it's now obvious what the key of a shuffle or rdd block is, for instance. Additionally, appearing in tuple/map type signatures is a big readability bonus. A Seq[(String, BlockStatus)] is not very clear. Further, we can now use more Scala features, like matching on BlockId types.
| | | | | | + Explicit usage - we can now formally tell where various BlockIds are being used (without doing string searches); this makes updating current BlockIds a much clearer, compiler-supported process. (I'm looking at you, shuffle file consolidation.)
| | | | | | + It will only get harder to make this change as time goes on.
| | | | | |
| | | | | | The downside is, of course, that this is a very invasive change touching a lot of different files, which will inevitably lead to merge conflicts for many.
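A minimal sketch of what such a typed BlockId hierarchy can look like (illustrative field names, not necessarily those used in the PR):

```scala
// Illustrative sketch: replace stringly-typed block keys with a sealed
// hierarchy, keeping the old string form around for filenames and logging.
sealed abstract class BlockId {
  def name: String
}

case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
  def name: String = s"rdd_${rddId}_$splitIndex"
}

case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends BlockId {
  def name: String = s"shuffle_${shuffleId}_${mapId}_$reduceId"
}

object BlockIdSketch {
  // Pattern matching on block types replaces fragile string parsing.
  def isShuffle(id: BlockId): Boolean = id match {
    case ShuffleBlockId(_, _, _) => true
    case _                       => false
  }
}
```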
| * | | | | Address Matei's comments | Aaron Davidson | 2013-10-14 | 8 | -34/+28
| | | | | |
| * | | | | Change BlockId filename to name + rest of Patrick's comments | Aaron Davidson | 2013-10-13 | 11 | -36/+39
| | | | | |