Commit message | Author | Date | Files | Lines (-/+)
* Merge pull request #98 from aarondav/docs | Matei Zaharia | 2013-10-22 | 1 file | -3/+3
    Docs: Fix links to RDD API documentation
  * Docs: Fix links to RDD API documentation | Aaron Davidson | 2013-10-22 | 1 file | -3/+3
* Merge pull request #82 from JoshRosen/map-output-tracker-refactoring | Matei Zaharia | 2013-10-22 | 5 files | -98/+109
    Split MapOutputTracker into Master/Worker classes. Previously, MapOutputTracker contained fields and methods that were only applicable to the master or worker instances. This commit introduces a MasterMapOutputTracker class to prevent the master-specific methods from being accessed on workers. I also renamed a few methods and made others protected/private.
  * Unwrap a long line that actually fits. | Josh Rosen | 2013-10-20 | 1 file | -2/+1
  * Fix test failures in local mode due to updateEpoch | Josh Rosen | 2013-10-20 | 1 file | -4/+1
  * Split MapOutputTracker into Master/Worker classes. | Josh Rosen | 2013-10-19 | 5 files | -98/+113
      Previously, MapOutputTracker contained fields and methods that were only applicable to the master or worker instances. This commit introduces a MasterMapOutputTracker class to prevent the master-specific methods from being accessed on workers. I also renamed a few methods and made others protected/private.
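A minimal sketch of the kind of split described in this PR, with simplified and partly hypothetical member names (the real classes carry considerably more state):

    // Sketch only: simplified, hypothetical members illustrating the master/worker split.
    import scala.collection.mutable

    // Behaviour needed on both the master and the workers:
    // answering "where is the map output for shuffle X?" lookups.
    class MapOutputTracker {
      protected val mapStatuses = new mutable.HashMap[Int, Array[String]]()

      def getServerStatuses(shuffleId: Int): Option[Array[String]] =
        mapStatuses.get(shuffleId)
    }

    // Master-only operations are kept off the worker-side class,
    // so they simply cannot be called on a worker.
    class MasterMapOutputTracker extends MapOutputTracker {
      def registerMapOutputs(shuffleId: Int, locations: Array[String]): Unit =
        mapStatuses(shuffleId) = locations

      def unregisterShuffle(shuffleId: Int): Unit =
        mapStatuses -= shuffleId
    }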
* Merge pull request #92 from tgravescs/sparkYarnFixClasspath | Matei Zaharia | 2013-10-21 | 2 files | -10/+29
    Fix the Worker to use CoarseGrainedExecutorBackend and modify classpath to be explicit about inclusion of spark.jar and app.jar. Be explicit so that if there are any packaging conflicts between spark.jar and app.jar we don't get random results due to the classpath containing /*, which can include things in a different order.
  * Fix the Worker to use CoarseGrainedExecutorBackend and modify classpath to be explicit about inclusion of spark.jar and app.jar | tgravescs | 2013-10-21 | 2 files | -10/+29
* Merge pull request #56 from jerryshao/kafka-0.8-dev | Matei Zaharia | 2013-10-21 | 19 files | -116/+250
    Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming.
    Conflicts: streaming/pom.xml
  * Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming | jerryshao | 2013-10-12 | 19 files | -110/+250
* Merge pull request #87 from aarondav/shuffle-base | Reynold Xin | 2013-10-21 | 13 files | -319/+460
    Basic shuffle file consolidation.
    The Spark shuffle phase can produce a large number of files, as one file is created per mapper per reducer. For large or repeated jobs, this often produces millions of shuffle files, which leads to extremely degraded performance from the OS file system. This patch seeks to reduce that burden by combining multiple shuffle files into one.
    This PR draws upon the work of @jason-dai in https://github.com/mesos/spark/pull/669. However, it simplifies the design in order to get the majority of the gain with less overall intellectual and code burden. The vast majority of code in this pull request is a refactor to allow the insertion of a clean layer of indirection between logical block ids and physical files. This, I feel, provides some design clarity in addition to enabling shuffle file consolidation.
    The main goal is to produce one shuffle file per reducer per active mapper thread. This allows us to isolate the mappers (simplifying the failure modes), while still allowing us to reduce the number of shuffle files tremendously for large tasks. In order to accomplish this, we simply create a new set of shuffle files for every parallel task, and return the files to a pool which will be given out to the next task that runs.
    I have run some ad hoc query testing on 5 m1.xlarge EC2 nodes with 2g of executor memory and the following microbenchmark:
        scala> val nums = sc.parallelize(1 to 1000, 1000).flatMap(x => (1 to 1e6.toInt))
        scala> def time(x: => Unit) = { val now = System.currentTimeMillis; x; System.currentTimeMillis - now }
        scala> (1 to 8).map(_ => time(nums.map(x => (x % 100000, 2000, x)).reduceByKey(_ + _).count) / 1000.0)
    For this particular workload, with 1000 mappers and 2000 reducers, I saw the old method running at around 15 minutes, with the consolidated shuffle files running at around 4 minutes. There was a very sharp increase in running time for the non-consolidated version after around 1 million total shuffle files. Below this threshold, however, there wasn't a significant difference between the two.
    Better performance measurement of this patch is warranted, and I plan on doing so in the near future as part of a general investigation of our shuffle file bottlenecks and performance.
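A minimal sketch of the pooling idea described above (one group of per-reducer files checked out by a running map task, then returned for the next task); the names below are hypothetical and not taken from the patch:

    // Sketch only: hypothetical names illustrating the shuffle-file pool idea.
    import java.io.File
    import java.util.concurrent.ConcurrentLinkedQueue

    // One group of shuffle files: one file per reducer, reused across map tasks.
    case class ShuffleFileGroup(files: Array[File])

    class ShuffleFilePool(dir: File, numReducers: Int) {
      private val idle = new ConcurrentLinkedQueue[ShuffleFileGroup]()
      private var created = 0

      // A running map task checks out a group; if none is free, create a new one,
      // so the number of groups grows only to the number of concurrent tasks.
      def acquire(): ShuffleFileGroup = synchronized {
        Option(idle.poll()).getOrElse {
          created += 1
          ShuffleFileGroup(Array.tabulate(numReducers) { r =>
            new File(dir, s"merged_shuffle_${created}_$r")
          })
        }
      }

      // When the task finishes, the group goes back to the pool for the next task.
      def release(group: ShuffleFileGroup): Unit = idle.add(group)
    }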
  * Merge ShufflePerfTester patch into shuffle block consolidation | Aaron Davidson | 2013-10-21 | 19 files | -490/+551
* Merge pull request #95 from aarondav/perftest | Reynold Xin | 2013-10-21 | 1 file | -0/+0
    Minor: Put StoragePerfTester in org/apache/
  * Put StoragePerfTester in org/apache/ | Aaron Davidson | 2013-10-21 | 1 file | -0/+0
* Merge pull request #94 from aarondav/mesos-fix | Matei Zaharia | 2013-10-21 | 1 file | -2/+2
    Fix mesos urls. This was a bug I introduced in https://github.com/apache/incubator-spark/pull/71. Previously, we explicitly removed the mesos:// part; with #71, this no longer occurs.
  * Fix mesos urls | Aaron Davidson | 2013-10-21 | 1 file | -2/+2
      This was a bug I introduced in https://github.com/apache/incubator-spark/pull/71. Previously, we explicitly removed the mesos:// part; with PR 71, this no longer occurred.
* Merge pull request #88 from rxin/clean | Patrick Wendell | 2013-10-21 | 7 files | -101/+113
    Made the following traits/interfaces/classes non-public: SparkHadoopWriter, SparkHadoopMapRedUtil, SparkHadoopMapReduceUtil, SparkHadoopUtil, PythonAccumulatorParam, BlockManagerSlaveActor.
  * Made JobLogger public again and some minor cleanup. | Reynold Xin | 2013-10-20 | 1 file | -68/+54
  * Made the following traits/interfaces/classes non-public: | Reynold Xin | 2013-10-20 | 7 files | -39/+65
      SparkHadoopWriter, SparkHadoopMapRedUtil, SparkHadoopMapReduceUtil, SparkHadoopUtil, PythonAccumulatorParam, JobLogger, BlockManagerSlaveActor.
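For reference, hiding a class from the public API while keeping it usable inside the project is typically done in Scala with a package-qualified access modifier; a minimal sketch with an illustrative class name (not one from this commit):

    // Sketch only: package-private visibility in Scala.
    package org.apache.spark.example

    // Usable anywhere under org.apache.spark, but not part of the public API
    // that application code compiles against.
    private[spark] class InternalHelper {
      def doWork(): Unit = println("internal only")
    }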
* Merge pull request #41 from pwendell/shuffle-benchmark | Patrick Wendell | 2013-10-20 | 6 files | -4/+141
    Provide Instrumentation for Shuffle Write Performance.
    Shuffle write performance can have a major impact on the performance of jobs. This patch adds a few pieces of instrumentation related to shuffle writes. They are:
    1. A listing of the time spent performing blocking writes for each task. This is implemented by keeping track of the aggregate delay seen by many individual writes.
    2. An undocumented option `spark.shuffle.sync` which forces shuffle data to sync to disk. This is necessary for measuring shuffle performance in the absence of the OS buffer cache.
    3. An internal utility which micro-benchmarks write throughput for simulated shuffle outputs.
    I'm going to do some performance testing on this to see whether these small timing calls add overhead. From a feature perspective, however, I consider this complete. Any feedback is appreciated.
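A minimal sketch of instrumentation along the lines of items 1 and 2 above; the property name spark.shuffle.sync comes from the description, while the class and field names here are hypothetical:

    // Sketch only: aggregate blocking-write time per task, plus an optional fsync.
    import java.io.{File, FileOutputStream}

    class InstrumentedShuffleWriter(file: File, syncWrites: Boolean) {
      private val out = new FileOutputStream(file)
      private var writeTimeNanos = 0L   // total time this task spent blocked in writes

      def write(bytes: Array[Byte]): Unit = {
        val start = System.nanoTime()
        out.write(bytes)
        if (syncWrites) out.getFD.sync()  // force data to disk, defeating the OS buffer cache
        writeTimeNanos += System.nanoTime() - start
      }

      def close(): Long = { out.close(); writeTimeNanos }
    }

    object InstrumentedShuffleWriter {
      // The sync option would be read once, e.g. from a system property as described above.
      val syncEnabled: Boolean = System.getProperty("spark.shuffle.sync", "false").toBoolean
    }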
  * Making the timing block more narrow for the sync | Patrick Wendell | 2013-10-07 | 1 file | -2/+2
  * Minor cleanup | Patrick Wendell | 2013-10-07 | 3 files | -66/+45
  * Perf benchmark | Patrick Wendell | 2013-10-07 | 1 file | -0/+89
  * Trying new approach with writes | Patrick Wendell | 2013-10-07 | 2 files | -7/+7
  * Adding option to force sync to the filesystem | Patrick Wendell | 2013-10-07 | 1 file | -3/+17
  * Track and report write throughput for shuffle tasks. | Patrick Wendell | 2013-10-07 | 5 files | -2/+57
* Merge pull request #89 from rxin/executor | Reynold Xin | 2013-10-20 | 1 file | -20/+23
    Don't setup the uncaught exception handler in local mode. This avoids unit test failures for Spark streaming.
        java.util.concurrent.RejectedExecutionException: Task org.apache.spark.streaming.JobManager$JobHandler@38cf728d rejected from java.util.concurrent.ThreadPoolExecutor@3b69a41e[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 14]
            at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
            at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
            at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
            at org.apache.spark.streaming.JobManager.runJob(JobManager.scala:54)
            at org.apache.spark.streaming.Scheduler$$anonfun$generateJobs$2.apply(Scheduler.scala:108)
            at org.apache.spark.streaming.Scheduler$$anonfun$generateJobs$2.apply(Scheduler.scala:108)
            at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
            at org.apache.spark.streaming.Scheduler.generateJobs(Scheduler.scala:108)
            at org.apache.spark.streaming.Scheduler$$anonfun$1.apply$mcVJ$sp(Scheduler.scala:41)
            at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:66)
            at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:34)
  * Don't setup the uncaught exception handler in local mode. | Reynold Xin | 2013-10-20 | 1 file | -20/+23
      This avoids unit test failures for Spark streaming.
* Merge pull request #80 from rxin/build | Matei Zaharia | 2013-10-20 | 3 files | -44/+60
    Exclusion rules for Maven build files.
  * Exclusion rules for Maven build files. | Reynold Xin | 2013-10-19 | 3 files | -44/+60
* Merge pull request #75 from JoshRosen/block-manager-cleanup | Matei Zaharia | 2013-10-20 | 1 file | -292/+166
    Code de-duplication in BlockManager. The BlockManager has a few methods that duplicate most of their code. This pull request extracts the duplicated code into private doPut(), doGetLocal(), and doGetRemote() methods that unify the storing/reading of bytes or objects. I believe that I preserved the logic of the original code, but I'd appreciate some help in reviewing this.
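A minimal sketch of the de-duplication pattern being described (one private helper backing both the object and the byte variants); the types and names here are simplified and hypothetical, not the actual BlockManager code:

    // Sketch only: unify getLocal()/getLocalBytes() behind one private helper.
    import java.nio.ByteBuffer

    class BlockStoreSketch(store: Map[String, ByteBuffer]) {
      // All of the shared lookup logic lives in one private helper.
      private def doGetLocal(blockId: String, asBytes: Boolean): Option[Any] =
        store.get(blockId).map { buf =>
          if (asBytes) buf.duplicate()        // raw-bytes path
          else deserialize(buf.duplicate())   // deserialized-object path
        }

      def getLocal(blockId: String): Option[Any] =
        doGetLocal(blockId, asBytes = false)

      def getLocalBytes(blockId: String): Option[ByteBuffer] =
        doGetLocal(blockId, asBytes = true).map(_.asInstanceOf[ByteBuffer])

      // Placeholder standing in for real deserialization.
      private def deserialize(buf: ByteBuffer): Any = buf.remaining()
    }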
  * Minor cleanup based on @aarondav's code review. | Josh Rosen | 2013-10-20 | 1 file | -11/+5
  * De-duplicate code in dropOld[Non]BroadcastBlocks. | Josh Rosen | 2013-10-19 | 1 file | -20/+6
  * Code de-duplication in put() and putBytes(). | Josh Rosen | 2013-10-19 | 1 file | -140/+89
  * De-duplication in getRemote() and getRemoteBytes(). | Josh Rosen | 2013-10-19 | 1 file | -32/+18
  * De-duplication in getLocal() and getLocalBytes(). | Josh Rosen | 2013-10-19 | 1 file | -100/+59
* Merge pull request #84 from rxin/kill1 | Reynold Xin | 2013-10-20 | 1 file | -21/+38
    Added documentation for setJobGroup. Also some minor cleanup in SparkContext.
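For context, a short usage sketch of the method whose documentation was added; it assumes the two-argument setJobGroup(groupId, description) form plus cancelJobGroup, which may differ slightly from the API as it stood at the time of this commit:

    // Sketch only: tagging jobs with a group id so they can be cancelled together.
    import org.apache.spark.SparkContext

    object JobGroupExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "job-group-example")

        // Every job submitted after this call carries the group id and description.
        sc.setJobGroup("nightly-report", "Jobs belonging to the nightly report run")
        val count = sc.parallelize(1 to 1000).filter(_ % 7 == 0).count()
        println(s"count = $count")

        // Later, another thread could cancel everything tagged with that id.
        sc.cancelJobGroup("nightly-report")
        sc.stop()
      }
    }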
  * Updated setGroupId documentation and marked dagSchedulerSource and blockManagerSource as private in SparkContext. | Reynold Xin | 2013-10-20 | 1 file | -4/+4
  * Added documentation for setJobGroup. Also some minor cleanup in SparkContext. | Reynold Xin | 2013-10-19 | 1 file | -19/+36
* Merge pull request #85 from rxin/clean | Matei Zaharia | 2013-10-20 | 1 file | -0/+2
    Moved the top level spark package object from spark to org.apache.spark. This is a pretty annoying documentation bug ...
  * Moved the top level spark package object from spark to org.apache.spark | Reynold Xin | 2013-10-19 | 1 file | -0/+2
  * Remove executorId from Task.run() | Aaron Davidson | 2013-10-21 | 3 files | -3/+3
  * Documentation update | Aaron Davidson | 2013-10-20 | 1 file | -3/+1
  * Close shuffle writers during failure & remove executorId from TaskContext | Aaron Davidson | 2013-10-20 | 7 files | -15/+13
  * Cleanup old shuffle file metadata from memory | Aaron Davidson | 2013-10-20 | 2 files | -7/+13
  * Address Josh and Reynold's comments | Aaron Davidson | 2013-10-20 | 6 files | -18/+17
  * Fix compiler errors | Aaron Davidson | 2013-10-20 | 3 files | -5/+5
      Whoops. Last-second changes require testing too, it seems.
  * Basic shuffle file consolidation | Aaron Davidson | 2013-10-20 | 8 files | -14/+57
      The Spark shuffle phase can produce a large number of files, as one file is created per mapper per reducer. For large or repeated jobs, this often produces millions of shuffle files, which leads to extremely degraded performance from the OS file system. This patch seeks to reduce that burden by combining multiple shuffle files into one.
      This PR draws upon the work of Jason Dai in https://github.com/mesos/spark/pull/669. However, it simplifies the design in order to get the majority of the gain with less overall intellectual and code burden. The vast majority of code in this pull request is a refactor to allow the insertion of a clean layer of indirection between logical block ids and physical files. This, I feel, provides some design clarity in addition to enabling shuffle file consolidation.
      The main goal is to produce one shuffle file per reducer per active mapper thread. This allows us to isolate the mappers (simplifying the failure modes), while still allowing us to reduce the number of shuffle files tremendously for large tasks. In order to accomplish this, we simply create a new set of shuffle files for every parallel task, and return the files to a pool which will be given out to the next task that runs.
  * Refactor of DiskStore for shuffle file consolidation | Aaron Davidson | 2013-10-20 | 10 files | -269/+366
      The main goal of this refactor was to allow the interposition of a new layer which maps logical BlockIds to physical locations other than a file with the same name as the BlockId. In particular, BlockIds will need to be mappable to chunks of files, as multiple blocks will be stored in the same file. In order to accomplish this, the following changes have been made:
      - Creation of DiskBlockManager, which manages the association of logical BlockIds to physical disk locations (called FileSegments). By default, Blocks are simply mapped to physical files of the same name, as before.
      - The DiskStore now indirects all requests for a given BlockId through the DiskBlockManager in order to resolve the actual File location.
      - DiskBlockObjectWriter has been merged into BlockObjectWriter.
      - The Netty PathResolver has been changed to map BlockIds into FileSegments, as this codepath is the only one that uses Netty, and that is likely to remain the case.
      Overall, I think this refactor produces a clearer division between the logical Block paradigm and their physical on-disk location. There is now an explicit (and documented) mapping from one to the other.
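A minimal sketch of the BlockId-to-FileSegment indirection described above; the class names follow the description, but the fields and methods are simplified and partly hypothetical:

    // Sketch only: map logical block ids to (file, offset, length) segments.
    import java.io.File

    // A physical location: some region of some file on disk.
    case class FileSegment(file: File, offset: Long, length: Long)

    class DiskBlockManagerSketch(rootDir: File) {
      private var segments = Map.empty[String, FileSegment]

      // Consolidated blocks are registered explicitly as a chunk of a shared file.
      def mapBlockToSegment(blockId: String, segment: FileSegment): Unit =
        segments += (blockId -> segment)

      // By default a block still resolves to a file named after the block id,
      // which preserves the old one-file-per-block behaviour.
      def getBlockLocation(blockId: String): FileSegment =
        segments.getOrElse(blockId, {
          val f = new File(rootDir, blockId)
          FileSegment(f, 0L, f.length())
        })
    }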
* Merge pull request #83 from ewencp/pyspark-accumulator-add-method | Matei Zaharia | 2013-10-19 | 1 file | -1/+12
    Add an add() method to pyspark accumulators.
    Add a regular method for adding a term to accumulators in pyspark. Currently, if you have a non-global accumulator, adding to it is awkward. The += operator can't be used for non-global accumulators captured via closure because it involves an assignment. The only way to do it is using __iadd__ directly. Adding this method lets you write code like this:
        def main():
            sc = SparkContext()
            accum = sc.accumulator(0)
            rdd = sc.parallelize([1, 2, 3])
            def f(x):
                accum.add(x)
            rdd.foreach(f)
            print accum.value
    where using accum += x instead would have caused UnboundLocalError exceptions in workers. Currently it would have to be written as accum.__iadd__(x).