Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Undo JLine fix that turns out to only be needed when buildr is running on JRuby. This is quite ugly: JRuby has its own version of JLine which is older than Scala's, and JLine changed API in such a way that code written for the new version won't compile with the old one and vice versa. Sadly, this might be a reason to drop buildr, unless we can package a JRuby with it that uses the right version, or we ask people to only use the C Ruby version of buildr (which doesn't work on OS X right now!) | Matei Zaharia | 2010-11-13 | 1 | -2/+2 |
| | |||||
* | Modified project structure to work with buildr | Matei Zaharia | 2010-11-13 | 48 | -4/+20 |
| | |||||
* | Added a shuffle test with negative hash codes for some keys (this was a bug earlier) | Matei Zaharia | 2010-11-12 | 1 | -0/+11 |
| | |||||
* | Unit tests for shuffle operations. Fixes #33. | Matei Zaharia | 2010-11-12 | 1 | -0/+119 |
| | |||||
* | Added options for using an external HTTP server with LocalFileShuffle | Matei Zaharia | 2010-11-09 | 3 | -19/+41 |
| | |||||
* | Removed unnecessary collectAsMap | Matei Zaharia | 2010-11-08 | 1 | -4/+2 |
| | |||||
* | Made shuffle algorithm pluggable and added LocalFileShuffle. | Matei Zaharia | 2010-11-08 | 6 | -56/+237 |
| | |||||
* | Create output files one by one instead of at the same time in the map phase of DfsShuffle. | Matei Zaharia | 2010-11-06 | 1 | -12/+11 |
| | |||||
* | Properly set the number of output splits in DFS shuffle | Matei Zaharia | 2010-11-04 | 1 | -1/+2 |
| | |||||
* | Added groupBy function in RDD | Matei Zaharia | 2010-11-03 | 1 | -1/+9 |
| | |||||
* | Added reduceByKey, groupByKey and join operations based on combine, as well as versions of the shuffle operations that set the number of splits automatically. | Matei Zaharia | 2010-11-03 | 4 | -56/+115 |
| | |||||
* | Fixed a bug with negative hashcodes | Matei Zaharia | 2010-11-03 | 1 | -1/+4 |
| | |||||
* | Made DFS shuffle's "reduce tasks" fetch inputs in a random order so they don't all hit the same nodes at the same time. | Matei Zaharia | 2010-11-03 | 2 | -5/+21 |
| | |||||
* | Initial work towards a simple HDFS-based shuffle. | Matei Zaharia | 2010-11-03 | 3 | -1/+155 |
| | |||||
* | 'Running on Mesos' test is now only run when MESOS_HOME is set | Matei Zaharia | 2010-11-02 | 1 | -4/+2 |
| | |||||
* | Added initial attempt at a BoundedMemoryCache | Matei Zaharia | 2010-10-24 | 1 | -0/+69 |
| | |||||
* | Added SizeEstimator class for use by caches | Matei Zaharia | 2010-10-24 | 1 | -0/+160 |
| | |||||
* | Made caching pluggable and added soft reference and weak reference caches. | Matei Zaharia | 2010-10-23 | 7 | -12/+98 |
| | |||||
* | Renamed aggregateSplit() to splitRdd(), plus some style fixes | Matei Zaharia | 2010-10-23 | 1 | -7/+14 |
| | |||||
* | Fixed a bug with scheduling of tasks that have no locality preferences. These tasks were being subjected to delay scheduling but then counted as having been launched on a preferred node. The solution is to have a separate queue for them and treat them as preferred during scheduling. | Matei Zaharia | 2010-10-19 | 1 | -9/+26 |
| | |||||
* | Undid some changes that Mosharaf inadvertently committed to master. | Matei Zaharia | 2010-10-19 | 1 | -1/+1 |
| | |||||
* | Merge branch 'master' of git@github.com:mesos/spark. Conflicts: src/scala/spark/SparkContext.scala. Using the latest one from Matei. | Mosharaf Chowdhury | 2010-10-18 | 13 | -498/+894 |
|\ | |||||
| * | Fixed some whitespace | Matei Zaharia | 2010-10-16 | 3 | -14/+14 |
| | | |||||
| * | Added support for generic Hadoop InputFormats and refactored textFile to use this. Closes #12. | Matei Zaharia | 2010-10-16 | 2 | -28/+111 |
| | | |||||
| * | Renamed HdfsFile to HadoopFile | Matei Zaharia | 2010-10-16 | 2 | -8/+9 |
| | | |||||
| * | Simplified UnionRDD slightly and added a SparkContext.union method for efficiently union-ing a large number of RDDs | Matei Zaharia | 2010-10-16 | 2 | -28/+22 |
| | | |||||
| * | Removed setSparkHome method on SparkContext in favor of having an optional constructor parameter, so that the scheduler is guaranteed that a Spark home has been set when it first builds its executor arg. | Matei Zaharia | 2010-10-16 | 2 | -16/+7 |
| | | |||||
| * | Added the ability to specify a list of JAR files when creating a SparkContext and have the master node serve those to workers. | Matei Zaharia | 2010-10-16 | 6 | -116/+244 |
| | | |||||
| * | Keep track of tasks in each job so that they can be removed when the job exits | Matei Zaharia | 2010-10-16 | 1 | -6/+12 |
| | | |||||
| * | Further clarified some code | Matei Zaharia | 2010-10-16 | 2 | -10/+22 |
| | | |||||
| * | Fixed some log messages | Matei Zaharia | 2010-10-16 | 1 | -2/+2 |
| | | |||||
| * | Bug fixes and improvements for MesosScheduler and SimpleJob | Matei Zaharia | 2010-10-16 | 3 | -25/+46 |
| | | |||||
| * | Moved Spark home detection to SparkContext and added a setSparkHome method for setting it programmatically. | Matei Zaharia | 2010-10-16 | 2 | -51/+81 |
| | | |||||
| * | Bug fix in passing env vars to executors | Matei Zaharia | 2010-10-16 | 1 | -1/+1 |
| | | |||||
| * | Added code so that Spark jobs can be launched from outside the Spark directory by setting SPARK_HOME and locating the executor relative to that. Entries on SPARK_CLASSPATH and SPARK_LIBRARY_PATH are also passed along to worker nodes. | Matei Zaharia | 2010-10-15 | 1 | -2/+29 |
| | | |||||
| * | Moved ClassServer out of repl package and renamed it to HttpServer. | Matei Zaharia | 2010-10-15 | 2 | -12/+12 |
| | | |||||
| * | Abort jobs if a task fails more than a limited number of times | Matei Zaharia | 2010-10-15 | 3 | -23/+44 |
| | | |||||
| * | A couple of improvements to ReplSuite: use collect instead of toArray; disable the "running on Mesos" test when MESOS_HOME is not set | Matei Zaharia | 2010-10-15 | 1 | -26/+30 |
| | | |||||
| * | Made locality scheduling constant-time and added support for changing CPU and memory requested per task. | Matei Zaharia | 2010-10-15 | 1 | -24/+79 |
| | | |||||
| * | Moved Job and SimpleJob to new files | Matei Zaharia | 2010-10-07 | 3 | -183/+206 |
| | | |||||
| * | Merge branch 'master' into matei-scheduling | Matei Zaharia | 2010-10-07 | 4 | -11/+23 |
| |\ | |||||
| * \ | Merge branch 'master' into matei-scheduling | Matei Zaharia | 2010-10-07 | 4 | -10/+21 |
| |\ \ | |||||
| * \ \ | Merge branch 'master' into matei-scheduling | Matei Zaharia | 2010-10-05 | 3 | -3/+64 |
| |\ \ \ | |||||
| * \ \ \ | Merge branch 'master' into matei-scheduling | Matei Zaharia | 2010-10-03 | 2 | -0/+2 |
| |\ \ \ \ | |||||
| * | | | | | Renamed ParallelOperation to Job | Matei Zaharia | 2010-10-03 | 1 | -42/+42 |
| | | | | | | |||||
* | | | | | | Minor cleanup in Broadcast.scala. Changed BroadcastTest.scala to have multiple broadcasts. | Mosharaf Chowdhury | 2010-10-12 | 4 | -82/+88 |
| |_|_|_|/ |/| | | | | |||||
* | | | | | Added a getId method to split to force classes to specify a unique ID for each split. This replaces the previous method of calling split.toString, which would produce different results for the same split each time it is deserialized (because the default implementation returns the Java object's address). | Matei Zaharia | 2010-10-07 | 4 | -11/+23 |
| |_|_|/ |/| | | | |||||
* | | | | got rid of unnecessary line | Justin Ma | 2010-10-07 | 1 | -1/+0 |
| | | | | |||||
* | | | | Merge branch 'master' into jtma-accumulator | Justin Ma | 2010-10-07 | 13 | -124/+372 |
|\ \ \ \ | |||||
| * | | | | Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to ensure that the proper keys will be generated when they are cached. | Justin Ma | 2010-10-07 | 1 | -2/+11 |
| | |_|/ | |/| | | |||||