path: root/core
Commit message (Author, Age, Files, Lines)
* Removed meaningless types (Mark Hamstra, 2013-08-20, 1 file, -1/+1)
|
* Merge remote-tracking branch 'jey/hadoop-agnostic' (Matei Zaharia, 2013-08-20, 29 files, -2255/+178)
|\
| |   Conflicts:
| |       core/src/main/scala/spark/PairRDDFunctions.scala
| * Fix Maven build with Hadoop 0.23.9 (Jey Kottalam, 2013-08-18, 1 file, -0/+8)
| |
| * Maven build now also works with YARN (Jey Kottalam, 2013-08-16, 1 file, -70/+0)
| |
| * Don't mark hadoop-client as 'provided' (Jey Kottalam, 2013-08-16, 1 file, -1/+0)
| |
| * Maven build now works with CDH hadoop-2.0.0-mr1 (Jey Kottalam, 2013-08-16, 1 file, -52/+0)
| |
| * Initial changes to make Maven build agnostic of hadoop version (Jey Kottalam, 2013-08-16, 1 file, -58/+5)
| |
| * Rename HadoopWriter to SparkHadoopWriter since it's outside of our package (Jey Kottalam, 2013-08-15, 2 files, -6/+6)
| |
| * Fix newTaskAttemptID to work under YARN (Jey Kottalam, 2013-08-15, 1 file, -1/+19)
| |
| * re-enable YARN support (Jey Kottalam, 2013-08-15, 1 file, -1/+13)
| |
| * SparkEnv isn't available this early, and not needed anyway (Jey Kottalam, 2013-08-15, 2 files, -25/+0)
| |
| * make SparkHadoopUtil a member of SparkEnv (Jey Kottalam, 2013-08-15, 8 files, -26/+31)
| |
| * rename HadoopMapRedUtil => SparkHadoopMapRedUtil, HadoopMapReduceUtil => SparkHadoopMapReduceUtil (Jey Kottalam, 2013-08-15, 5 files, -6/+7)
| |
| * add comment (Jey Kottalam, 2013-08-15, 1 file, -4/+4)
| |
| * dynamically detect hadoop version (Jey Kottalam, 2013-08-15, 2 files, -8/+48)
| |
| * remove core/src/hadoop{1,2} dirs (Jey Kottalam, 2013-08-15, 6 files, -104/+0)
| |
| * move yarn to its own directory (Jey Kottalam, 2013-08-15, 10 files, -1864/+0)
| |
* | changeGeneration --> changeEpoch renaming (Mark Hamstra, 2013-08-20, 1 file, -2/+2)
| |
* | Renamed 'priority' to 'jobId' and assorted minor changes (Mark Hamstra, 2013-08-20, 5 files, -59/+60)
| |
* | Merge pull request #828 from mateiz/sched-improvements (Matei Zaharia, 2013-08-19, 41 files, -965/+1034)
|\ \
| | |   Scheduler fixes and improvements
| * | Added unit tests for ClusterTaskSetManager, and fix a bug found with resetting locality level after a non-local launch (Matei Zaharia, 2013-08-18, 11 files, -28/+396)
| | |
| * | Added some comments on threading in scheduler code (Matei Zaharia, 2013-08-18, 3 files, -6/+35)
| | |
| * | Address some review comments: (Matei Zaharia, 2013-08-18, 6 files, -21/+40)
| | |
| | |   - When a resourceOffers() call has multiple offers, force the TaskSets to
| | |     consider them in increasing order of locality levels so that they get a
| | |     chance to launch stuff locally across all offers
| | |   - Simplify ClusterScheduler.prioritizeContainers
| | |   - Add docs on the new configuration options
| * | Comment cleanup (via Kay) and some debug messages (Matei Zaharia, 2013-08-18, 4 files, -23/+16)
| | |
| * | More scheduling fixes: (Matei Zaharia, 2013-08-18, 11 files, -190/+117)
| | |
| | |   - Added periodic revival of offers in StandaloneSchedulerBackend
| | |   - Replaced task scheduling aggression with multi-level delay scheduling in
| | |     ClusterTaskSetManager
| | |   - Fixed ZippedRDD preferred locations because they can't currently be
| | |     process-local
| | |   - Fixed some uses of hostPort
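The multi-level delay scheduling mentioned in the commit above can be illustrated with a small Python sketch. This is not the ClusterTaskSetManager code: the level names, the `DelayScheduler` class, the single `wait_seconds` setting, and the policy of jumping to the level of the most recent launch are all assumptions made for illustration. The idea shown is only that a task set starts at the most local level and relaxes one level at a time after waiting without a launch.

```python
# Hypothetical locality levels, most local first (names are assumptions).
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


class DelayScheduler:
    """Tracks the least-local level a task set may currently launch at."""

    def __init__(self, wait_seconds, now=0.0):
        self.wait_seconds = wait_seconds  # wait per level before relaxing
        self.level_index = 0              # start at the most local level
        self.last_launch_time = now

    def allowed_level(self, now):
        # Relax one level for each full wait period without a launch.
        while (self.level_index < len(LEVELS) - 1
               and now - self.last_launch_time >= self.wait_seconds):
            self.last_launch_time += self.wait_seconds
            self.level_index += 1
        return LEVELS[self.level_index]

    def record_launch(self, level, now):
        # Assumed policy: continue from the level of the task just launched.
        self.level_index = LEVELS.index(level)
        self.last_launch_time = now


scheduler = DelayScheduler(wait_seconds=3.0)
print(scheduler.allowed_level(1.0))   # still within the first wait period
print(scheduler.allowed_level(10.0))  # several wait periods have elapsed
```

A per-level wait time would be a natural extension; a single value keeps the sketch short.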
| * | Initial work towards scheduler refactoring: (Matei Zaharia, 2013-08-18, 27 files, -751/+484)
| | |
| | |   - Replace use of hostPort vs host in Task.preferredLocations with a
| | |     TaskLocation class that contains either an executorId and a host or
| | |     just a host. This is part of a bigger effort to eliminate hostPort
| | |     based data structures and just use executorID, since the hostPort vs
| | |     host stuff is confusing (and not checkable with static typing, leading
| | |     to ugly debug code), and hostPorts are not provided by Mesos.
| | |   - Replaced most hostPort-based data structures and fields as above.
| | |   - Simplified ClusterTaskSetManager to deal with preferred locations in a
| | |     more concise way and generally be more concise.
| | |   - Updated the way ClusterTaskSetManager handles racks: instead of
| | |     enqueueing a task to a separate queue for all the hosts in the rack,
| | |     which would create lots of large queues, have one queue per rack name.
| | |   - Removed non-local fallback stuff in ClusterScheduler that tried to
| | |     launch less-local tasks on a node once the local ones were all
| | |     assigned. This change didn't work because many cluster schedulers send
| | |     offers for just one node at a time (even the standalone and YARN ones
| | |     do so as nodes join the cluster one by one). Thus, lots of non-local
| | |     tasks would be assigned even though a node with locality for them
| | |     would be able to receive tasks just a short time later.
| | |   - Renamed MapOutputTracker "generations" to "epochs".
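To make the TaskLocation and per-rack queue ideas from the commit above concrete, here is a rough Python sketch. It is not the Scala code from this commit: the field names, the `build_pending_queues` helper, and the `rack_of` lookup are hypothetical. It shows a location that is either a host alone or a host plus an executor id, and pending tasks indexed with one queue per rack name rather than one per host in the rack.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskLocation:
    """A preferred location: a host, optionally narrowed to one executor."""
    host: str
    executor_id: Optional[str] = None  # None means anywhere on this host


def build_pending_queues(preferred, rack_of):
    """Index pending task ids by executor, by host, and by rack name.

    preferred: dict mapping task id -> list of TaskLocation
    rack_of:   hypothetical callable mapping a host to a rack name or None
    """
    by_executor = defaultdict(list)
    by_host = defaultdict(list)
    by_rack = defaultdict(list)
    for task_id, locations in preferred.items():
        hosts_seen, racks_seen = set(), set()
        for loc in locations:
            if loc.executor_id is not None:
                by_executor[loc.executor_id].append(task_id)
            if loc.host not in hosts_seen:
                hosts_seen.add(loc.host)
                by_host[loc.host].append(task_id)
            rack = rack_of(loc.host)
            # One queue per rack name: enqueue each task once per rack.
            if rack is not None and rack not in racks_seen:
                racks_seen.add(rack)
                by_rack[rack].append(task_id)
    return by_executor, by_host, by_rack


preferred = {
    0: [TaskLocation("h1", "exec-1")],
    1: [TaskLocation("h1"), TaskLocation("h2")],
}
rack_of = {"h1": "rackA", "h2": "rackA"}.get
by_executor, by_host, by_rack = build_pending_queues(preferred, rack_of)
print(dict(by_rack))
```

Making the executor id an optional field gives the "either an executorId and a host or just a host" distinction an explicit type, which a bare hostPort string cannot express.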
* | | Merge pull request #849 from mateiz/web-fixes (Matei Zaharia, 2013-08-19, 2 files, -8/+9)
|\ \ \
| | | |   Small fixes to web UI
| * | | Allow some wiggle room in UISuite port test and in EC2 ports (Matei Zaharia, 2013-08-19, 1 file, -2/+3)
| | | |
| * | | Small fixes to web UI: (Matei Zaharia, 2013-08-19, 2 files, -6/+6)
| |/ /
| | |   - Use SPARK_PUBLIC_DNS environment variable if set (for EC2)
| | |   - Use a non-ephemeral port (3030 instead of 33000) by default
| | |   - Updated test to use non-ephemeral port too
* | | Merge pull request #847 from rxin/rdd (Matei Zaharia, 2013-08-19, 21 files, -189/+349)
|\ \ \
| |/ /
|/| |   Allow subclasses of Product2 in all key-value related classes
| * | Code review feedback. (added tests for cogroup and subtract; added more documentation on MutablePair) (Reynold Xin, 2013-08-19, 3 files, -11/+51)
| | |
| * | Added a test for sorting using MutablePair's. (Reynold Xin, 2013-08-19, 1 file, -2/+18)
| | |
| * | Made PairRDDFunctions taking only Tuple2, but made the rest of the shuffle code path working with general Product2. (Reynold Xin, 2013-08-19, 19 files, -91/+132)
| | |
| * | Added the missing RDD files and cleaned up SparkContext. (Reynold Xin, 2013-08-18, 4 files, -12/+126)
| | |
| * | Allow subclasses of Product2 in all key-value related classes (ShuffleDependency, PairRDDFunctions, etc). (Reynold Xin, 2013-08-18, 10 files, -107/+56)
| | |
* | | Merge pull request #840 from AndreSchumacher/zipegg (Matei Zaharia, 2013-08-18, 1 file, -1/+8)
|\ \ \
| |/ /
|/| |   Implementing SPARK-878 for PySpark: adding zip and egg files to context ...
| * | Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path (Andre Schumacher, 2013-08-16, 1 file, -1/+8)
| | |
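The worker-side step described in the commit above relies on a standard Python feature: a .zip or .egg archive placed on `sys.path` becomes importable. The sketch below demonstrates only that mechanism in a single process. The helper name `add_archive_to_path`, the archive name, and the module name are hypothetical, and none of this is the PySpark code from the commit.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def add_archive_to_path(archive_path):
    """Make modules inside a .zip or .egg archive importable in this process."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    # Clear cached finders so the new path entry is picked up.
    importlib.invalidate_caches()


# Build a tiny archive holding one module, standing in for a shipped dependency.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "deps.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("shipped_helper.py", "def double(x):\n    return 2 * x\n")

add_archive_to_path(archive)
import shipped_helper  # resolved from inside deps.zip

print(shipped_helper.double(21))  # prints 42
```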
* | | Moved shuffle serializer setting from a constructor parameter to a setSerializer method in various RDDs that involve shuffle operations. (Reynold Xin, 2013-08-17, 5 files, -32/+51)
| | |
* | | Removed the mapSideCombine option in partitionBy. (Reynold Xin, 2013-08-17, 2 files, -28/+6)
| | |
* | | Removed the mapSideCombine option in CoGroupedRDD. (Reynold Xin, 2013-08-17, 1 file, -33/+5)
| | |
* | | Removed the unused shuffleId in ShuffleDependency's constructor. (Reynold Xin, 2013-08-16, 1 file, -1/+0)
| | |
* | | Merge pull request #839 from jegonzal/zip_partitions (Matei Zaharia, 2013-08-16, 4 files, -17/+14)
|\ \ \
| | | |   Currying RDD.zipPartitions
| * | | Reversing the argument order in zipPartitions to enable stronger type inference. (Joseph E. Gonzalez, 2013-08-16, 4 files, -17/+14)
| | |/
| |/|
* | | Use the JSON formatter from Scala library and removed dependency on lift-json. (Reynold Xin, 2013-08-15, 6 files, -70/+64)
| | |
| | |   It made the JSON creation slightly more complicated, but reduces one
| | |   external dependency. The scala library also properly escapes "/" (which
| | |   lift-json doesn't).
* | | Revert "Merge pull request #834 from Daemoen/master" (Reynold Xin, 2013-08-15, 1 file, -2/+1)
| | |
| | |   This reverts commit 230ab2722ebd399afcf64c1a131f4929f602177d, reversing
| | |   changes made to 659553b21ddd7504889ce113a816c1db4a73f167.
* | | Merge pull request #834 from Daemoen/master (Reynold Xin, 2013-08-15, 1 file, -1/+2)
|\ \ \
| |_|/
|/| |   Updated json output to allow for display of worker state
| * | Updated json output to allow for display of worker state (Daemoen, 2013-08-15, 1 file, -1/+2)
| | |
| | |   Ops teams need to ensure that the cluster is functional and performant.
| | |   Having to scrape the html source for worker state won't work reliably,
| | |   and will be slow. By exposing the state in the json output, ops teams are
| | |   able to ensure a fully functional environment by querying for the json
| | |   output and parsing for dead nodes.
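The monitoring use case described in the commit above (query the JSON output, parse for dead nodes) might look roughly like the following Python sketch. The log does not show the JSON schema, so the field names `workers`, `id`, and `state`, the value `ALIVE`, and the sample payload are all assumptions made for illustration.

```python
import json


def dead_workers(status_json):
    """Return ids of workers whose reported state is not ALIVE.

    The "workers", "id" and "state" keys are hypothetical field names.
    """
    status = json.loads(status_json)
    return [w["id"] for w in status.get("workers", [])
            if w.get("state") != "ALIVE"]


# Made-up payload in the assumed shape, for demonstration only.
sample = json.dumps({
    "workers": [
        {"id": "worker-1", "state": "ALIVE"},
        {"id": "worker-2", "state": "DEAD"},
    ]
})
print(dead_workers(sample))
```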
* | | Merge pull request #836 from pwendell/rename (Patrick Wendell, 2013-08-15, 19 files, -64/+64)
|\ \ \
| |_|/
|/| |   Rename `memoryBytesToString` and `memoryMegabytesToString`
| * | Rename `memoryBytesToString` and `memoryMegabytesToString` (Patrick Wendell, 2013-08-15, 19 files, -64/+64)
| | |
| | |   These are used all over the place now and they are not specific to
| | |   memory at all.
| | |
| | |   memoryBytesToString --> bytesToString
| | |   memoryMegabytesToString --> megabytesToString
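For readers unfamiliar with the helpers renamed in the commit above, the following Python sketch shows the kind of size formatting the names suggest. The unit labels, the base of 1024, and the one-decimal rounding are assumptions and may differ from what Spark's Scala utilities produce.

```python
def bytes_to_string(size):
    """Format a byte count as a human-readable string (assumed format)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(size)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


def megabytes_to_string(megabytes):
    """Same formatting for a size given in megabytes."""
    return bytes_to_string(megabytes * 1024 * 1024)


print(bytes_to_string(1536))
print(megabytes_to_string(2048))
```

Nothing here is tied to memory, which is the point of the rename: the same helpers serve for disk, shuffle, or network sizes.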
* | | More minor UI changes including code review feedback. (Reynold Xin, 2013-08-15, 6 files, -16/+39)
| | |