Commit message | Author | Age | Files | Lines
* Merge branch 'master' into scala-2.10 (Raymond Liu, 2013-11-13, 259 files changed, -5146/+10776)
|\
| * Merge pull request #148 from squito/include_appId (Reynold Xin, 2013-11-07, 3 files changed, -2/+22)
| |\
| | |     Include appId in executor cmd line args
| | |
| | |     Add the appId back into the executor cmd line args. I also made a pretty lame regression test, just to make sure it doesn't get dropped in the future. Not sure it will run on the build server, though, because `ExecutorRunner.buildCommandSeq()` expects to be able to run the scripts in `bin`.
| | * fix formatting (Imran Rashid, 2013-11-07, 1 file changed, -3/+5)
| | |
| | * very basic regression test to make sure appId doesn't get dropped in future (Imran Rashid, 2013-11-07, 1 file changed, -0/+18)
| | |
| | * include the appId in the cmd line arguments to Executors (Imran Rashid, 2013-11-07, 2 files changed, -2/+2)
| | |
| * | Merge pull request #23 from jerryshao/multi-user (Reynold Xin, 2013-11-06, 5 files changed, -379/+417)
| |\ \
| | |/
| |/|     Add Spark multi-user support for standalone mode and Mesos
| | |
| | |     This PR adds multi-user support for Spark in both standalone mode and Mesos (coarse- and fine-grained) mode. A user can specify the name of the user who submits the app through the environment variable `SPARK_USER`, or use the default one. Executors will communicate with Hadoop using the specified user name. I also fixed a bug in JobLogger that occurred when a different user wrote the job log to a folder without the right file permissions. I separated the previous [PR750](https://github.com/mesos/spark/pull/750) into two PRs; in this PR I only solve the multi-user support problem. I will try to solve the security auth problem in a subsequent PR, because security auth is a complicated problem, especially for a long-running app like Shark Server (both the Kerberos TGT and the HDFS delegation token should be renewed or re-created throughout the app's run time).
| | * Add Spark multi-user support for standalone mode and Mesos (jerryshao, 2013-11-07, 5 files changed, -379/+417)
| |/
| * Merge pull request #144 from liancheng/runjob-clean (Reynold Xin, 2013-11-06, 1 file changed, -2/+1)
| |\
| | |     Removed unused return value in SparkContext.runJob
| | |
| | |     The return type of this `runJob` version is `Unit`:
| | |
| | |         def runJob[T, U: ClassManifest](
| | |             rdd: RDD[T],
| | |             func: (TaskContext, Iterator[T]) => U,
| | |             partitions: Seq[Int],
| | |             allowLocal: Boolean,
| | |             resultHandler: (Int, U) => Unit) {
| | |           ...
| | |         }
| | |
| | |     It's obviously unnecessary to "return" `result`.
| | * Removed unused return value in SparkContext.runJob (Lian, Cheng, 2013-11-06, 1 file changed, -2/+1)
| | |
| * Merge pull request #145 from aarondav/sls-fix (Reynold Xin, 2013-11-06, 1 file changed, -1/+1)
| |\
| | |     Attempt to fix SparkListenerSuite breakage
| | |
| | |     Could not reproduce locally, but this test could have been flaky if the build machine was too fast, due to a typo. (Index 0 is intentionally slowed down to ensure the total time is >= 1 ms.) This should be merged into branch-0.8 as well.
| | * Attempt to fix SparkListenerSuite breakage (Aaron Davidson, 2013-11-06, 1 file changed, -1/+1)
| |/
| |     Could not reproduce locally, but this test could have been flaky if the build machine was too fast.
| * Merge pull request #143 from rxin/scheduler-hang (Reynold Xin, 2013-11-05, 1 file changed, -3/+11)
| |\
| | |     Ignore a task status update if the executor doesn't exist anymore.
| | |
| | |     Otherwise, if the scheduler receives a task status update message after the executor has been removed, the scheduler would hang. It is pretty hard to add unit tests for this right now because it is hard to mock the cluster scheduler. We should do that once @kayousterhout finishes merging the local scheduler and the cluster scheduler.
| | * Ignore a task status update if the executor doesn't exist anymore. (Reynold Xin, 2013-11-05, 1 file changed, -3/+11)
| |/
| * Merge pull request #142 from liancheng/dagscheduler-pattern-matching (Reynold Xin, 2013-11-05, 1 file changed, -7/+6)
| |\
| | |     Using case class deep matching to simplify code in DAGScheduler.processEvent
| | |
| | |     Since all `XxxEvent`s pushed into `DAGScheduler.eventQueue` are case classes, deep pattern matching is more convenient for extracting the event object's components.
| | * Using compact case class pattern matching syntax to simplify code in DAGScheduler.processEvent (Lian, Cheng, 2013-11-05, 1 file changed, -7/+6)
| |/
| * Merge pull request #139 from aarondav/shuffle-next (Reynold Xin, 2013-11-04, 3 files changed, -36/+4)
| |\
| | |     Never store shuffle blocks in BlockManager
| | |
| | |     After the BlockId refactor (PR #114), it became very clear that ShuffleBlocks are of no use within BlockManager (they had a no-arg constructor!). This patch completely eliminates them, saving us around 100-150 bytes per shuffle block. The total, system-wide overhead per shuffle block is now a flat 8 bytes, excluding state saved by the MapOutputTracker. Note: this should *not* be merged directly into 0.8.0 -- see #138.
| | * Never store shuffle blocks in BlockManager (Aaron Davidson, 2013-11-04, 3 files changed, -36/+4)
| | |
| | |     After the BlockId refactor (PR #114), it became very clear that ShuffleBlocks are of no use within BlockManager (they had a no-arg constructor!). This patch completely eliminates them, saving us around 100-150 bytes per shuffle block. The total, system-wide overhead per shuffle block is now a flat 8 bytes, excluding state saved by the MapOutputTracker.
| * | Merge pull request #128 from shimingfei/joblogger-doc (Reynold Xin, 2013-11-04, 2 files changed, -26/+119)
| |/
| |     Add javadoc to JobLogger, and some small fixes, against SPARK-941
| |
| |     Add javadoc to JobLogger, output more info for RDD, and modify recordStageDepGraph to avoid outputting duplicate stage dependency information.
| |
| |     (cherry picked from commit 518cf22eb2436d019e4f7087a38080ad4a20df58)
| |     Signed-off-by: Reynold Xin <rxin@apache.org>
| * Merge pull request #130 from aarondav/shuffle (Reynold Xin, 2013-11-04, 11 files changed, -110/+333)
| |\
| | |     Memory-optimized shuffle file consolidation
| | |
| | |     Reduces the overhead of each shuffle block for consolidation from >300 bytes to 8 bytes (1 primitive Long). Verified via profiler testing with 1 million shuffle blocks: net overhead was ~8,400,000 bytes. Despite the memory-optimized implementation incurring extra CPU overhead, the runtime of the shuffle phase in this test was only around 2% slower, while the reduce phase was 40% faster, compared to not using any shuffle file consolidation.
| | |
| | |     This is accomplished by replacing the map from ShuffleBlockId to FileSegment (i.e., block id to where it's located), which had high overhead due to being a gigantic, timestamped, concurrent map, with a more space-efficient structure. Namely, the following are introduced (I have omitted the word "Shuffle" from some names for clarity):
| | |
| | |     **ShuffleFile** - there is one ShuffleFile per consolidated shuffle file on disk. We store an array of offsets into the physical shuffle file for each ShuffleMapTask that wrote into the file. This is sufficient to reconstruct FileSegments for mappers that are in the file.
| | |
| | |     **FileGroup** - contains a set of ShuffleFiles, one per reducer, that a MapTask can use to write its output. There is one FileGroup created per _concurrent_ MapTask. The FileGroup contains an array of the mapIds that have been written to all files in the group. The positions of elements in this array map directly onto the positions in each ShuffleFile's offsets array.
| | |
| | |     In order to locate the FileSegment associated with a BlockId, we have another structure which maps each reducer to the set of ShuffleFiles that were created for it. (There will be as many ShuffleFiles per reducer as there are FileGroups.) To look up a given ShuffleBlockId (shuffleId, reducerId, mapId), we thus search through all ShuffleFiles associated with that reducer.
| | |
| | |     As a time optimization, we ensure that FileGroups are only reused for MapTasks with monotonically increasing mapIds. This allows us to perform a binary search to locate a mapId inside a group, and also enables potential future optimizations (based on the usual monotonic access order).
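[Editor's note] The lookup scheme PR #130 describes (per-mapper offsets into a consolidated file, with binary search enabled by monotonically increasing mapIds) can be sketched roughly as follows. This is an illustrative toy, not Spark's actual ShuffleBlockManager: the names `ConsolidatedFile`, `recordWrite`, and the `FileSegment` fields are stand-ins for this sketch.

```scala
import java.util.Arrays

// Where a single map task's output lives inside a consolidated file.
case class FileSegment(file: String, offset: Long, length: Long)

class ConsolidatedFile(val path: String) {
  // Parallel arrays: mapIds(i) wrote the bytes starting at offsets(i).
  // mapIds stay sorted because tasks are appended in increasing order.
  private var mapIds = Array.empty[Int]
  private var offsets = Array.empty[Long]
  private var end = 0L // current end of the physical file

  def recordWrite(mapId: Int, numBytes: Long): Unit = {
    require(mapIds.isEmpty || mapId > mapIds.last,
      "mapIds must be monotonically increasing")
    mapIds = mapIds :+ mapId
    offsets = offsets :+ end
    end += numBytes
  }

  // Binary search is valid only because of the monotonicity invariant above.
  def getSegment(mapId: Int): Option[FileSegment] = {
    val i = Arrays.binarySearch(mapIds, mapId)
    if (i < 0) None
    else {
      val next = if (i + 1 < offsets.length) offsets(i + 1) else end
      Some(FileSegment(path, offsets(i), next - offsets(i)))
    }
  }
}
```

The per-block cost here is one Long offset (plus the mapId), which mirrors the "flat 8 bytes per shuffle block" figure quoted in the PR description.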
| | * Minor cleanup in ShuffleBlockManager (Aaron Davidson, 2013-11-04, 1 file changed, -4/+4)
| | |
| | * Refactor ShuffleBlockManager to reduce public interface (Aaron Davidson, 2013-11-04, 3 files changed, -178/+123)
| | |
| | |     - ShuffleBlocks has been removed and replaced by ShuffleWriterGroup.
| | |     - ShuffleWriterGroup no longer contains a reference to a ShuffleFileGroup.
| | |     - ShuffleFile has been removed and its contents are now within ShuffleFileGroup.
| | |     - ShuffleBlockManager.forShuffle has been replaced by a more stateful forMapTask.
| | * Add javadoc and remove unused code (Aaron Davidson, 2013-11-03, 2 files changed, -1/+1)
| | |
| | * Clean up test files properly (Aaron Davidson, 2013-11-03, 1 file changed, -5/+9)
| | |
| | |     For some reason, even calling java.nio.Files.createTempDirectory().getFile.deleteOnExit() does not delete the directory on exit. Guava's analogous function seems to work, however.
| | * use OpenHashMap, remove monotonicity requirement, fix failure bug (Aaron Davidson, 2013-11-03, 4 files changed, -41/+26)
| | |
| | * Address Reynold's comments (Aaron Davidson, 2013-11-03, 1 file changed, -12/+16)
| | |
| | * Fix test breakage (Aaron Davidson, 2013-11-03, 1 file changed, -1/+1)
| | |
| | * Add documentation and address other comments (Aaron Davidson, 2013-11-03, 2 files changed, -26/+35)
| | |
| | * Fix weird bug with specialized PrimitiveVector (Aaron Davidson, 2013-11-03, 1 file changed, -1/+5)
| | |
| | * Address minor comments (Aaron Davidson, 2013-11-03, 3 files changed, -8/+9)
| | |
| | * Memory-optimized shuffle file consolidation (Aaron Davidson, 2013-11-03, 8 files changed, -77/+348)
| |/
| |     The overhead of each shuffle block for consolidation has been reduced from >300 bytes to 8 bytes (1 primitive Long). Verified via profiler testing with 1 million shuffle blocks: net overhead was ~8,400,000 bytes. Despite the memory-optimized implementation incurring extra CPU overhead, the runtime of the shuffle phase in this test was only around 2% slower, while the reduce phase was 40% faster, compared to not using any shuffle file consolidation.
| * Merge pull request #70 from rxin/hash1 (Reynold Xin, 2013-11-03, 9 files changed, -7/+1108)
| |\
| | |     Fast, memory-efficient hash set and hash table implementations optimized for primitive data types.
| | |
| | |     This pull request adds two hash table implementations optimized for primitive data types. For primitive types, the new hash tables are much faster than the current Spark AppendOnlyMap (3X faster; note that the current AppendOnlyMap is already much better than the Java map) while using much less space (1/4 of the space).
| | |
| | |     Details: This PR first adds an open hash set implementation (OpenHashSet) optimized for primitive types (using Scala's specialization feature). This OpenHashSet is designed to serve as a building block for more advanced structures. It is currently used to build the following two hash tables, but can be used in the future to build multi-valued hash tables as well (GraphX has this use case). Note that there are some peculiarities in the code for working around some Scala compiler bugs.
| | |
| | |     Building on top of OpenHashSet, this PR adds two different hash table implementations:
| | |
| | |     1. OpenHashMap: for nullable keys, with optional specialization for primitive values
| | |     2. PrimitiveKeyOpenHashMap: for primitive keys that are not nullable, with optional specialization for primitive values
| | |
| | |     I tested the update speed of these two implementations using the changeValue function (which is what Aggregator and cogroup would use). Runtime relative to AppendOnlyMap for inserting 10 million items:
| | |
| | |     Int to Int: ~30%
| | |     java.lang.Integer to java.lang.Integer: ~100%
| | |     Int to java.lang.Integer: ~50%
| | |     java.lang.Integer to Int: ~85%
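[Editor's note] The core idea behind an open hash set for primitives, as described in PR #70 above, is storing keys directly in a flat primitive array and resolving collisions by probing, avoiding per-entry boxing and node objects. The following is a minimal illustrative sketch of that probing idea, not Spark's actual OpenHashSet (which uses a bitset, power-of-two growth, and `@specialized` type parameters); `TinyOpenHashSet` and its members are names invented for this sketch.

```scala
// A toy open-addressing set for Int keys with linear probing.
// For simplicity, 0 marks an empty slot, so this toy cannot store the key 0
// (a real implementation tracks that case separately) and it never grows.
class TinyOpenHashSet(capacity: Int = 64) {
  private val data = new Array[Int](capacity) // flat primitive storage, no boxing
  private var count = 0

  // Walk forward from the hash slot until we hit the key or an empty slot.
  private def probe(k: Int): Int = {
    var pos = (k.hashCode & Int.MaxValue) % capacity
    while (data(pos) != 0 && data(pos) != k) pos = (pos + 1) % capacity
    pos
  }

  def add(k: Int): Unit = {
    require(k != 0 && count < capacity - 1)
    val pos = probe(k)
    if (data(pos) == 0) { data(pos) = k; count += 1 }
  }

  def contains(k: Int): Boolean = k != 0 && data(probe(k)) == k

  def size: Int = count
}
```

The space win the PR cites (roughly 1/4 of AppendOnlyMap for primitives) comes from exactly this layout: an `Array[Int]` costs 4 bytes per slot, versus a boxed `java.lang.Integer` plus object header and reference per entry.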
| | * Code review feedback. (Reynold Xin, 2013-11-03, 7 files changed, -25/+100)
| | |
| | * Fixed a bug that uses twice the amount of memory for the primitive arrays due to a Scala compiler bug. Also addressed Matei's code review comment. (Reynold Xin, 2013-11-02, 9 files changed, -30/+38)
| | * Merge branch 'master' into hash1 (Reynold Xin, 2013-11-02, 147 files changed, -3776/+4261)
| | |\ | | |/ | |/|
| * | Merge pull request #133 from Mistobaan/link_fix (Reynold Xin, 2013-11-02, 1 file changed, -1/+1)
| |\ \
| | | |     update default github
| | * | update default github (Fabrizio (Misto) Milo, 2013-11-01, 1 file changed, -1/+1)
| | | |
| * | | Merge pull request #134 from rxin/readme (Reynold Xin, 2013-11-02, 1 file changed, -1/+1)
| |\ \ \
| | | | |     Fixed a typo in Hadoop version in README.
| | * | | Fixed a typo in Hadoop version in README. (Reynold Xin, 2013-11-02, 1 file changed, -1/+1)
| |/ / /
| * | Merge pull request #132 from Mistobaan/doc_fix (Reynold Xin, 2013-11-01, 1 file changed, -1/+1)
| |\ \ \
| | |/ /
| |/| |     fix persistent-hdfs
| | * | fix persistent-hdfs (Fabrizio (Misto) Milo, 2013-11-01, 1 file changed, -1/+1)
| |/ /
| * | Merge pull request #129 from velvia/2013-11/document-local-uris (Matei Zaharia, 2013-11-01, 2 files changed, -2/+15)
| |\ \
| | | |     Document & finish support for local: URIs
| | | |
| | | |     Documents all the supported URI schemes for addJar / addFile on the Cluster Overview page, and adds support for the local: URI scheme to addFile.
| | * | Add local: URI support to addFile as well (Evan Chan, 2013-11-01, 1 file changed, -1/+2)
| | | |
| | * | Document all the URIs for addJar/addFile (Evan Chan, 2013-11-01, 1 file changed, -1/+13)
| |/ /
| * | Merge pull request #117 from stephenh/avoid_concurrent_modification_exception (Matei Zaharia, 2013-10-30, 2 files changed, -3/+12)
| |\ \
| | | |     Handle ConcurrentModificationExceptions in SparkContext init.
| | | |
| | | |     System.getProperties.toMap will fail fast when concurrently modified, and it seems like some other thread started by SparkContext does a System.setProperty during its initialization. Handle this by simply looping on ConcurrentModificationException, which seems the safest approach, since the non-fail-fast methods (Hashtable.entrySet) have undefined behavior under concurrent modification.
| | * | Avoid match errors when filtering for spark.hadoop settings. (Stephen Haberman, 2013-10-30, 1 file changed, -2/+4)
| | | |
| | * | Use Properties.clone() instead. (Stephen Haberman, 2013-10-29, 1 file changed, -5/+2)
| | | |
| | * | Handle ConcurrentModificationExceptions in SparkContext init. (Stephen Haberman, 2013-10-27, 2 files changed, -3/+13)
| | | |
| | | |     System.getProperties.toMap will fail fast when concurrently modified, and it seems like some other thread started by SparkContext does a System.setProperty during its initialization. Handle this by simply looping on ConcurrentModificationException, which seems the safest approach, since the non-fail-fast methods (Hashtable.entrySet) have undefined behavior under concurrent modification.
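[Editor's note] The "loop on ConcurrentModificationException" approach described in PR #117 can be sketched as follows. This is an illustrative sketch, not the actual patch; `snapshotSystemProperties` is a name invented here, and it uses the modern `scala.jdk.CollectionConverters` import (the 2013 code used older converters).

```scala
import java.util.ConcurrentModificationException
import scala.jdk.CollectionConverters._

// Retry until a full pass over the system properties completes without
// another thread's System.setProperty tripping the fail-fast iterator.
def snapshotSystemProperties(): Map[String, String] = {
  while (true) {
    try {
      // Iterating the underlying Hashtable can throw
      // ConcurrentModificationException mid-iteration.
      return System.getProperties.asScala.toMap
    } catch {
      case _: ConcurrentModificationException => // contended; loop and retry
    }
  }
  throw new IllegalStateException("unreachable")
}
```

The retry is safe because each attempt starts iteration from scratch, whereas the non-fail-fast views would silently yield an inconsistent mix of old and new entries. (The later commits in this PR switch to `Properties.clone()`, which sidesteps the race by copying under the Hashtable's own synchronization.)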
| * | | Merge pull request #126 from kayousterhout/local_fix (Matei Zaharia, 2013-10-30, 1 file changed, -1/+1)
| |\ \ \
| | | | |     Fixed incorrect log message in local scheduler
| | | | |
| | | | |     This change is especially relevant at the moment, because some users are seeing this failure, and the log message is misleading/incorrect (because for the tests, the max failures is set to 0, not 4).
| | * | | Fixed incorrect log message in local scheduler (Kay Ousterhout, 2013-10-30, 1 file changed, -1/+1)
| | | | |
| * | | | Merge pull request #124 from tgravescs/sparkHadoopUtilFix (Matei Zaharia, 2013-10-30, 8 files changed, -38/+43)
| |\ \ \ \
| | | | | |     Pull SparkHadoopUtil out of SparkEnv (jira SPARK-886)
| | | | | |
| | | | | |     Having the logic to initialize the correct SparkHadoopUtil in SparkEnv prevents it from being used until after the SparkContext is initialized. This causes issues like https://spark-project.atlassian.net/browse/SPARK-886. It also makes it hard to use in singleton objects. For instance, I want to use it in the security code.