Commit message (Author, Age, Files, Lines)
* Added license header and removed @author tag  (Luca Rosellini, 2014-01-07, 2 files, -4/+34)
|
* Added ‘-i’ command line option to spark REPL.  (Luca Rosellini, 2014-01-03, 3 files, -3/+43)
|
      We had to create a new implementation of both scala.tools.nsc.CompilerCommand and
      scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring
      in other options (-howtorun, -save and -execute) which don’t make sense in Spark. Any new
      Spark-specific command line option can now be added to the
      org.apache.spark.repl.SparkRunnerSettings class.

      Since the behavior of loading a script from the command line should be the same as loading
      it using the “:load” command inside the shell, the script should be loaded when the
      SparkContext is available; that’s why we had to move the call to ‘loadfiles(settings)’
      _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.
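      For illustration, a minimal Scala sketch of the kind of Settings subclass this commit
      describes: a class extending scala.tools.nsc.Settings that adds only a Spark-specific
      "-i <file>" option. The class name mirrors the one mentioned above, but the body is a
      hypothetical sketch under that assumption, not the actual Spark source.

          import scala.tools.nsc.Settings

          // Sketch: a Settings subclass that adds "-i" without pulling in
          // GenericRunnerSettings' -howtorun/-save/-execute options.
          class SparkRunnerSettings(error: String => Unit) extends Settings(error) {
            // Files passed with -i are collected here and loaded once the SparkContext
            // is up, mirroring the behaviour of the interactive ":load" command.
            val loadfiles = MultiStringSetting(
              "-i",
              "file",
              "load a file (assumes the code is given interactively)")
          }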
* Merge pull request #1 from apache/master  (Luca Rosellini, 2014-01-03, 52 files, -1542/+785)
|\
      Merge latest Spark changes
| * Merge pull request #285 from colorant/yarn-refactor  (Patrick Wendell, 2014-01-02, 36 files, -1226/+189)
| |\
      Yarn refactor
| | * fix docs for yarn  (Raymond Liu, 2014-01-03, 2 files, -5/+2)
| | |
| | * minor fix for loginfo  (Raymond Liu, 2014-01-03, 1 file, -1/+1)
| | |
| | * move duplicate pom config into parent pom  (Raymond Liu, 2014-01-03, 3 files, -179/+84)
| | |
| | * Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2  (Raymond Liu, 2014-01-03, 18 files, -30/+30)
| | |
| | * Add yarn/common/src/test dir in building script  (Raymond Liu, 2014-01-03, 1 file, -0/+7)
| | |
| | * Fix yarn/README.md  (Raymond Liu, 2014-01-03, 1 file, -6/+4)
| | |
| | * Clean up unused files for yarn  (Raymond Liu, 2014-01-03, 4 files, -311/+0)
| | |
| | * Fix pom for build yarn/2.x with yarn/common into one jar  (Raymond Liu, 2014-01-03, 4 files, -36/+202)
| | |
| | * Use unmanaged source dir to include common yarn code  (Raymond Liu, 2014-01-03, 1 file, -11/+15)
| | |
| | * merge yarn/scheduler yarn/common code into one directory  (Raymond Liu, 2014-01-03, 3 files, -0/+0)
| | |
| | * Need to send dummy hello message to actually establish akka connection.  (Raymond Liu, 2014-01-03, 2 files, -0/+4)
| | |
| | * A few clean-ups for yarn 2.0 code  (Raymond Liu, 2014-01-03, 2 files, -8/+7)
| | |
| | * Update maven build documentation  (Raymond Liu, 2014-01-03, 2 files, -8/+4)
| | |
| | * Fix yarn/README.md and update docs/running-on-yarn.md  (Raymond Liu, 2014-01-03, 2 files, -3/+1)
| | |
| | * Add README for yarn modules  (Raymond Liu, 2014-01-03, 1 file, -0/+16)
| | |
| | * some code clean up for Yarn 2.2  (Raymond Liu, 2014-01-03, 2 files, -3/+3)
| | |
| | * Fix pom file for scala binary version  (Raymond Liu, 2014-01-03, 6 files, -8/+8)
| | |
| | * Fix yarn/assemble pom file  (Raymond Liu, 2014-01-03, 2 files, -0/+75)
| | |
| | * Change profile name new-yarn to hadoop2.2-yarn  (Raymond Liu, 2014-01-03, 4 files, -4/+4)
| | |
| | * Fix pom for yarn code reorganize commit  (Raymond Liu, 2014-01-03, 8 files, -535/+264)
| | |
| | * Reorganize yarn related code into sub projects to remove duplicate files.  (Raymond Liu, 2014-01-03, 30 files, -957/+337)
| |/
| * Merge pull request #323 from tgravescs/sparkconf_yarn_fix  (Patrick Wendell, 2014-01-02, 10 files, -113/+101)
| |\
      fix spark on yarn after the sparkConf changes

      This fixes it so that spark on yarn now compiles and works after the sparkConf changes.
      There are also other issues I discovered along the way that are broken:
      - mvn builds for yarn don't assemble correctly
      - unset SPARK_EXAMPLES_JAR isn't handled properly anymore
      - I'm pretty sure spark.conf doesn't actually work, as it's not distributed with yarn
      Those things can be fixed in a separate PR unless others disagree.
| | * fix yarn-client  (Thomas Graves, 2014-01-02, 2 files, -8/+10)
| | |
| | * Fix yarn build after sparkConf changes  (Thomas Graves, 2014-01-02, 10 files, -109/+95)
| |/
|/|
| * Merge pull request #320 from kayousterhout/erroneous_failed_msg  (Reynold Xin, 2014-01-02, 2 files, -12/+15)
| |\
      Remove erroneous FAILED state for killed tasks.

      Currently, when tasks are killed, the Executor first sends a status update for the task
      with a "KILLED" state, and then sends a second status update with a "FAILED" state saying
      that the task failed due to an exception. The second FAILED state is misleading/unnecessary,
      and occurs due to a NonLocalReturnControl exception that gets thrown due to the way we kill
      tasks. This commit eliminates that problem.

      I'm not at all sure that this is the best way to fix this problem, so alternate suggestions
      welcome. @rxin guessing you're the right person to look at this.
| | * Remove erroneous FAILED state for killed tasks.  (Kay Ousterhout, 2014-01-02, 2 files, -12/+15)
| |/
|/|
      Currently, when tasks are killed, the Executor first sends a status update for the task
      with a "KILLED" state, and then sends a second status update with a "FAILED" state saying
      that the task failed due to an exception. The second FAILED state is misleading/unnecessary,
      and occurs due to a NonLocalReturnControl exception that gets thrown due to the way we kill
      tasks. This commit eliminates that problem.
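      A minimal, hypothetical Scala sketch of the behaviour described above: the kill path
      surfaces as a NonLocalReturnControl, and the task runner reports a single KILLED state
      instead of a follow-up FAILED. The helper names (runTask, report, isKilled) are
      illustrative stand-ins, not Spark's actual Executor code.

          import scala.runtime.NonLocalReturnControl

          object KilledTaskStatusSketch {
            // Report exactly one terminal state per task; "report" and "isKilled" stand in
            // for the executor's status-update callback and kill flag.
            def runTask(taskId: Long,
                        body: () => Unit,
                        isKilled: () => Boolean,
                        report: (Long, String) => Unit): Unit = {
              try {
                body()
                report(taskId, "FINISHED")
              } catch {
                case _: NonLocalReturnControl[_] if isKilled() =>
                  // The kill path surfaces as a NonLocalReturnControl; report KILLED once
                  // instead of letting it become a second, misleading FAILED update.
                  report(taskId, "KILLED")
                case _: Throwable =>
                  report(taskId, "FAILED")
              }
            }
          }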
| * Merge pull request #297 from tdas/window-improvement  (Patrick Wendell, 2014-01-02, 9 files, -172/+388)
| |\
      Improvements to DStream window ops and refactoring of Spark's CheckpointSuite

      - Added a new RDD - PartitionerAwareUnionRDD. Using this RDD, one can take multiple RDDs
        partitioned by the same partitioner and unify them into a single RDD while preserving
        the partitioner. So m RDDs with p partitions each will be unified to a single RDD with
        p partitions and the same partitioner. The preferred location for each partition of the
        unified RDD will be the most common preferred location of the corresponding partitions
        of the parent RDDs. For example, the location of partition 0 of the unified RDD will be
        where most of partition 0 of the parent RDDs are located.
      - Improved the performance of DStream's reduceByKeyAndWindow and groupByKeyAndWindow. Both
        these operations work by doing per-batch reduceByKey/groupByKey and then using
        PartitionerAwareUnionRDD to union the RDDs across the window. This eliminates a shuffle
        related to the window operation, which can reduce batch processing time by 30-40% for
        simple workloads.
      - Fixed bugs and simplified Spark's CheckpointSuite. Some of the tests were incorrect and
        unreliable. Added missing tests for ZippedRDD. I can go into greater detail if necessary.
      - Added mapSideCombine option to combineByKeyAndWindow.
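      A small, self-contained Scala sketch of the two rules the commit message attributes to the
      partitioner-aware union: the same-partitioner check and the most-common-preferred-location
      rule. It is illustrative only and does not use Spark's PartitionerAwareUnionRDD API.

          object PartitionerAwareUnionSketch {
            // Rule 1: the union preserves the partitioner only if every parent RDD agrees on it.
            def unifiedPartitioner[P](parentPartitioners: Seq[Option[P]]): Option[P] =
              parentPartitioners.distinct match {
                case Seq(only @ Some(_)) => only
                case _                   => None
              }

            // Rule 2: for partition i of the unified RDD, prefer the location that occurs most
            // often among the parents' preferred locations for their partition i.
            def preferredLocation(parentLocationsForPartition: Seq[Seq[String]]): Option[String] = {
              val all = parentLocationsForPartition.flatten
              if (all.isEmpty) None
              else Some(all.groupBy(identity).maxBy(_._2.size)._1)
            }
          }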
| | * Added Apache boilerplate and class docs to PartitionerAwareUnionRDD.  (Tathagata Das, 2013-12-26, 1 file, -3/+33)
| | |
| | * Removed unnecessary options from WindowedDStream.  (Tathagata Das, 2013-12-26, 1 file, -5/+3)
| | |
| | * Merge branch 'apache-master' into window-improvement  (Tathagata Das, 2013-12-26, 50 files, -1548/+1841)
| | |\
| | * \ Merge branch 'master' into window-improvement  (Tathagata Das, 2013-12-26, 37 files, -123/+465)
| | |\ \
| | * | | Updated groupByKeyAndWindow to be computed incrementally, and added mapSideCombine to combineByKeyAndWindow.  (Tathagata Das, 2013-12-26, 5 files, -12/+34)
| | | | |
| | * | | Fixed bug in PartitionAwareUnionRDD  (Tathagata Das, 2013-12-26, 1 file, -6/+9)
| | | | |
| | * | | Merge branch 'scheduler-update' into window-improvement  (Tathagata Das, 2013-12-23, 4 files, -5/+32)
| | |\ \ \
| | * | | | Added tests for PartitionerAwareUnionRDD in the CheckpointSuite. Refactored CheckpointSuite to make the tests simpler and more reliable. Added missing test for ZippedRDD.  (Tathagata Das, 2013-12-20, 3 files, -170/+231)
| | | | | |
| | * | | | Merge branch 'scheduler-update' into window-improvement  (Tathagata Das, 2013-12-19, 306 files, -4277/+10714)
| | |\ \ \ \
      Conflicts:
          streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala
| | * | | | | Added flag in window operation to use partition aware union.  (Tathagata Das, 2013-11-21, 1 file, -1/+3)
| | | | | | |
| | * | | | | Added partitioner aware union, modified DStream.window.  (Tathagata Das, 2013-11-21, 3 files, -39/+94)
| | | | | | |
| | * | | | | Added partition aware union to improve reduceByKeyAndWindow  (Tathagata Das, 2013-11-20, 1 file, -2/+49)
| | | | | | |
| * | | | | | Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/incubator-spark  (Matei Zaharia, 2014-01-02, 2 files, -6/+1)
| |\ \ \ \ \ \
| | * | | | | | Merge pull request #319 from kayousterhout/remove_error_method  (Reynold Xin, 2014-01-02, 2 files, -6/+1)
| |/| | | | | |
|/| | | | | | |
      Removed redundant TaskSetManager.error() function.

      This function was leftover from a while ago, and now just passes all calls through to the
      abort() function, so this commit deletes it.
| | * | | | | | Removed redundant TaskSetManager.error() function.  (Kay Ousterhout, 2014-01-02, 2 files, -6/+1)
| |/ / / / / /
|/| | | | | |
      This function was leftover from a while ago, and now just passes all calls through to the
      abort() function, so this commit deletes it.
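      A tiny, hypothetical sketch of the redundancy being removed: error() only forwarded to
      abort(), so callers can call abort() directly and error() can be dropped. The class and
      method bodies below are illustrative, not the real TaskSetManager.

          class TaskSetManagerSketch {
            def abort(message: String): Unit = {
              // ... mark the task set as failed and notify the scheduler ...
              println(s"aborting task set: $message")
            }

            // The redundant passthrough this commit deletes:
            def error(message: String): Unit = abort(message)
          }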
| * | | | | | Merge pull request #311 from tmyklebu/master  (Matei Zaharia, 2014-01-02, 4 files, -15/+93)
|/| | | | | |
      SPARK-991: Report information gleaned from a Python stacktrace in the UI

      Scala:
      - Added setCallSite/clearCallSite to SparkContext and JavaSparkContext. These functions
        mutate a LocalProperty called "externalCallSite."
      - Add a wrapper, getCallSite, that checks for an externalCallSite and, if none is found,
        calls the usual Utils.formatSparkCallSite.
      - Change everything that calls Utils.formatSparkCallSite to call getCallSite instead.
        Except getCallSite.
      - Add setCallSite/clearCallSite wrappers to JavaSparkContext.

      Python:
      - Add a gruesome hack to rdd.py that inspects the traceback and guesses what you want to
        see in the UI.
      - Add a RAII wrapper around said gruesome hack that calls setCallSite/clearCallSite as
        appropriate.
      - Wire said RAII wrapper up around three calls into the Scala code.

      I'm not sure that I hit all the spots with the RAII wrapper. I'm also not sure that my
      gruesome hack does exactly what we want. One could also approach this change by refactoring
      runJob/submitJob/runApproximateJob to take a call site, then threading that parameter
      through everything that needs to know it.

      One might object to the pointless-looking wrappers in JavaSparkContext. Unfortunately, I
      can't directly access the SparkContext from Python---or, if I can, I don't know how---so I
      need to wrap everything that matters in JavaSparkContext.

      Conflicts:
          core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
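      A hedged Scala sketch of the pattern the commit message describes: an externally supplied
      call site stored per thread, a getCallSite helper that prefers it over the formatted JVM
      stack trace, and an RAII-style wrapper that sets and clears it around a job submission. The
      names and the storage (a plain ThreadLocal here rather than a Spark local property) are
      illustrative assumptions, not Spark's actual API.

          object CallSiteSketch {
            private val externalCallSite = new ThreadLocal[Option[String]] {
              override def initialValue(): Option[String] = None
            }

            def setCallSite(site: String): Unit = externalCallSite.set(Some(site))
            def clearCallSite(): Unit = externalCallSite.set(None)

            // Prefer an externally provided call site (e.g. a Python file/line pushed in by the
            // wrapper the commit mentions); otherwise fall back to formatting the JVM stack.
            def getCallSite(formatSparkCallSite: => String): String =
              externalCallSite.get().getOrElse(formatSparkCallSite)

            // RAII-style helper: set the call site around a job submission, always clear it after.
            def withCallSite[T](site: String)(body: => T): T = {
              setCallSite(site)
              try body finally clearCallSite()
            }
          }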
| * | | | | | Make Python function/line appear in the UI.  (Tor Myklebust, 2013-12-28, 1 file, -11/+55)
| | | | | | |
| * | | | | | Factor call site reporting out to SparkContext.  (Tor Myklebust, 2013-12-28, 3 files, -4/+38)
| | | | | | |
* | | | | | | Merge pull request #309 from mateiz/conf2  (Patrick Wendell, 2014-01-01, 140 files, -941/+1731)
|\ \ \ \ \ \ \
      SPARK-544. Migrate configuration to a SparkConf class

      This is still a work in progress based on Prashant and Evan's code. So far I've done the
      following:
      - Got rid of global SparkContext.globalConf
      - Passed SparkConf to serializers and compression codecs
      - Made SparkConf public instead of private[spark]
      - Improved API of SparkContext and SparkConf
      - Switched executor environment vars to be passed through SparkConf
      - Fixed some places that were still using system properties
      - Fixed some tests, though others are still failing

      This still fails several tests in core, repl and streaming, likely due to properties not
      being set or cleared correctly (some of the tests run fine in isolation). But the API at
      least is hopefully ready for review. Unfortunately there was a lot of global stuff before
      due to a "SparkContext.globalConf" method that let you set a "default" configuration of
      sorts, which meant I had to make some pretty big changes.
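      A short example of the configuration style this migration moves toward: settings carried
      explicitly in a SparkConf instance passed to SparkContext, instead of being read from
      global system properties. The setter names follow the SparkConf API as it eventually
      shipped; treat the snippet as a sketch rather than part of this commit.

          import org.apache.spark.{SparkConf, SparkContext}

          object SparkConfExample {
            def main(args: Array[String]): Unit = {
              // Configuration is built up explicitly rather than via System.setProperty(...).
              val conf = new SparkConf()
                .setMaster("local[2]")
                .setAppName("conf-migration-example")
                .set("spark.executor.memory", "1g")

              val sc = new SparkContext(conf)
              try {
                println(sc.parallelize(1 to 10).reduce(_ + _))
              } finally {
                sc.stop()
              }
            }
          }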