path: root/pom.xml
Commit message    Author    Age    Files    Lines
* Merge pull request #293 from pwendell/standalone-driverPatrick Wendell2014-01-091-0/+17
|\
| |   SPARK-998: Support Launching Driver Inside of Standalone Mode
| |
| |   [NOTE: I need to bring the tests up to date with new changes, so for now they will fail]
| |
| |   This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs, which is useful for long-running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, failure of a master node (provided a multi-master setup), and even failures of the application itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI.
| |
| |   There are a few small TODOs here, but the code is generally feature-complete. They are:
| |   - Bring tests up to date and add test coverage
| |   - Restarting on failure should be optional and maybe off by default.
| |   - See if we can re-use akka connections to facilitate clients behind a firewall
| |
| |   A sensible place to start for review would be to look at the `DriverClient` class, which gives users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manager, exposing it in the UI, and dealing correctly with various types of failures.
| |
| |   Instructions to test locally:
| |   - `sbt/sbt assembly/assembly examples/assembly`
| |   - start a local version of the standalone cluster manager
| |
| |   ```
| |   ./spark-class org.apache.spark.deploy.client.DriverClient \
| |     -j -Dspark.test.property=something \
| |     -e SPARK_TEST_KEY=SOMEVALUE \
| |     launch spark://10.99.1.14:7077 \
| |     ../path-to-examples-assembly-jar \
| |     org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
| |   ```
| |
| |   - Go in the UI and make sure it started correctly, look at the output etc
| |   - Kill workers, the driver program, masters, etc.
| * Adding mockito to maven buildPatrick Wendell2014-01-081-0/+6
| |
| * Merge remote-tracking branch 'apache-github/master' into standalone-driverPatrick Wendell2014-01-081-24/+6
| |\
| | |   Conflicts:
| | |       core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
| | |       pom.xml
| * | Adding unit tests and some refactoring to promote testability.Patrick Wendell2014-01-071-0/+12
| | |
* | | Add CDH Repository to Maven BuildPatrick Wendell2014-01-081-0/+5
| |/
|/|
* | Merge pull request #313 from tdas/project-refactorPatrick Wendell2014-01-071-23/+6
|\ \
| |/
|/|
| |   Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc.
| |
| |   At a high level, the changes are as follows.
| |
| |   1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules.
| |
| |   2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`.
| |
| |   3. Each external project has its own Scala and Java unit tests. Note the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see comment inside the pom.xml for more information).
| |
| |   4. Jars of the external projects have been added to the examples project but not to the assembly project.
| |
| |   5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
| * Merge remote-tracking branch 'apache/master' into project-refactorTathagata Das2014-01-061-49/+17
| |\
| | |   Conflicts:
| | |       examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
| | |       streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
| | |       streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
| | |       streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
| | |       streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
| | |       streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
| * | Added pom.xml for external projects and removed unnecessary dependencies and ↵Tathagata Das2013-12-311-23/+6
| | |   repositories from other poms and sbt.
* | | Merge pull request #338 from ScrapCodes/ning-upgradePatrick Wendell2014-01-061-1/+1
|\ \ \
| | | |   SPARK-1005 Ning upgrade
| * | | SPARK-1005 Ning upgradePrashant Sharma2014-01-061-1/+1
| | |/
| |/|
* / | Change protobuf version for yarn alpha back to 2.4.1Thomas Graves2014-01-061-1/+0
|/ /
* | Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2Raymond Liu2014-01-031-2/+2
| |
* | Change profile name new-yarn to hadoop2.2-yarnRaymond Liu2014-01-031-1/+1
| |
* | Fix pom for yarn code reorganize commitRaymond Liu2014-01-031-46/+9
| |
* | restore core/pom.xml file modificationliguoqiang2014-01-011-5/+5
| |
* | Merge pull request #73 from falaki/ApproximateDistinctCountReynold Xin2013-12-311-0/+5
|\ \
| | |   Approximate distinct count
| | |
| | |   Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count distinct number of elements and distinct number of values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
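To make the accuracy/memory trade-off mentioned in this commit concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration only, not stream-lib's implementation or Spark's API: the class name `HyperLogLog`, the parameter `p`, and the use of SHA-1 as the hash function are all assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical, not stream-lib's code).

    p is the number of index bits: memory is 2**p registers and the
    typical relative error is about 1.04 / sqrt(2**p).
    """

    def __init__(self, p=12):
        if p < 7:
            # The alpha constant used in estimate() assumes m >= 128.
            raise ValueError("p must be at least 7 in this sketch")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the item's string form.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)               # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


hll = HyperLogLog(p=12)
for i in range(10000):
    hll.add(i)
    hll.add(i)  # duplicates never change the registers
print(round(hll.estimate()))
```

Raising `p` by one doubles the number of registers and shrinks the expected error by roughly a factor of the square root of two, which is the kind of knob the commit message describes.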
| * | Using origin versionHossein Falaki2013-12-301-118/+138
| |\|
| * | Added stream-lib dependency to Maven buildHossein Falaki2013-10-181-0/+5
| | |
* | | upgrade Netty from 4.0.0.Beta2 to 4.0.13.FinalBinh Nguyen2013-12-241-1/+1
| |/
|/|
* | Clean-upPatrick Wendell2013-12-161-0/+1
| |
* | CleanupPatrick Wendell2013-12-161-6/+0
| |
* | Remove trailing slashes from repository specifications.Patrick Wendell2013-12-161-5/+5
| |   The correct format is to not have a trailing slash.
| |
| |   For me this caused non-deterministic failures due to issues fetching certain artifacts. The issue was that some of the maven caches would fail to fetch the artifact (due to the way that the artifact path was concatenated with the repository) and this short-circuited the download process in a silent way. Here is what the log output looked like:
| |
| |   Downloading: http://repo.maven.apache.org/maven2/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.pom
| |   [WARNING] The POM for org.spark-project.akka:akka-remote_2.10:jar:2.2.3-shaded-protobuf is missing, no dependency information available
| |
| |   This was pretty brutal to debug since there was no error message anywhere and the path *looks* correct as reported by the Maven log.
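The concatenation problem described in this commit can be sketched in a few lines of Python. This is a hypothetical reconstruction: the commit does not show how the caches actually build the URL, so `naive_join` is an assumed model of the failure, and `normalized_join` is one possible defensive alternative. The repository URL and artifact path are taken from the log output quoted in the commit message.

```python
def naive_join(repo_url, artifact_path):
    """Assumed model of the failing behaviour: always insert a separator."""
    return repo_url + "/" + artifact_path


def normalized_join(repo_url, artifact_path):
    """Strip slashes at the seam so exactly one separator remains."""
    return repo_url.rstrip("/") + "/" + artifact_path.lstrip("/")


REPO = "http://repo.maven.apache.org/maven2"
POM = ("org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/"
       "akka-remote_2.10-2.2.3-shaded-protobuf.pom")

# With a trailing slash on the repository, the naive join yields "maven2//org/...".
print(naive_join(REPO + "/", POM))
# Without the trailing slash, or with normalization, the path has a single slash.
print(normalized_join(REPO + "/", POM))
```

A server or cache that treats the doubled slash as a different path would report the POM as missing, which fits the silent failure the commit describes, although the exact mechanism is not confirmed by the source.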
* | Attempt with extra repositoriesPatrick Wendell2013-12-161-33/+43
| |
* | Use scala.binary.version in POMsMark Hamstra2013-12-151-8/+9
| |
* | Fix maven build issues in 2.10 branchPatrick Wendell2013-12-131-0/+4
| |
* | Disabled yarn 2.2 and added a message in the sbt buildPrashant Sharma2013-12-121-30/+30
| |
* | Merge branch 'master' into akka-bug-fixPrashant Sharma2013-12-111-9/+52
|\ \
| | |   Conflicts:
| | |       core/pom.xml
| | |       core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
| | |       pom.xml
| | |       project/SparkBuild.scala
| | |       streaming/pom.xml
| | |       yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
| * | Fix pom.xml for maven buildRaymond Liu2013-12-031-9/+52
| | |
* | | Style fixes and addressed review comments at #221Prashant Sharma2013-12-101-9/+8
| | |
* | | Incorporated Patrick's feedback comment on #211 and made maven ↵Prashant Sharma2013-12-071-51/+5
| | |   build/dep-resolution at least a bit faster.
* | | Merge branch 'master' into scala-2.10-wipPrashant Sharma2013-11-251-0/+5
|\| |
| | |   Conflicts:
| | |       core/src/main/scala/org/apache/spark/rdd/RDD.scala
| | |       project/SparkBuild.scala
| * | Fix Maven build for metrics-graphiteLiGuoqiang2013-11-251-0/+5
| | |
* | | Merge branch 'master' into scala-2.10Raymond Liu2013-11-141-0/+6
|\| |
| * | Allow spark on yarn to be run from HDFS. Allows the spark.jar, app.jar, and ↵tgravescs2013-11-041-0/+6
| | |   log4j.properties to be put into hdfs.
* | | Merge branch 'master' into scala-2.10Raymond Liu2013-11-131-45/+81
|\| |
| * | Fix Maven build to use MQTT repositoryMatei Zaharia2013-10-231-0/+11
| | |
| * | Exclusion rules for Maven build files.Reynold Xin2013-10-191-44/+30
| |/
| * Update pom.xml to use version 13 of the ASF parent pom and add mailingLists ↵Henry Saputra2013-10-141-1/+24
| |   element.
| * Merge pull request #19 from aarondav/master-zkMatei Zaharia2013-10-101-0/+11
| |\
| | |   Standalone Scheduler fault tolerance using ZooKeeper
| | |
| | |   This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests.
| | |
| | |   Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client.
| | |
| | |   Master failover follows directly from the single-node Master recovery via the file system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead.
| | |
| | |   Configuration:
| | |   By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
| | |   By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
| | |   By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we keep the behavior from d5a96fe.
| | |
| | |   Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.
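As a rough illustration of the comma-delimited master list mentioned above, the Python sketch below splits one such value into individual master URLs. The accepted syntax (`spark://host1:port1,host2:port2`, with the scheme written once) is an assumption for this example, as are the function name `parse_master_urls` and the host names; the commit message only says that comma-delimited lists are accepted.

```python
from typing import List

SCHEME = "spark://"


def parse_master_urls(master: str) -> List[str]:
    """Split an assumed 'spark://h1:p1,h2:p2' value into one URL per master."""
    if not master.startswith(SCHEME):
        raise ValueError("expected a spark:// URL, got %r" % master)
    # Drop the scheme once, then split the remainder on commas.
    hosts = [h.strip() for h in master[len(SCHEME):].split(",") if h.strip()]
    if not hosts:
        raise ValueError("no master host given in %r" % master)
    # Re-attach the scheme so each entry is a complete master URL.
    return [SCHEME + h for h in hosts]


# Hypothetical host names: the first entry plus one backup master.
print(parse_master_urls("spark://master1:7077,master2:7077"))
```

A new Worker or Client would then try to register against each URL in the list; per the commit message, the list matters only for that initial registration.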
| | * Standalone Scheduler fault tolerance using ZooKeeperAaron Davidson2013-09-261-0/+11
| | |   This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests.
| | |
| | |   Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client.
| | |
| | |   Master failover follows directly from the single-node Master recovery via the file system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead.
| | |
| | |   Configuration:
| | |   By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
| | |   By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled.
| | |   By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we keep the behavior from 194ba4b8.
| | |
| | |   Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.
| | |
| | |   Forthcoming:
| | |   Documentation, tests (! - only ad hoc testing has been performed so far)
| | |   I do not intend for this commit to be merged until tests are added, but this patch should still be mostly reviewable until then.
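The three recovery modes listed under "Configuration" pair each mode with one companion property. The Python sketch below checks that pairing for a dictionary of properties. The property names and mode values come from the commit message; the function name `check_recovery_config`, the validation rules, and the sample ZooKeeper address are assumptions for illustration and do not reflect how the Master itself reads its configuration.

```python
# Companion property each mode needs, as described in the commit message.
REQUIRED_COMPANION = {
    "NONE": None,
    "ZOOKEEPER": "spark.deploy.zookeeper.url",
    "FILESYSTEM": "spark.deploy.recoveryDirectory",
}


def check_recovery_config(props):
    """Return the effective recovery mode, or raise if a companion is missing."""
    # The commit message states that NONE is the default.
    mode = props.get("spark.deploy.recoveryMode", "NONE")
    if mode not in REQUIRED_COMPANION:
        raise ValueError("unknown recovery mode: %r" % mode)
    companion = REQUIRED_COMPANION[mode]
    if companion is not None and not props.get(companion):
        raise ValueError("%s requires %s to be set" % (mode, companion))
    return mode


# Hypothetical ZooKeeper ensemble address.
print(check_recovery_config({
    "spark.deploy.recoveryMode": "ZOOKEEPER",
    "spark.deploy.zookeeper.url": "zk1:2181,zk2:2181",
}))
```

Checking the pairing up front turns a silently ignored setting into an immediate error, which seems useful given that a misconfigured Master would otherwise only reveal the problem during a failover.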
* | | Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10Prashant Sharma2013-10-101-3/+3
|\ \ \
| | | |   Conflicts:
| | | |       core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
| | | |       project/SparkBuild.scala
| * | | Merge branch 'master' into wip-merge-masterPrashant Sharma2013-10-081-1/+2
| |\| |
| | | |   Conflicts:
| | | |       bagel/pom.xml
| | | |       core/pom.xml
| | | |       core/src/test/scala/org/apache/spark/ui/UISuite.scala
| | | |       examples/pom.xml
| | | |       mllib/pom.xml
| | | |       pom.xml
| | | |       project/SparkBuild.scala
| | | |       repl/pom.xml
| | | |       streaming/pom.xml
| | | |       tools/pom.xml
| | | |
| | | |   In Scala 2.10, a shorter representation is used for naming artifacts, so changed to the shorter Scala version for artifacts and made it a property in the pom.
| | * | Merging build changes in from 0.8Patrick Wendell2013-10-051-3/+4
| | | |
| * | | Merge branch 'master' into scala-2.10Prashant Sharma2013-10-011-2/+1
| |\| |
| | | |   Conflicts:
| | | |       core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
| | | |       docs/_config.yml
| | | |       project/SparkBuild.scala
| | | |       repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
| | * | Removed scala -optimize flag.Reynold Xin2013-09-261-1/+0
| | |/
| | * Update build version in masterPatrick Wendell2013-09-241-1/+1
| | |
* | | scala 2.10 requires Java 1.6,Martin Weindel2013-10-051-3/+9
|/ /
| |   using Scala 2.10.3, resolved maven-scala-plugin warning
* | Sync with master and some build fixesPrashant Sharma2013-09-261-1/+2
|\|
| * Bumping Mesos version to 0.13.0Patrick Wendell2013-09-151-1/+1
| |
* | fixed maven build for scala 2.10Prashant Sharma2013-09-261-24/+18
| |