spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge pull request #542 from markhamstra/versionBump. Closes #542.	Mark Hamstra	2014-02-08	1	-1/+1
\|	Version number to 1.0.0-SNAPSHOT. Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore. @pwendell Author: Mark Hamstra <markhamstra@gmail.com> == Merge branch commits == commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71 Author: Mark Hamstra <markhamstra@gmail.com> Date: Wed Feb 5 09:30:32 2014 -0800 Version number to 1.0.0-SNAPSHOT
*	Increase JUnit test verbosity under SBT.	Josh Rosen	2014-01-25	1	-1/+1
\|	Upgrade junit-interface plugin from 0.9 to 0.10. I noticed that the JavaAPISuite tests didn't appear to display any output locally or under Jenkins, making it difficult to know whether they were running. This change increases the verbosity to more closely match the ScalaTest tests.
*	Removed repl-bin and updated maven build doc.	Mark Hamstra	2014-01-14	1	-10/+0
\|
*	Merge branch 'master' into graphx	Reynold Xin	2014-01-13	1	-0/+17
\|\
\| *	Merge pull request #293 from pwendell/standalone-driver	Patrick Wendell	2014-01-09	1	-0/+17
\| \|\	SPARK-998: Support Launching Driver Inside of Standalone Mode
[NOTE: I need to bring the tests up to date with new changes, so for now they will fail]
This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs, which is useful for long-running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, failure of a master node (provided a multi-master setup), and even failures of the application itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI.
There are a few small TODOs here, but the code is generally feature-complete. They are:
- Bring tests up to date and add test coverage
- Restarting on failure should be optional and maybe off by default.
- See if we can re-use akka connections to facilitate clients behind a firewall
A sensible place to start for review would be to look at the `DriverClient` class, which presents users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manager, exposing it in the UI, and dealing correctly with various types of failures.
Instructions to test locally:
- `sbt/sbt assembly/assembly examples/assembly`
- start a local version of the standalone cluster manager
```
./spark-class org.apache.spark.deploy.client.DriverClient \
  -j -Dspark.test.property=something \
  -e SPARK_TEST_KEY=SOMEVALUE \
  launch spark://10.99.1.14:7077 \
  ../path-to-examples-assembly-jar \
  org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
```
- Go in the UI and make sure it started correctly, look at the output etc
- Kill workers, the driver program, masters, etc.
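The launch command in the test instructions above has a fixed shape: the `DriverClient` class, optional `-j` and `-e` settings, the `launch` verb, the master URL, the jar path, the main class, and then the application's own arguments. The following Python sketch only assembles that argument list and does not run anything. The helper name `driver_launch_command` and its parameters are hypothetical, and it assumes one `-j` and one `-e` value, exactly as in the commit message.

```python
import shlex


def driver_launch_command(master_url, jar_path, main_class,
                          app_args=(), java_opt=None, env_var=None):
    """Build the DriverClient argument list in the order shown in the commit message.

    Hypothetical helper: only the flags visible in the commit message
    (-j, -e, launch) are used, each at most once.
    """
    cmd = ["./spark-class", "org.apache.spark.deploy.client.DriverClient"]
    if java_opt is not None:
        cmd += ["-j", java_opt]      # JVM option passed to the driver
    if env_var is not None:
        cmd += ["-e", env_var]       # KEY=VALUE environment entry
    cmd += ["launch", master_url, jar_path, main_class]
    cmd += [str(arg) for arg in app_args]  # everything after the class goes to the app
    return cmd


command = driver_launch_command(
    "spark://10.99.1.14:7077",
    "../path-to-examples-assembly-jar",
    "org.apache.spark.examples.DriverSubmissionTest",
    app_args=[1000, "some", "extra", "options", "--some-option-here", "-X", 13],
    java_opt="-Dspark.test.property=something",
    env_var="SPARK_TEST_KEY=SOMEVALUE",
)
print(shlex.join(command))  # one shell-quoted line, ready to paste
```

Keeping the command as a list until the last moment avoids quoting mistakes when an application argument contains spaces.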
\| \| *	Adding mockito to maven build	Patrick Wendell	2014-01-08	1	-0/+6
\| \| \|
\| \| *	Merge remote-tracking branch 'apache-github/master' into standalone-driver	Patrick Wendell	2014-01-08	1	-24/+6
\| \| \|\	Conflicts: core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala pom.xml
\| \| * \|	Adding unit tests and some refactoring to promote testability.	Patrick Wendell	2014-01-07	1	-0/+12
\| \| \| \|
* \| \| \|	graph -> graphx in pom.xml	Ankur Dave	2014-01-10	1	-1/+1
\| \| \| \|
* \| \| \|	Merge remote-tracking branch 'spark-upstream/master' into HEAD	Ankur Dave	2014-01-08	1	-109/+76
\|\\| \| \|	Conflicts: README.md core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala pom.xml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
\| * \| \|	Add CDH Repository to Maven Build	Patrick Wendell	2014-01-08	1	-0/+5
\| \| \|/
\| \|/\|
\| * \|	Merge pull request #313 from tdas/project-refactor	Patrick Wendell	2014-01-07	1	-23/+6
\| \|\ \	Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc.
At a high level, these are the following changes.
1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules.
2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`.
3. Each external project has its own Scala and Java unit tests. Note that the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see the comment inside the pom.xml for more information).
4. Jars of the external projects have been added to the examples project but not to the assembly project.
5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
\| \| \|/
\| \|/\|
\| \| *	Merge remote-tracking branch 'apache/master' into project-refactor	Tathagata Das	2014-01-06	1	-49/+17
\| \| \|\	Conflicts: examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
\| \| * \|	Added pom.xml for external projects and removed unnecessary dependencies and ↵	Tathagata Das	2013-12-31	1	-23/+6
\| \| \| \|	repositories from other poms and sbt.
\| * \| \|	Merge pull request #338 from ScrapCodes/ning-upgrade	Patrick Wendell	2014-01-06	1	-1/+1
\| \|\ \ \	SPARK-1005 Ning upgrade
\| \| * \| \|	SPARK-1005 Ning upgrade	Prashant Sharma	2014-01-06	1	-1/+1
\| \| \| \|/
\| \| \|/\|
\| * / \|	Change protobuf version for yarn alpha back to 2.4.1	Thomas Graves	2014-01-06	1	-1/+0
\| \|/ /
\| * \|	Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2	Raymond Liu	2014-01-03	1	-2/+2
\| \| \|
\| * \|	Change profile name new-yarn to hadoop2.2-yarn	Raymond Liu	2014-01-03	1	-1/+1
\| \| \|
\| * \|	Fix pom for yarn code reorganize commit	Raymond Liu	2014-01-03	1	-46/+9
\| \| \|
\| * \|	restore core/pom.xml file modification	liguoqiang	2014-01-01	1	-5/+5
\| \| \|
\| * \|	Merge pull request #73 from falaki/ApproximateDistinctCount	Reynold Xin	2013-12-31	1	-0/+5
\| \|\ \	Approximate distinct count
Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count distinct number of elements and distinct number of values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
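To illustrate the accuracy/memory trade-off mentioned above, here is a minimal, self-contained HyperLogLog sketch in Python. It is not the stream-lib implementation and does not use Spark. The class name, the precision parameter `p`, and the use of SHA-1 as the hash are assumptions made for this example. With `2**p` registers, the typical relative error of HyperLogLog is roughly `1.04 / sqrt(2**p)`, so raising `p` costs memory and buys accuracy.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with 2**p registers (not stream-lib's code)."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash; SHA-1 is an arbitrary choice for this sketch.
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)         # small-range (linear counting) correction
        return raw


sketch = HyperLogLog(p=12)
for user_id in range(10000):
    sketch.add(user_id)
    sketch.add(user_id)                            # duplicates do not change the estimate
print(round(sketch.count()))
```

The sketch stores only one small integer per register, so its memory use depends on `p` and not on how many elements are added.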
\| \| * \|	Using origin version	Hossein Falaki	2013-12-30	1	-118/+138
\| \| \|\\|
\| \| * \|	Added stream-lib dependency to Maven build	Hossein Falaki	2013-10-18	1	-0/+5
\| \| \| \|
\| * \| \|	upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final	Binh Nguyen	2013-12-24	1	-1/+1
\| \| \|/ \| \|/\|
\| * \|	Clean-up	Patrick Wendell	2013-12-16	1	-0/+1
\| \| \|
\| * \|	Cleanup	Patrick Wendell	2013-12-16	1	-6/+0
\| \| \|
\| * \|	Remove trailing slashes from repository specifications.	Patrick Wendell	2013-12-16	1	-5/+5
\| \| \|	The correct format is to not have a trailing slash. For me this caused non-deterministic failures due to issues fetching certain artifacts. The issue was that some of the maven caches would fail to fetch the artifact (due to the way that the artifact path was concatenated with the repository) and this short-circuited the download process in a silent way.
Here is what the log output looked like:
Downloading: http://repo.maven.apache.org/maven2/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.pom
[WARNING] The POM for org.spark-project.akka:akka-remote_2.10:jar:2.2.3-shaded-protobuf is missing, no dependency information available
This was pretty brutal to debug since there was no error message anywhere and the path looks correct as reported by the Maven log.
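The commit message does not say exactly how the artifact path was joined to the repository URL, so the following Python sketch is only a hypothetical illustration of the failure class. It builds an artifact URL from Maven coordinates using the standard repository layout (group ID with dots turned into slashes, then artifact ID, version, and file name). The function names are made up for this example. A naive join that always inserts a separator yields a double slash when the repository URL already ends in one, while stripping the trailing slash first gives the same URL either way.

```python
def artifact_path(group_id, artifact_id, version, ext="pom"):
    """Relative path of an artifact in the standard Maven repository layout."""
    return "/".join([
        group_id.replace(".", "/"),
        artifact_id,
        version,
        f"{artifact_id}-{version}.{ext}",
    ])


def naive_artifact_url(repo_url, group_id, artifact_id, version, ext="pom"):
    # Hypothetical naive join: always inserts a separator, even if one is present.
    return repo_url + "/" + artifact_path(group_id, artifact_id, version, ext)


def artifact_url(repo_url, group_id, artifact_id, version, ext="pom"):
    # Normalize the repository URL so a trailing slash makes no difference.
    return repo_url.rstrip("/") + "/" + artifact_path(group_id, artifact_id, version, ext)


coords = ("org.spark-project.akka", "akka-remote_2.10", "2.2.3-shaded-protobuf")
print(naive_artifact_url("http://repo.maven.apache.org/maven2/", *coords))
print(artifact_url("http://repo.maven.apache.org/maven2/", *coords))
```

The second URL matches the one shown in the log output above. The first contains `maven2//org`, which some servers or caches may treat as a different path.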
\| * \|	Attempt with extra repositories	Patrick Wendell	2013-12-16	1	-33/+43
\| \| \|
\| * \|	Use scala.binary.version in POMs	Mark Hamstra	2013-12-15	1	-8/+9
\| \| \|
\| * \|	Fix maven build issues in 2.10 branch	Patrick Wendell	2013-12-13	1	-0/+4
\| \| \|
\| * \|	Disabled yarn 2.2 and added a message in the sbt build	Prashant Sharma	2013-12-12	1	-30/+30
\| \| \|
\| * \|	Merge branch 'master' into akka-bug-fix	Prashant Sharma	2013-12-11	1	-9/+52
\| \|\ \	Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
\| \| * \|	Fix pom.xml for maven build	Raymond Liu	2013-12-03	1	-9/+52
\| \| \| \|
\| * \| \|	Style fixes and addressed review comments at #221	Prashant Sharma	2013-12-10	1	-9/+8
\| \| \| \|
\| * \| \|	Incorporated Patrick's feedback comment on #211 and made maven ↵	Prashant Sharma	2013-12-07	1	-51/+5
\| \| \| \|	build/dep-resolution at least a bit faster.
\| * \| \|	Merge branch 'master' into scala-2.10-wip	Prashant Sharma	2013-11-25	1	-0/+5
\| \|\\| \|	Conflicts: core/src/main/scala/org/apache/spark/rdd/RDD.scala project/SparkBuild.scala
\| * \| \|	Merge branch 'master' into scala-2.10	Raymond Liu	2013-11-14	1	-0/+6
\| \|\ \ \
\| * \ \ \	Merge branch 'master' into scala-2.10	Raymond Liu	2013-11-13	1	-45/+81
\| \|\ \ \ \
\| * \ \ \ \	Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10	Prashant Sharma	2013-10-10	1	-3/+3
\| \|\ \ \ \ \	Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala project/SparkBuild.scala
\| \| * \ \ \ \	Merge branch 'master' into wip-merge-master	Prashant Sharma	2013-10-08	1	-1/+2
\| \| \|\ \ \ \ \	Conflicts: bagel/pom.xml core/pom.xml core/src/test/scala/org/apache/spark/ui/UISuite.scala examples/pom.xml mllib/pom.xml pom.xml project/SparkBuild.scala repl/pom.xml streaming/pom.xml tools/pom.xml
In Scala 2.10, a shorter representation is used for naming artifacts, so the artifact names were changed to use the shorter Scala version, which is now a property in the pom.
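The naming change described above can be sketched as follows. From Scala 2.10 onward, artifact names carry the binary version (`2.10`) as a suffix, while the 2.9.x artifacts used the full version (for example `2.9.3`). The Python helper names below are hypothetical and the sketch ignores pre-release versions such as release candidates. In the pom itself this value is held in a property (a later commit in this log refers to it as `scala.binary.version`).

```python
def scala_binary_version(full_version):
    """Return the version suffix used in artifact names for a Scala release.

    Assumption for this sketch: 2.10 and later use "major.minor",
    earlier releases use the full version string.
    """
    parts = full_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if (major, minor) >= (2, 10):
        return f"{major}.{minor}"
    return full_version


def artifact_name(base_name, scala_version):
    """Append the Scala suffix to a base artifact name, e.g. spark-core_2.10."""
    return f"{base_name}_{scala_binary_version(scala_version)}"


print(artifact_name("spark-core", "2.10.3"))
print(artifact_name("spark-core", "2.9.3"))
```

Holding the suffix in one property means a Scala patch upgrade (say 2.10.3 to a later 2.10.x) does not change any artifact ID.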
\| \| * \ \ \ \ \	Merge branch 'master' into scala-2.10	Prashant Sharma	2013-10-01	1	-2/+1
\| \| \|\ \ \ \ \ \	Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala docs/_config.yml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
\| * \| \| \| \| \| \| \|	scala 2.10 requires Java 1.6,	Martin Weindel	2013-10-05	1	-3/+9
\| \|/ / / / / / /	using Scala 2.10.3, resolved maven-scala-plugin warning
\| * \| \| \| \| \| \|	Sync with master and some build fixes	Prashant Sharma	2013-09-26	1	-1/+2
\| \|\ \ \ \ \ \ \
\| * \| \| \| \| \| \| \|	fixed maven build for scala 2.10	Prashant Sharma	2013-09-26	1	-24/+18
\| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \|	version changed 2.9.3 -> 2.10 in shell script.	Prashant Sharma	2013-09-15	1	-8/+0
\| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \|	Merge branch 'master' of git://github.com/mesos/spark into scala-2.10	Prashant Sharma	2013-09-15	1	-127/+107
\| \|\ \ \ \ \ \ \ \	Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala project/SparkBuild.scala
\| * \ \ \ \ \ \ \ \	Merged with master	Prashant Sharma	2013-09-06	1	-90/+254
\| \|\ \ \ \ \ \ \ \ \
\| * \ \ \ \ \ \ \ \ \	Merge branch 'master' into master-merge	Prashant Sharma	2013-07-12	1	-53/+19
\| \|\ \ \ \ \ \ \ \ \ \	Conflicts: README.md core/pom.xml core/src/main/scala/spark/deploy/JsonProtocol.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/master/MasterWebUI.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/deploy/worker/WorkerWebUI.scala core/src/main/scala/spark/storage/BlockManagerUI.scala core/src/main/scala/spark/util/AkkaUtils.scala pom.xml project/SparkBuild.scala streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
\| * \| \| \| \| \| \| \| \| \| \|	Removed some unnecessary code and fixed dependencies	Prashant Sharma	2013-07-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|