spark - Mirror of Apache Spark

	Commit message	Author	Date	Files	Lines
*	SPARK-1121: Include avro for yarn-alpha builds	Patrick Wendell	2014-03-02	1	-0/+15
	This lets us explicitly include Avro based on a profile for 0.23.X builds. It makes me sad how convoluted it is to express this logic in Maven. @tgraves and @sryza, curious if this works for you. I'm also considering just reverting to how it was before. The only real problem was that Spark advertised a dependency on Avro even though it only really depends transitively on Avro through other deps.
	Author: Patrick Wendell <pwendell@gmail.com>
	Closes #49 from pwendell/avro-build-fix and squashes the following commits:
	  8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile
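A profile-scoped dependency of the kind this commit describes could be sketched as follows. This is not the actual patch: it assumes Avro's Maven coordinates `org.apache.avro:avro`, reuses the `yarn-alpha` profile name from the commit subject, and uses a hypothetical `${avro.version}` property in place of a real version.

```xml
<profiles>
  <profile>
    <!-- Profile name taken from the commit subject -->
    <id>yarn-alpha</id>
    <dependencies>
      <!-- Declared only when this profile is active -->
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <!-- Hypothetical placeholder property, not the real version -->
        <version>${avro.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

The dependency is added only when the profile is activated, for example with `mvn -Pyarn-alpha package`; builds without the profile do not declare Avro directly.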
*	SPARK-1084.2 (resubmitted)	Sean Owen	2014-03-02	1	-33/+4
	(Ported from https://github.com/apache/incubator-spark/pull/650) This adds one more change, though: a fix for the Scala version warning recently introduced by json4s.
	Author: Sean Owen <sowen@cloudera.com>
	Closes #32 from srowen/SPARK-1084.2 and squashes the following commits:
	  9240abd [Sean Owen] Avoid scala version conflict in scalap induced by json4s dependency
	  1561cec [Sean Owen] Remove "exclude *" dependencies that are causing Maven warnings, and that are apparently unneeded anyway
*	Remove remaining references to incubation	Patrick Wendell	2014-03-02	1	-15/+15
	This removes some loose ends not caught by the other (incubating -> tlp) patches. @markhamstra, this updates the version as you mentioned earlier.
	Author: Patrick Wendell <pwendell@gmail.com>
	Closes #51 from pwendell/tlp and squashes the following commits:
	  d553b1b [Patrick Wendell] Remove remaining references to incubation
*	Update io.netty from 4.0.13.Final to 4.0.17.Final	Binh Nguyen	2014-03-02	1	-1/+1
	This update contains a lot of bug fixes and some new performance improvements. It is also binary compatible with the current 4.0.13.Final. For more information: http://netty.io/news/2014/02/25/4-0-17-Final.html
	Author: Binh Nguyen <ngbinh@gmail.com>
	Closes #41 from ngbinh/master and squashes the following commits:
	  a9498f4 [Binh Nguyen] update io.netty to 4.0.17.Final
*	SPARK-1084.1 (resubmitted)	Sean Owen	2014-02-27	1	-2/+3
	(Ported from https://github.com/apache/incubator-spark/pull/637)
	Author: Sean Owen <sowen@cloudera.com>
	Closes #31 from srowen/SPARK-1084.1 and squashes the following commits:
	  6c4a32c [Sean Owen] Suppress warnings about legitimate unchecked array creations, or change code to avoid it
	  f35b833 [Sean Owen] Fix two misc javadoc problems
	  254e8ef [Sean Owen] Fix one new style error introduced in scaladoc warning commit
	  5b2fce2 [Sean Owen] Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates
	  007762b [Sean Owen] Remove dead scaladoc links
	  b8ff8cb [Sean Owen] Replace deprecated Ant <tasks> with <target>
*	SPARK-1121 Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set	Prashant Sharma	2014-02-26	1	-21/+0
	Author: Prashant Sharma <prashant.s@imaginea.com>
	Closes #6 from ScrapCodes/SPARK-1121/avro-dep-fix and squashes the following commits:
	  9b29e34 [Prashant Sharma] Review feedback on PR
	  46ed2ad [Prashant Sharma] SPARK-1121-Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set
*	For SPARK-1082, Use Curator for ZK interaction in standalone cluster	Raymond Liu	2014-02-24	1	-3/+3
	Author: Raymond Liu <raymond.liu@intel.com>
	Closes #611 from colorant/curator and squashes the following commits:
	  7556aa1 [Raymond Liu] Address review comments
	  af92e1f [Raymond Liu] Fix coding style
	  964f3c2 [Raymond Liu] Ignore NodeExists exception
	  6df2966 [Raymond Liu] Rewrite zookeeper client code with curator
*	SPARK-1071: Tidy logging strategy and use of log4j	Sean Owen	2014-02-23	1	-12/+16
	Prompted by a recent thread on the mailing list, I tried and failed to see if Spark can be made independent of log4j. There are a few cases where control of the underlying logging is pretty useful, and to do that, you have to bind to a specific logger.
	Instead I propose some tidying that leaves Spark's use of log4j, but gets rid of warnings and should still enable downstream users to switch. The idea is to pipe everything (except log4j) through SLF4J, and have Spark use SLF4J directly when logging, and where Spark needs to output info (REPL and tests), bind from SLF4J to log4j. This leaves the same behavior in Spark. It means that downstream users who want to use something other than log4j should:
	- Exclude dependencies on log4j and slf4j-log4j12 from Spark
	- Include dependency on log4j-over-slf4j
	- Include dependency on another logger X, and another slf4j-X
	- Recreate any log config that Spark does, that is needed, in the other logger's config
	That sounds about right.
	Here are the key changes:
	- Include the jcl-over-slf4j shim everywhere by depending on it in core.
	- Exclude dependencies on commons-logging from third-party libraries.
	- Include the jul-to-slf4j shim everywhere by depending on it in core.
	- Exclude slf4j-* dependencies from third-party libraries to prevent collision or warnings
	- Added missing slf4j-log4j12 binding to GraphX, Bagel module tests
	And minor/incidental changes:
	- Update to SLF4J 1.7.5, which happily matches Hadoop 2’s version and is a recommended update over 1.7.2
	- (Remove a duplicate HBase dependency declaration in SparkBuild.scala)
	- (Remove a duplicate mockito dependency declaration that was causing warnings and bugging me)
	Author: Sean Owen <sowen@cloudera.com>
	Closes #570 from srowen/SPARK-1071 and squashes the following commits:
	  52eac9f [Sean Owen] Add slf4j-over-log4j12 dependency to core (non-test) and remove it from things that depend on core.
	  77a7fa9 [Sean Owen] SPARK-1071: Tidy logging strategy and use of log4j
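The downstream recipe described in this commit could look roughly like the following in a user's pom.xml. This is a minimal sketch, not taken from the Spark build: it assumes Logback as the replacement logger (logback-classic implements the SLF4J API directly, so it serves as both "logger X" and its SLF4J binding), assumes the Scala 2.10 `spark-core_2.10` artifact, and uses hypothetical `${spark.version}`, `${slf4j.version}` and `${logback.version}` properties in place of real versions.

```xml
<!-- Sketch only: all versions are hypothetical placeholder properties -->
<dependencies>
  <!-- Spark core, with log4j and the SLF4J-to-log4j binding excluded -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
    <exclusions>
      <exclusion>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- Redirect any remaining direct log4j API calls into SLF4J -->
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
    <version>${slf4j.version}</version>
  </dependency>
  <!-- Replacement backend; implements the SLF4J API natively -->
  <dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>${logback.version}</version>
  </dependency>
</dependencies>
```

The last step, recreating any needed log settings, would go into Logback's own configuration file (typically `logback.xml`) and is not shown.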
*	Merge pull request #542 from markhamstra/versionBump. Closes #542.	Mark Hamstra	2014-02-08	1	-1/+1
	Version number to 1.0.0-SNAPSHOT
	Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.
	@pwendell
	Author: Mark Hamstra <markhamstra@gmail.com>
	== Merge branch commits ==
	commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
	Author: Mark Hamstra <markhamstra@gmail.com>
	Date: Wed Feb 5 09:30:32 2014 -0800
	  Version number to 1.0.0-SNAPSHOT
*	Increase JUnit test verbosity under SBT.	Josh Rosen	2014-01-25	1	-1/+1
	Upgrade junit-interface plugin from 0.9 to 0.10. I noticed that the JavaAPISuite tests didn't appear to display any output locally or under Jenkins, making it difficult to know whether they were running. This change increases the verbosity to more closely match the ScalaTest tests.
*	Removed repl-bin and updated maven build doc.	Mark Hamstra	2014-01-14	1	-10/+0
*	Merge branch 'master' into graphx	Reynold Xin	2014-01-13	1	-0/+17
*	Merge pull request #293 from pwendell/standalone-driver	Patrick Wendell	2014-01-09	1	-0/+17
	SPARK-998: Support Launching Driver Inside of Standalone Mode
	[NOTE: I need to bring the tests up to date with new changes, so for now they will fail]
	This patch provides support for launching driver programs inside of a standalone cluster manager. It also supports monitoring and re-launching of driver programs, which is useful for long-running, recoverable applications such as Spark Streaming jobs. For those jobs, this patch allows a deployment mode which is resilient to the failure of any worker node, failure of a master node (provided a multi-master setup), and even failures of the application itself, provided they are recoverable on a restart. Driver information, such as the status and logs from a driver, is displayed in the UI.
	There are a few small TODOs here, but the code is generally feature-complete. They are:
	- Bring tests up to date and add test coverage
	- Restarting on failure should be optional and maybe off by default.
	- See if we can re-use akka connections to facilitate clients behind a firewall
	A sensible place to start for review would be to look at the `DriverClient` class, which presents users the ability to launch their driver program. I've also added an example program (`DriverSubmissionTest`) that allows you to test this locally and play around with killing workers, etc. Most of the code is devoted to persisting driver state in the cluster manager, exposing it in the UI, and dealing correctly with various types of failures.
	Instructions to test locally:
	- `sbt/sbt assembly/assembly examples/assembly`
	- start a local version of the standalone cluster manager
	```
	./spark-class org.apache.spark.deploy.client.DriverClient \
	  -j -Dspark.test.property=something \
	  -e SPARK_TEST_KEY=SOMEVALUE \
	  launch spark://10.99.1.14:7077 \
	  ../path-to-examples-assembly-jar \
	  org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
	```
	- Go in the UI and make sure it started correctly, look at the output, etc.
	- Kill workers, the driver program, masters, etc.
*	Adding mockito to maven build	Patrick Wendell	2014-01-08	1	-0/+6
*	Merge remote-tracking branch 'apache-github/master' into standalone-driver	Patrick Wendell	2014-01-08	1	-24/+6
	Conflicts:
	  core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
	  pom.xml
*	Adding unit tests and some refactoring to promote testability.	Patrick Wendell	2014-01-07	1	-0/+12
*	graph -> graphx in pom.xml	Ankur Dave	2014-01-10	1	-1/+1
*	Merge remote-tracking branch 'spark-upstream/master' into HEAD	Ankur Dave	2014-01-08	1	-109/+76
	Conflicts:
	  README.md
	  core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	  core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	  core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
	  pom.xml
	  project/SparkBuild.scala
	  repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
*	Add CDH Repository to Maven Build	Patrick Wendell	2014-01-08	1	-0/+5
*	Merge pull request #313 from tdas/project-refactor	Patrick Wendell	2014-01-07	1	-23/+6
	Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc. At a high level, these are the following changes.
	1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules.
	2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`.
	3. Each external project has its own Scala and Java unit tests. Note the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see comment inside the pom.xml for more information).
	4. Jars of the external projects have been added to the examples project but not to the assembly project.
	5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
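The test-code sharing described in item 3 has a common Maven counterpart, sketched below under stated assumptions: the `maven-jar-plugin` `test-jar` goal packages a module's test classes into a separate jar, and a consumer pulls them in with `<type>test-jar</type>` in test scope. This is a generic pattern, not necessarily what streaming/pom.xml actually contains; the `spark-streaming_${scala.binary.version}` artifact id and the `${project.version}` reference are illustrative.

```xml
<!-- In the streaming module's pom.xml: also package test classes as a test-jar -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <goal>test-jar</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

<!-- In an external module's pom.xml: reuse those test classes, test scope only -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
```

The two fragments belong in different POM files; the first makes classes such as `TestSuiteBase` available as an artifact, and the second lets an external module's tests compile against them without adding them to its main classpath.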
*	Merge remote-tracking branch 'apache/master' into project-refactor	Tathagata Das	2014-01-06	1	-49/+17
	Conflicts:
	  examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
	  streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
	  streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
	  streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
	  streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
	  streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
*	Added pom.xml for external projects and removed unnecessary dependencies and repositories from other poms and sbt.	Tathagata Das	2013-12-31	1	-23/+6
*	Merge pull request #338 from ScrapCodes/ning-upgrade	Patrick Wendell	2014-01-06	1	-1/+1
	SPARK-1005 Ning upgrade
*	SPARK-1005 Ning upgrade	Prashant Sharma	2014-01-06	1	-1/+1
*	Change protobuf version for yarn alpha back to 2.4.1	Thomas Graves	2014-01-06	1	-1/+0
*	Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2	Raymond Liu	2014-01-03	1	-2/+2
*	Change profile name new-yarn to hadoop2.2-yarn	Raymond Liu	2014-01-03	1	-1/+1
*	Fix pom for yarn code reorganize commit	Raymond Liu	2014-01-03	1	-46/+9
*	restore core/pom.xml file modification	liguoqiang	2014-01-01	1	-5/+5
*	Merge pull request #73 from falaki/ApproximateDistinctCount	Reynold Xin	2013-12-31	1	-0/+5
	Approximate distinct count
	Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count the number of distinct elements and the number of distinct values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
*	Using origin version	Hossein Falaki	2013-12-30	1	-118/+138
*	Added stream-lib dependency to Maven build	Hossein Falaki	2013-10-18	1	-0/+5
*	upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final	Binh Nguyen	2013-12-24	1	-1/+1
*	Clean-up	Patrick Wendell	2013-12-16	1	-0/+1
*	Cleanup	Patrick Wendell	2013-12-16	1	-6/+0
*	Remove trailing slashes from repository specifications.	Patrick Wendell	2013-12-16	1	-5/+5
	The correct format is to not have a trailing slash. For me this caused non-deterministic failures due to issues fetching certain artifacts. The issue was that some of the Maven caches would fail to fetch the artifact (due to the way that the artifact path was concatenated with the repository) and this short-circuited the download process in a silent way. Here is what the log output looked like:
	  Downloading: http://repo.maven.apache.org/maven2/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.pom
	  [WARNING] The POM for org.spark-project.akka:akka-remote_2.10:jar:2.2.3-shaded-protobuf is missing, no dependency information available
	This was pretty brutal to debug since there was no error message anywhere and the path looks correct as reported by the Maven log.
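As an illustration of the rule this commit states, a repository entry with no trailing slash on its URL might look like the sketch below. The `id` and `name` values are hypothetical; the URL is the Maven Central base from the log output quoted in this commit, and the claim that a trailing slash breaks fetching comes from this commit rather than from a documented Maven rule.

```xml
<repositories>
  <repository>
    <!-- Hypothetical id and name -->
    <id>maven-repo</id>
    <name>Maven Repository</name>
    <!-- Base URL ends at "maven2", with no trailing slash -->
    <url>http://repo.maven.apache.org/maven2</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
```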
*	Attempt with extra repositories	Patrick Wendell	2013-12-16	1	-33/+43
*	Use scala.binary.version in POMs	Mark Hamstra	2013-12-15	1	-8/+9
*	Fix maven build issues in 2.10 branch	Patrick Wendell	2013-12-13	1	-0/+4
*	Disabled yarn 2.2 and added a message in the sbt build	Prashant Sharma	2013-12-12	1	-30/+30
*	Merge branch 'master' into akka-bug-fix	Prashant Sharma	2013-12-11	1	-9/+52
	Conflicts:
	  core/pom.xml
	  core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
	  pom.xml
	  project/SparkBuild.scala
	  streaming/pom.xml
	  yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
*	Fix pom.xml for maven build	Raymond Liu	2013-12-03	1	-9/+52
*	Style fixes and addressed review comments at #221	Prashant Sharma	2013-12-10	1	-9/+8
*	Incorporated Patrick's feedback comment on #211 and made maven build/dep-resolution at least a bit faster.	Prashant Sharma	2013-12-07	1	-51/+5
*	Merge branch 'master' into scala-2.10-wip	Prashant Sharma	2013-11-25	1	-0/+5
	Conflicts:
	  core/src/main/scala/org/apache/spark/rdd/RDD.scala
	  project/SparkBuild.scala
*	Merge branch 'master' into scala-2.10	Raymond Liu	2013-11-14	1	-0/+6
*	Merge branch 'master' into scala-2.10	Raymond Liu	2013-11-13	1	-45/+81
*	Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10	Prashant Sharma	2013-10-10	1	-3/+3
	Conflicts:
	  core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
	  project/SparkBuild.scala
*	Merge branch 'master' into wip-merge-master	Prashant Sharma	2013-10-08	1	-1/+2
	Conflicts:
	  bagel/pom.xml
	  core/pom.xml
	  core/src/test/scala/org/apache/spark/ui/UISuite.scala
	  examples/pom.xml
	  mllib/pom.xml
	  pom.xml
	  project/SparkBuild.scala
	  repl/pom.xml
	  streaming/pom.xml
	  tools/pom.xml
	In Scala 2.10, a shorter representation is used for naming artifacts, so I changed artifacts to use the shorter Scala version and made it a property in the pom.
*	Merge branch 'master' into scala-2.10	Prashant Sharma	2013-10-01	1	-2/+1
	Conflicts:
	  core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
	  docs/_config.yml
	  project/SparkBuild.scala
	  repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala