spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Fix Maven build to use MQTT repository	Matei Zaharia	2013-10-23	1	-3/+3
\|
*	Merge pull request #64 from prabeesh/master	Matei Zaharia	2013-10-23	1	-1/+5
\|\ 	MQTT Adapter for Spark Streaming
\| \|	MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it at http://mqtt.org/
\| \|	Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators to mobile phones, embedded systems on vehicles, or laptops and full-scale computers. The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions.
\| \|	This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code size and network bandwidth are constraints.
\| \|	MQTT is one of the widely used protocols for the 'Internet of Things'. It is attracting growing attention as more and more devices are connected to the internet and produce data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015. Plugins/support for MQTT are available in popular MQs like RabbitMQ, ActiveMQ, etc.
\| \|	Support for MQTT in Spark will help people with Internet of Things (IoT) projects to use Spark Streaming for their real-time data processing needs (from sensors and other embedded devices, etc.).
\| *	remove unused dependency	prabeesh	2013-10-17	1	-2/+0
\| \|
\| *	added mqtt adapter library dependencies	prabeesh	2013-10-16	1	-1/+7
\| \|
* \|	Merge pull request #56 from jerryshao/kafka-0.8-dev	Matei Zaharia	2013-10-21	1	-3/+6
\|\ \ 	Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
\| \| \|	Conflicts: streaming/pom.xml
\| * \|	Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming	jerryshao	2013-10-12	1	-3/+6
\| \| \|
* \| \|	Merge pull request #66 from shivaram/sbt-assembly-deps	Matei Zaharia	2013-10-18	1	-3/+10
\|\ \ \ 	Add SBT target to assemble dependencies
\| \| \| \|	This pull request is an attempt to address the long assembly build times during development. Instead of rebuilding the assembly jar for every Spark change, this pull request adds a new SBT target `spark` that packages all the Spark modules and builds an assembly of the dependencies. The workflow should now be something like:
\| \| \| \|	```
\| \| \| \|	./sbt/sbt spark # Doing this once should suffice
\| \| \| \|	## Make changes
\| \| \| \|	./sbt/sbt compile
\| \| \| \|	./sbt/sbt test or ./spark-shell
\| \| \| \|	```
\| * \| \|	Rename SBT target to assemble-deps.	Shivaram Venkataraman	2013-10-16	1	-5/+5
\| \| \| \|
\| * \| \|	Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps	Shivaram Venkataraman	2013-10-15	1	-10/+26
\| \|\\| \|
\| * \| \|	Add a comment and exclude tools	Shivaram Venkataraman	2013-10-11	1	-1/+2
\| \| \| \|
\| * \| \|	Add new SBT target for dependency assembly	Shivaram Venkataraman	2013-10-09	1	-1/+7
\| \| \|/ \| \|/\|
* \| \|	Fixing spark streaming example and a bug in examples build.	Patrick Wendell	2013-10-15	1	-0/+1
\| \|/ \|/\|	- Examples assembly included a log4j.properties which clobbered Spark's
\| \|	- Example had an error where some classes weren't serializable
\| \|	- Did some other clean-up in this example
* \|	Merge pull request #19 from aarondav/master-zk	Matei Zaharia	2013-10-10	1	-0/+1
\|\ \ 	Standalone Scheduler fault tolerance using ZooKeeper
\| \| \|	This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests.
\| \| \|	Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client.
\| \| \|	Master failover follows directly from the single-node Master recovery via the file system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead.
\| \| \|	Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from d5a96fe.
\| \| \|	Additionally, places where a Master could be specified by a spark:// URL can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.
\| * \|	Standalone Scheduler fault tolerance using ZooKeeper	Aaron Davidson	2013-09-26	1	-0/+1
\| \| \|	This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests.
\| \| \|	Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client.
\| \| \|	Master failover follows directly from the single-node Master recovery via the file system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead.
\| \| \|	Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from 194ba4b8.
\| \| \|	Additionally, places where a Master could be specified by a spark:// URL can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.
\| \| \|	Forthcoming: Documentation, tests (only ad hoc testing has been performed so far!). I do not intend for this commit to be merged until tests are added, but this patch should still be mostly reviewable until then.
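The configuration described in the commit message above could look roughly like the following. This is only a sketch: the host names and ports are hypothetical, and it assumes the properties are handed to the Master as JVM system properties (for example through an environment variable such as SPARK_DAEMON_JAVA_OPTS), which the commit message itself does not state.

```shell
# Hypothetical ZooKeeper ensemble and Master hosts; substitute real ones.
ZK_URL="zk1:2181,zk2:2181,zk3:2181"
MASTERS="master1:7077,master2:7077"

# Property names are taken from the commit message; passing them as -D
# system properties is an assumption of this sketch.
RECOVERY_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=${ZK_URL}"

# Comma-delimited spark:// URL listing the primary and backup Masters,
# used when NEW Workers and application Clients register.
MASTER_URL="spark://${MASTERS}"

echo "${RECOVERY_OPTS}"
echo "${MASTER_URL}"
```

For FILESYSTEM mode, the same pattern would apply with spark.deploy.recoveryMode=FILESYSTEM and spark.deploy.recoveryDirectory pointing at a directory the Master can access.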
* \| \|	Merge pull request #31 from sundeepn/branch-0.8	Reynold Xin	2013-10-07	1	-5/+8
\| \| \|	Resolving package conflicts with hadoop 0.23.9
\| \| \|	Hadoop 0.23.9 has a package conflict with easymock's dependencies.
\| \| \|	(cherry picked from commit 023e3fdf008b3194a36985a07923df9aaf64e520)
\| \| \|	Signed-off-by: Reynold Xin <rxin@apache.org>
* \| \|	ask ivy/sbt to check local maven repo under ~/.m2	Du Li	2013-10-01	1	-0/+3
\| \| \|
* \| \|	Removed scala -optimize flag.	Reynold Xin	2013-09-26	1	-1/+1
\|/ /
* \|	Merge pull request #930 from holdenk/master	Reynold Xin	2013-09-26	1	-1/+1
\|\ \ \| \| \| \| \| \|	Add mapPartitionsWithIndex
\| * \|	Fix build on ubuntu	Holden Karau	2013-09-14	1	-1/+1
\| \| \|
* \| \|	Update build version in master	Patrick Wendell	2013-09-24	1	-1/+1
\| \| \|
* \| \|	Bumping Mesos version to 0.13.0	Patrick Wendell	2013-09-15	1	-1/+1
\|/ /
* \|	Merge pull request #919 from mateiz/jets3t	Patrick Wendell	2013-09-11	1	-0/+1
\|\ \ \| \| \| \| \| \|	Add explicit jets3t dependency, which is excluded in hadoop-client
\| * \|	Add explicit jets3t dependency, which is excluded in hadoop-client	Matei Zaharia	2013-09-10	1	-0/+1
\| \| \|
* \| \|	Fix HDFS access bug with assembly build.	Patrick Wendell	2013-09-10	1	-0/+1
\|/ /	Due to this change in HDFS: https://issues.apache.org/jira/browse/HADOOP-7549 there is a bug when using the new assembly builds. The symptom is that any HDFS access results in an exception saying "No filesystem for scheme 'hdfs'".
\| \|	This adds a merge strategy in the assembly build which fixes the problem.
* \|	Merge pull request #906 from pwendell/ganglia-sink	Patrick Wendell	2013-09-08	1	-0/+1
\|\ \ \| \| \| \| \| \|	Clean-up of Metrics Code/Docs and Add Ganglia Sink
\| * \|	Ganglia sink	Patrick Wendell	2013-09-08	1	-0/+1
\| \|/
* \|	Merge pull request #908 from pwendell/master	Matei Zaharia	2013-09-08	1	-1/+7
\|\ \ \| \| \| \| \| \|	Fix target JVM version in scala build
\| * \|	Fix target JVM version in scala build	Patrick Wendell	2013-09-08	1	-1/+7
\| \|/
* /	Minor YARN build cleanups	Jey Kottalam	2013-09-06	1	-2/+2
\|/
*	Add Apache parent POM	Matei Zaharia	2013-09-02	1	-0/+5
\|
*	Fix some URLs	Matei Zaharia	2013-09-01	1	-2/+2
\|
*	Initial work to rename package to org.apache.spark	Matei Zaharia	2013-09-01	1	-4/+8
\|
*	Update Maven build to create assemblies expected by new scripts	Matei Zaharia	2013-08-29	1	-2/+2
\|	This includes the following changes:
\|	- The "assembly" package now builds in Maven by default, and creates an assembly containing both hadoop-client and Spark, unlike the old BigTop distribution assembly that skipped hadoop-client
\|	- There is now a bigtop-dist package to build the old BigTop assembly
\|	- The repl-bin package is no longer built by default since the scripts don't rely on it; instead it can be enabled with -Prepl-bin
\|	- Py4J is now included in the assembly/lib folder as a local Maven repo, so that the Maven package can link to it
\|	- run-example now adds the original Spark classpath as well because the Maven examples assembly lists spark-core and such as provided
\|	- The various Maven projects add a spark-yarn dependency correctly
*	Provide more memory for tests	Matei Zaharia	2013-08-29	1	-1/+1
\|
*	Change build and run instructions to use assemblies	Matei Zaharia	2013-08-29	1	-20/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.
*	Revert "Merge pull request #841 from rxin/json"	Reynold Xin	2013-08-26	1	-0/+1
\| \| \| \| \|	This reverts commit 1fb1b0992838c8cdd57eec45793e67a0490f1a52, reversing changes made to c69c48947d5102c81a9425cb380d861c3903685c.
*	Fix SBT generation of IDE project files	Jey Kottalam	2013-08-23	1	-5/+12
\|
*	Re-add removed dependency on 'commons-daemon'	Jey Kottalam	2013-08-22	1	-0/+1
\| \| \| \|	Fixes SBT build under Hadoop 0.23.9 and 2.0.4
*	Merge pull request #855 from jey/update-build-docs	Matei Zaharia	2013-08-22	1	-4/+3
\|\ \| \| \| \|	Update build docs
\| *	Remove references to unsupported Hadoop versions	Jey Kottalam	2013-08-21	1	-4/+3
\| \|
* \|	Merge pull request #854 from markhamstra/pomUpdate	Matei Zaharia	2013-08-22	1	-4/+1
\|\ \ \| \|/ \|/\|	Synced sbt and maven builds to use the same dependencies, etc.
\| *	Synced sbt and maven builds	Mark Hamstra	2013-08-21	1	-4/+1
\| \|
* \|	Downgraded default build hadoop version to 1.0.4.	Reynold Xin	2013-08-21	1	-1/+1
\|/
*	Merge remote-tracking branch 'jey/hadoop-agnostic'	Matei Zaharia	2013-08-20	1	-41/+35
\|\ \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/spark/PairRDDFunctions.scala
\| *	Update SBT build to use simpler fix for Hadoop 0.23.9	Jey Kottalam	2013-08-19	1	-11/+2
\| \|
\| *	Rename YARN build flag to SPARK_WITH_YARN	Jey Kottalam	2013-08-16	1	-5/+7
\| \|
\| *	Fix SBT build under Hadoop 0.23.x	Jey Kottalam	2013-08-16	1	-0/+11
\| \|
\| *	Fix repl/assembly when YARN enabled	Jey Kottalam	2013-08-16	1	-3/+4
\| \|
\| *	Allow make-distribution.sh to specify Hadoop version used	Jey Kottalam	2013-08-16	1	-6/+22
\| \|
\| *	Update default version of Hadoop to 1.2.1	Jey Kottalam	2013-08-15	1	-1/+1
\| \|