path: root/streaming
Each entry: commit message (author, date; files changed, lines -removed/+added)
* Fix pom.xml for the Maven build (Raymond Liu, 2013-12-03; 1 file, -8/+1)
|
* Another set of changes to remove unnecessary semicolons (;) from Scala code. (Henry Saputra, 2013-11-19; 1 file, -1/+3)
|     Passed sbt/sbt compile and test.
* Remove the semicolons at the end of Scala code to make it more idiomatic Scala. (Henry Saputra, 2013-11-19; 5 files, -10/+9)
|     Also removed unused imports found along the way, and removed explicit
|     return statements where a value is returned. Passes compile and tests.
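A tiny before/after illustration of the style being applied (hypothetical snippet, not taken from the patch):

    // Before: trailing semicolons and explicit returns.
    def max(a: Int, b: Int): Int = { if (a > b) return a; return b; }

    // After: no semicolons, no return; the last expression is the result.
    def max(a: Int, b: Int): Int = if (a > b) a else b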
* Made block generator thread-safe to fix a Kafka bug. (Tathagata Das, 2013-11-12; 2 files, -7/+80)
|
* Merge branch 'apache-master' into transform (Tathagata Das, 2013-10-25; 9 files, -15/+180)
|\
| * Exclude jopt from the Kafka dependency. (Patrick Wendell, 2013-10-25; 1 file, -0/+4)
| |     Kafka uses an older version of jopt that causes bad conflicts with the
| |     version used by spark-perf. It's not easy to remove this downstream
| |     because of the way spark-perf uses Spark (by including a Spark assembly
| |     as an unmanaged jar). This fixes the problem at its source by simply
| |     never including it.
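The same idea in an sbt build definition might look like the sketch below; the jopt coordinates here are an assumption, and the actual change was made in the Maven pom:

    // Hypothetical sbt sketch: pull in Kafka but never its bundled
    // jopt-simple, so it cannot conflict with other jopt versions.
    libraryDependencies += ("org.apache.kafka" % "kafka" % "0.8.0-beta1")
      .exclude("net.sf.jopt-simple", "jopt-simple")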
| * Style fixes (Patrick Wendell, 2013-10-24; 1 file, -9/+9)
| |
| * Spacing fix (Patrick Wendell, 2013-10-24; 1 file, -4/+4)
| |
| * Small spacing fix (Patrick Wendell, 2013-10-24; 1 file, -2/+2)
| |
| * Adding Java versions and associated tests (Patrick Wendell, 2013-10-24; 4 files, -0/+68)
| |
| * Some clean-up of tests (Patrick Wendell, 2013-10-24; 3 files, -7/+10)
| |
| * Removing Java for now (Patrick Wendell, 2013-10-24; 1 file, -7/+0)
| |
| * Adding tests (Patrick Wendell, 2013-10-24; 2 files, -5/+88)
| |
| * Add a `repartition` operator. (Patrick Wendell, 2013-10-24; 2 files, -0/+14)
| |     This patch adds an operator called repartition with more
| |     straightforward semantics than the current `coalesce` operator. There
| |     are a few use cases where this operator is useful:
| |
| |     1. If a user wants to increase the number of partitions in the RDD.
| |        This is more common now with streaming, e.g. a user is ingesting
| |        data on one node but wants to add more partitions to ensure
| |        parallelism of subsequent operations across threads or the cluster.
| |        Right now they have to call rdd.coalesce(numSplits, shuffle=true) -
| |        that's super confusing.
| |     2. If a user has input data where the number of partitions is not
| |        known, e.g. sc.textFile("some file").coalesce(50)... This is both
| |        semantically vague (am I growing or shrinking this RDD?) and may
| |        not work correctly if the base RDD has fewer than 50 partitions.
| |
| |     The new operator forces a shuffle every time, so it will always
| |     produce exactly the requested number of partitions. It also throws an
| |     exception rather than silently not working when given bad input. I am
| |     currently adding streaming tests (this requires refactoring some of
| |     the test suite to allow testing at partition granularity), so this is
| |     not ready for merge yet. But feedback is welcome.
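A short sketch of the difference in a Spark shell (illustrative only):

    // A 2-partition RDD we want to spread across 10 partitions.
    val rdd = sc.parallelize(1 to 1000, 2)

    // Old way: the shuffle must be requested explicitly, and the intent
    // (growing vs. shrinking) is easy to get wrong.
    val grownOld = rdd.coalesce(10, shuffle = true)

    // New operator: always shuffles, always yields exactly 10 partitions.
    val grown = rdd.repartition(10)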
* | Fixed accidental bug. (Tathagata Das, 2013-10-24; 1 file, -1/+1)
| |
* | Merge branch 'apache-master' into transform (Tathagata Das, 2013-10-24; 3 files, -0/+129)
|\|
| * Merge pull request #64 from prabeesh/master (Matei Zaharia, 2013-10-23; 3 files, -0/+129)
| |\      MQTT Adapter for Spark Streaming
| | |
| | |     MQTT is a machine-to-machine (M2M) / Internet of Things connectivity
| | |     protocol. It was designed as an extremely lightweight
| | |     publish/subscribe messaging transport. You can read more about it at
| | |     http://mqtt.org/
| | |
| | |     Message Queue Telemetry Transport (MQTT) is an open message protocol
| | |     for M2M communications. It enables the transfer of telemetry-style
| | |     data, in the form of messages, from devices like sensors and
| | |     actuators to mobile phones, embedded systems on vehicles, or laptops
| | |     and full-scale computers. The protocol was invented by Andy
| | |     Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions.
| | |
| | |     This protocol enables a publish/subscribe messaging model in an
| | |     extremely lightweight way. It is useful for connections with remote
| | |     locations where code footprint and network bandwidth are constrained.
| | |     MQTT is one of the most widely used protocols for the 'Internet of
| | |     Things', and it is attracting attention as anything and everything
| | |     gets connected to the internet and produces data. Researchers and
| | |     companies predict some 25 billion devices will be connected to the
| | |     internet by 2015. Plugins/support for MQTT are available in popular
| | |     message queues like RabbitMQ, ActiveMQ, etc.
| | |
| | |     Support for MQTT in Spark will help people with Internet of Things
| | |     (IoT) projects use Spark Streaming for their real-time data
| | |     processing needs (from sensors and other embedded devices, etc.).
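A hedged sketch of how such an adapter might be used from Spark Streaming; the MQTTUtils.createStream entry point follows the shape the adapter later took and is an assumption here, as are the broker URL and topic name:

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._  // pair-DStream ops

    // Assumed entry point and parameters, for illustration only.
    val ssc = new StreamingContext("local[2]", "MQTTWordCount", Seconds(2))
    val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "sensor/readings")

    // Count words arriving on the MQTT topic in each 2-second batch.
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
    ssc.start()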
| | * Update MQTTInputDStream.scala (Prabeesh K, 2013-10-18; 1 file, -4/+11)
| | |
| | * Modify code to use the Spark Logging class (prabeesh, 2013-10-17; 1 file, -35/+26)
| | |
| | * Add Maven dependencies for MQTT (prabeesh, 2013-10-16; 1 file, -0/+5)
| | |
| | * Added MQTT adapter (prabeesh, 2013-10-16; 1 file, -0/+15)
| | |
| | * MQTTInputDStream for the MQTT streaming adapter (prabeesh, 2013-10-16; 1 file, -0/+111)
| | |
* | | Added JavaStreamingContext.transform (Tathagata Das, 2013-10-24; 4 files, -33/+158)
| | |
* | | Merge branch 'apache-master' into transform (Tathagata Das, 2013-10-22; 15 files, -88/+125)
|\| |
| * | Merge pull request #56 from jerryshao/kafka-0.8-dev (Matei Zaharia, 2013-10-21; 15 files, -94/+109)
| |\ \      Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming
| | | |
| | | |     Conflicts:
| | | |         streaming/pom.xml
| | * | Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming (jerryshao, 2013-10-12; 15 files, -88/+109)
| | | |
| * | | Exclusion rules for Maven build files. (Reynold Xin, 2013-10-19; 1 file, -0/+22)
| | | |
* | | | Fixed bug in Java transformWith; added more Java test cases for transform and transformWith; added missing variations of Java join and cogroup; updated various Scala and Java API docs. (Tathagata Das, 2013-10-22; 6 files, -176/+421)
| | | |
* | | | Updated TransformDStream to allow n-ary DStream transform; added transformWith, leftOuterJoin, and rightOuterJoin operations to DStream for the Scala and Java APIs; also added n-ary union and n-ary transform operations to StreamingContext for the Scala and Java APIs. (Tathagata Das, 2013-10-21; 9 files, -33/+457)
|/ / /
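A brief usage sketch of the new binary operations; the two input DStreams are assumed to exist and the types are simplified:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.StreamingContext._  // pair-DStream ops

    // clicks, impressions: DStream[(String, Long)] defined earlier (assumed).
    // leftOuterJoin keeps every click even when no impression matches.
    val joined = clicks.leftOuterJoin(impressions)

    // transformWith exposes both batches' RDDs, so any binary RDD
    // operation can be applied per batch interval.
    val unmatched = clicks.transformWith(impressions,
      (c: RDD[(String, Long)], i: RDD[(String, Long)]) => c.subtractByKey(i))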
* | | Merge pull request #8 from vchekan/checkpoint-ttl-restore (Matei Zaharia, 2013-10-15; 2 files, -0/+6)
|\ \ \      Serialize and restore spark.cleaner.ttl to savepoint
| | | |
| | | |     In accordance with the conversation on the spark-dev mailing list,
| | | |     preserve the spark.cleaner.ttl parameter when serializing a
| | | |     checkpoint.
| * | | Serialize and restore spark.cleaner.ttl to savepoint (Vadim Chekan, 2013-09-20; 2 files, -0/+6)
| | |/
| |/|
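A hedged sketch of the behavior the commit above preserves (configuration via system properties predates SparkConf in this era; the recovery flow shown is an assumption):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Set a metadata-cleaning TTL before creating the context.
    System.setProperty("spark.cleaner.ttl", "3600")  // seconds
    val ssc = new StreamingContext("local[2]", "App", Seconds(1))
    ssc.checkpoint("hdfs:///checkpoints/app")        // assumed path

    // ...later, on restart, recover from the checkpoint:
    // val recovered = new StreamingContext("hdfs:///checkpoints/app")
    // With this patch, the restored context sees spark.cleaner.ttl again
    // instead of silently dropping it.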
* | Refactor BlockId into an actual type (Aaron Davidson, 2013-10-12; 4 files, -16/+17)
| |/
|/|
| |     This is an unfortunately invasive change which converts all of our
| |     BlockId strings into actual BlockId types. Some advantages of doing
| |     this now:
| |
| |     + Type safety.
| |     + Code clarity - it's now obvious what the key of a shuffle or RDD
| |       block is, for instance, and appearing in tuple/map type signatures
| |       is a big readability bonus: a Seq[(String, BlockStatus)] is not
| |       very clear. Further, we can now use more Scala features, like
| |       matching on BlockId types.
| |     + Explicit usage - we can now formally tell where various BlockIds
| |       are being used (without doing string searches); this makes updating
| |       current BlockIds a much clearer, compiler-supported process. (I'm
| |       looking at you, shuffle file consolidation.)
| |     + It will only get harder to make this change as time goes on.
| |
| |     Since this touches a lot of files, it'd be best to either get this
| |     patch in quickly or throw it on the ground to avoid too many
| |     secondary merge conflicts.
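An illustrative sketch of the pattern; names and fields are simplified assumptions, not Spark's exact definitions:

    // A sealed hierarchy replaces raw String keys, so block kinds can be
    // distinguished by type and by pattern matching.
    sealed abstract class BlockId { def name: String }
    case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
      def name = "rdd_" + rddId + "_" + splitIndex
    }
    case class StreamBlockId(streamId: Int, uniqueId: Long) extends BlockId {
      def name = "input-" + streamId + "-" + uniqueId
    }

    // Matching on types beats parsing strings like "rdd_3_7".
    def describe(id: BlockId): String = id match {
      case RDDBlockId(rdd, split)   => "partition " + split + " of RDD " + rdd
      case StreamBlockId(stream, _) => "block from input stream " + stream
    }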
* | Merging build changes in from 0.8 (Patrick Wendell, 2013-10-05; 1 file, -4/+5)
| |
* | Update build version in master (Patrick Wendell, 2013-09-24; 1 file, -1/+1)
|/
* Move some classes to more appropriate packages: (Matei Zaharia, 2013-09-01; 31 files, -60/+70)
|       * RDD, *RDDFunctions -> org.apache.spark.rdd
|       * Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
|       * JavaSerializer, KryoSerializer -> org.apache.spark.serializer
* Fix some URLs (Matei Zaharia, 2013-09-01; 1 file, -1/+1)
|
* Initial work to rename package to org.apache.spark (Matei Zaharia, 2013-09-01; 59 files, -261/+266)
|
* Merge pull request #701 from ScrapCodes/documentation-suggestions (Matei Zaharia, 2013-08-22; 3 files, -3/+7)
|\      Documentation suggestions for Spark Streaming.
| * Linking custom receiver guide (Prashant Sharma, 2013-08-23; 2 files, -0/+4)
| |
| * Corrections in documentation comment (Prashant Sharma, 2013-08-23; 1 file, -3/+3)
| |
* | Remove redundant dependencies from POMs (Jey Kottalam, 2013-08-18; 1 file, -5/+0)
| |
* | Maven build now also works with YARN (Jey Kottalam, 2013-08-16; 1 file, -40/+0)
| |
* | Maven build now works with CDH hadoop-2.0.0-mr1 (Jey Kottalam, 2013-08-16; 1 file, -27/+0)
| |
* | Initial changes to make the Maven build agnostic of the Hadoop version (Jey Kottalam, 2013-08-16; 1 file, -32/+10)
| |
* | Change scala.Option to Guava Optional in Java APIs. (Josh Rosen, 2013-08-11; 1 file, -5/+2)
| |
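The conversion at the Java API boundary might look like this sketch (the helper name is hypothetical):

    import com.google.common.base.Optional

    // scala.Option is awkward to consume from Java, so Java-facing methods
    // hand back Guava's Optional instead.
    def toGuavaOptional[T](opt: Option[T]): Optional[T] = opt match {
      case Some(value) => Optional.of(value)
      case None        => Optional.absent[T]()
    }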
* | Changed other LZF uses to use the compression codec interface. (Reynold Xin, 2013-07-31; 1 file, -9/+17)
| |
* | Add Apache license headers and LICENSE and NOTICE files (Matei Zaharia, 2013-07-16; 60 files, -6/+1026)
|/
* Merge pull request #688 from markhamstra/scalaDependencies (Matei Zaharia, 2013-07-08; 1 file, -0/+4)
|\      Fixed SPARK-795 with explicit dependencies
| * POM cleanup (Mark Hamstra, 2013-07-08; 1 file, -1/+0)
| |
| * Explicit dependencies for scala-library and scalap to prevent 2.9.2 vs. 2.9.3 problems (Mark Hamstra, 2013-07-08; 1 file, -0/+5)
| |