* A little revise for the document  [soulmachine, 2013-10-26; 1 file, -3/+4]
|
* Merge pull request #108 from alig/master  [Matei Zaharia, 2013-10-25; 7 files, -8/+123]
|\
| |   Changes to enable executing by using HDFS as a synchronization point
| |   between driver and executors, as well as ensuring executors exit properly.
| * fixing comments on PR  [Ali Ghodsi, 2013-10-25; 3 files, -29/+18]
| |
| * Makes Spark SIMR ready.  [Ali Ghodsi, 2013-10-24; 7 files, -5/+131]
| |
* | Merge pull request #102 from tdas/transform  [Matei Zaharia, 2013-10-25; 13 files, -162/+1037]
|\ \
| | |   Added new Spark Streaming operations:
| | |   - transformWith, which allows an arbitrary 2-to-1 DStream transform; added to the Scala and Java APIs
| | |   - StreamingContext.transform, to allow an arbitrary n-to-1 DStream transform
| | |   - leftOuterJoin and rightOuterJoin between 2 DStreams; added to the Scala and Java APIs
| | |   - missing variations of join and cogroup, added to the Scala and Java APIs
| | |   - the missing JavaStreamingContext.union
| | |   Also updated a number of Java and Scala API docs.
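The leftOuterJoin listed above keeps every key of the left stream and attaches the right stream's value for that key when one exists. A rough sketch of those per-key semantics in plain Java (no Spark dependency; the class and method names here are illustrative, not Spark's API):

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LeftOuterJoin {
    // For every key on the left side, emit (leftValue, Optional(rightValue));
    // keys that appear only on the right side are dropped.
    public static <K, V, W> Map<K, Map.Entry<V, Optional<W>>> leftOuterJoin(
            Map<K, V> left, Map<K, W> right) {
        Map<K, Map.Entry<V, Optional<W>>> joined = new HashMap<>();
        for (Map.Entry<K, V> entry : left.entrySet()) {
            Optional<W> match = Optional.ofNullable(right.get(entry.getKey()));
            joined.put(entry.getKey(),
                    new AbstractMap.SimpleEntry<>(entry.getValue(), match));
        }
        return joined;
    }
}
```

A rightOuterJoin is the mirror image: every right-side key survives and the left value becomes optional.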
| * \   Merge branch 'apache-master' into transform  [Tathagata Das, 2013-10-25; 17 files, -15/+272]
| |\ \
| * | | Fixed accidental bug.  [Tathagata Das, 2013-10-24; 1 file, -1/+1]
| | | |
| * | | Merge branch 'apache-master' into transform  [Tathagata Das, 2013-10-24; 19 files, -10/+417]
| |\ \ \
| * | | | Added JavaStreamingContext.transform  [Tathagata Das, 2013-10-24; 5 files, -33/+169]
| | | | |
| * | | | Removed Function3.call() based on Josh's comment.  [Tathagata Das, 2013-10-23; 1 file, -2/+0]
| | | | |
| * | | | Merge branch 'apache-master' into transform  [Tathagata Das, 2013-10-22; 90 files, -3300/+2058]
| |\ \ \ \
| * | | | | Fixed bug in Java transformWith, added more Java testcases for transform and transformWith, added missing variations of Java join and cogroup, updated various Scala and Java API docs.  [Tathagata Das, 2013-10-22; 8 files, -179/+424]
| | | | | |
| * | | | | Updated TransformDStream to allow n-ary DStream transform. Added transformWith, leftOuterJoin and rightOuterJoin operations to DStream for the Scala and Java APIs. Also added n-ary union and n-ary transform operations to StreamingContext for the Scala and Java APIs.  [Tathagata Das, 2013-10-21; 11 files, -33/+529]
| | | | | |
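The n-ary transform added here boils down to handing the current batch of each input stream to one user-supplied function that produces a single output batch. A minimal plain-Java sketch of that n-to-1 shape (illustrative names, not Spark's API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class NaryTransform {
    // The n-to-1 shape: one user function consumes the current batch of each
    // of the n inputs and produces a single output batch.
    public static <T, R> List<R> transform(List<List<T>> batches,
                                           Function<List<List<T>>, List<R>> f) {
        return f.apply(batches);
    }
}
```

The 2-to-1 transformWith and the n-ary union are special cases of this shape with a fixed user function.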
* | | | | | Merge pull request #111 from kayousterhout/ui_name  [Matei Zaharia, 2013-10-25; 2 files, -3/+1]
|\ \ \ \ \ \
| | | | | |   Properly display the name of a stage in the UI.
| | | | | |
| | | | | |   This fixes a bug introduced by the fix for SPARK-940, which changed the UI to
| | | | | |   display the RDD name rather than the stage name. As a result, no name was
| | | | | |   shown for the stage when using the Spark shell, which meant that there was no
| | | | | |   way to click on the stage to see more details (e.g., the running tasks). This
| | | | | |   commit changes the UI back to using the stage name.
| | | | | |
| | | | | |   @pwendell -- let me know if this change was intentional
| * | | | | Properly display the name of a stage in the UI.  [Kay Ousterhout, 2013-10-25; 2 files, -3/+1]
| | | | | |
* | | | | Merge pull request #110 from pwendell/master  [Reynold Xin, 2013-10-25; 2 files, -0/+5]
|\ \ \ \ \
| | | | |   Exclude jopt from kafka dependency.
| | | | |
| | | | |   Kafka uses an older version of jopt that causes bad conflicts with the version
| | | | |   used by spark-perf. It's not easy to remove this downstream because of the way
| | | | |   that spark-perf uses Spark (by including a spark assembly as an unmanaged
| | | | |   jar). This fixes the problem at its source by just never including it.
| * | | | | Exclude jopt from kafka dependency.  [Patrick Wendell, 2013-10-25; 2 files, -0/+5]
| | | | | |
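Excluding a conflicting transitive dependency at its source, as this patch does for jopt, is done with an exclusion on the kafka dependency itself. In a Maven POM the shape is roughly as follows (the group and artifact ids shown are assumptions for illustration, not copied from the patch):

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka</artifactId>
  <version>0.8.0</version>
  <exclusions>
    <!-- Keep the older jopt pulled in by Kafka off the classpath entirely -->
    <exclusion>
      <groupId>net.sf.jopt-simple</groupId>
      <artifactId>jopt-simple</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With the exclusion in place, downstream consumers of the spark assembly never see the conflicting jar at all.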
* | | | | | Merge pull request #109 from pwendell/master  [Reynold Xin, 2013-10-24; 9 files, -6/+128]
|\| | | | |
| | | | | |   Adding Java/Java Streaming versions of `repartition` with associated tests
| * | | | | Style fixes  [Patrick Wendell, 2013-10-24; 1 file, -9/+9]
| | | | | |
| * | | | | Spacing fix  [Patrick Wendell, 2013-10-24; 2 files, -6/+6]
| | | | | |
| * | | | | Small spacing fix  [Patrick Wendell, 2013-10-24; 1 file, -2/+2]
| | | | | |
| * | | | | Adding Java versions and associated tests  [Patrick Wendell, 2013-10-24; 9 files, -1/+123]
| | | | | |
* | | | | | Merge pull request #106 from pwendell/master  [Reynold Xin, 2013-10-24; 8 files, -10/+140]
|\| | | | |
| | | | |   Add a `repartition` operator.
| | | | |
| | | | |   This patch adds an operator called repartition with more straightforward
| | | | |   semantics than the current `coalesce` operator. There are a few use cases
| | | | |   where this operator is useful:
| | | | |
| | | | |   1. If a user wants to increase the number of partitions in the RDD. This is
| | | | |      more common now with streaming. E.g. a user is ingesting data on one node
| | | | |      but wants to add more partitions to ensure parallelism of subsequent
| | | | |      operations across threads or the cluster. Right now they have to call
| | | | |      rdd.coalesce(numSplits, shuffle=true), which is super confusing.
| | | | |
| | | | |   2. If a user has input data where the number of partitions is not known.
| | | | |      E.g. sc.textFile("some file").coalesce(50)... This is both semantically
| | | | |      vague (am I growing or shrinking this RDD?) and may not work correctly
| | | | |      if the base RDD has fewer than 50 partitions.
| | | | |
| | | | |   The new operator forces a shuffle every time, so it will always produce
| | | | |   exactly the requested number of partitions. It also throws an exception
| | | | |   rather than silently not working if a bad input is passed.
| | | | |
| | | | |   I am currently adding streaming tests (this requires refactoring some of
| | | | |   the test suite to allow testing at partition granularity), so this is not
| | | | |   ready for merge yet. But feedback is welcome.
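The contract described above, that repartition(n) always shuffles and therefore always yields exactly n partitions while rejecting bad input loudly, can be sketched without Spark as a simple round-robin redistribution (plain Java, illustrative only):

```java
import java.util.ArrayList;
import java.util.List;

public class Repartition {
    // Redistribute all elements round-robin into exactly numPartitions buckets.
    // Unlike coalesce, this works whether we grow or shrink the partition count,
    // and it fails loudly on a nonsensical request instead of silently doing
    // the wrong thing.
    public static <T> List<List<T>> repartition(List<T> elements, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException(
                    "Number of partitions must be positive: " + numPartitions);
        }
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        int index = 0;
        for (T element : elements) {
            partitions.get(index % numPartitions).add(element);
            index++;
        }
        return partitions;
    }
}
```

The real operator redistributes data across the cluster via a shuffle, but the invariant is the same: the output partition count is exactly what the caller asked for.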
| * | | | Some clean-up of tests  [Patrick Wendell, 2013-10-24; 3 files, -7/+10]
| | | | |
| * | | | Removing Java for now  [Patrick Wendell, 2013-10-24; 1 file, -7/+0]
| | | | |
| * | | | Adding tests  [Patrick Wendell, 2013-10-24; 3 files, -18/+88]
| | | | |
| * | | | Always use a shuffle  [Patrick Wendell, 2013-10-24; 1 file, -15/+7]
| | | | |
| * | | | Add a `repartition` operator.  [Patrick Wendell, 2013-10-24; 5 files, -0/+72]
|/ / / /
* | | | Merge pull request #93 from kayousterhout/ui_new_state  [Matei Zaharia, 2013-10-23; 10 files, -8/+130]
|\ \ \ \
| | | |   Show "GETTING_RESULTS" state in UI.
| | | |
| | | |   This commit adds a set of calls using the SparkListener interface that
| | | |   indicate when a task is remotely fetching results, so that we can display
| | | |   this (potentially time-consuming) phase of execution to users through the UI.
| * | | | Clear akka frame size property in tests  [Kay Ousterhout, 2013-10-23; 1 file, -2/+6]
| | | | |
| * | | | Fixed broken tests  [Kay Ousterhout, 2013-10-23; 1 file, -4/+3]
| | | | |
| * | | | Merge remote-tracking branch 'upstream/master' into ui_new_state  [Kay Ousterhout, 2013-10-23; 70 files, -810/+1426]
| |\ \ \ \
| | | | | |   Conflicts:
| | | | | |       core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
| * | | | | Shorten GETTING_RESULT to GET_RESULT  [Kay Ousterhout, 2013-10-22; 1 file, -1/+1]
| | | | | |
| * | | | | Show "GETTING_RESULTS" state in UI.  [Kay Ousterhout, 2013-10-21; 10 files, -5/+124]
| | | | | |
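The SparkListener-style calls described above follow the observer pattern: the scheduler posts an event when a task starts fetching its result, and every registered listener (the UI among them) can react. A minimal plain-Java sketch of that pattern (the names are illustrative, not Spark's actual interface):

```java
import java.util.ArrayList;
import java.util.List;

public class TaskEventBus {
    // Callback interface in the spirit of SparkListener (names illustrative).
    public interface Listener {
        void onTaskGettingResult(long taskId);
    }

    private final List<Listener> listeners = new ArrayList<>();

    public void addListener(Listener listener) {
        listeners.add(listener);
    }

    // Called by the scheduler when a task enters the GETTING_RESULTS state,
    // i.e. it has finished computing and is now fetching its (possibly large)
    // result from the remote executor.
    public void postGettingResult(long taskId) {
        for (Listener listener : listeners) {
            listener.onTaskGettingResult(taskId);
        }
    }
}
```

A UI component would register a listener and timestamp the event, which is what lets the result-fetching phase show up as its own state.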
* | | | | | Merge pull request #105 from pwendell/doc-fix  [Reynold Xin, 2013-10-23; 0 files, -0/+0]
|\ \ \ \ \ \
| | | | | |   Fixing broken links in programming guide. Unfortunately these are broken in 0.8.0.
| * | | | | | Fixing broken links in programming guide  [Patrick Wendell, 2013-10-23; 1 file, -3/+3]
| | | | | | |
* | | | | | | Merge pull request #103 from JoshRosen/unpersist-fix  [Reynold Xin, 2013-10-23; 3 files, -0/+34]
|\ \ \ \ \ \ \
| | | | | | |   Add unpersist() to JavaDoubleRDD and JavaPairRDD.
| | | | | | |
| | | | | | |   This fixes a minor inconsistency where unpersist() was only available on
| | | | | | |   JavaRDD (https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3CCE8D8748.68C0%25YannLuppo%40livenation.com%3E)
| | | | | | |   and not on JavaPairRDD / JavaDoubleRDD. I also added support for the new
| | | | | | |   optional `blocking` argument added in 0.8. Please merge this into
| | | | | | |   branch-0.8, too.
| * | | | | | Add unpersist() to JavaDoubleRDD and JavaPairRDD.  [Josh Rosen, 2013-10-23; 3 files, -0/+34]
| | | | | | |
| | | | | | |   Also add support for the new optional `blocking` argument.
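The optional `blocking` argument mentioned above controls whether unpersist() waits for the cached blocks to actually be dropped or merely schedules the removal and returns. A plain-Java sketch of that contract (illustrative, not Spark's implementation):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BlockCache {
    private final Set<String> cached = ConcurrentHashMap.newKeySet();

    public void persist(String blockId) {
        cached.add(blockId);
    }

    public boolean isCached(String blockId) {
        return cached.contains(blockId);
    }

    // blocking = true: wait until the block is actually gone before returning.
    // blocking = false: hand the removal to a background thread and return at once.
    public void unpersist(String blockId, boolean blocking) {
        Thread cleaner = new Thread(() -> cached.remove(blockId));
        cleaner.setDaemon(true);  // cleanup work should not keep the JVM alive
        cleaner.start();
        if (blocking) {
            try {
                cleaner.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Non-blocking unpersist is useful when the caller does not care exactly when memory is reclaimed; blocking unpersist gives a hard guarantee before the next action runs.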
* | | | | | Fix Maven build to use MQTT repository  [Matei Zaharia, 2013-10-23; 2 files, -3/+14]
| | | | | |
* | | | | Merge pull request #64 from prabeesh/master  [Matei Zaharia, 2013-10-23; 5 files, -1/+241]
|\ \ \ \ \
| | | | |   MQTT Adapter for Spark Streaming
| | | | |
| | | | |   MQTT is a machine-to-machine (M2M) / Internet of Things connectivity
| | | | |   protocol. It was designed as an extremely lightweight publish/subscribe
| | | | |   messaging transport. You can read more about it at http://mqtt.org/.
| | | | |
| | | | |   Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M
| | | | |   communication. It enables the transfer of telemetry-style data, in the form
| | | | |   of messages, from devices like sensors and actuators to mobile phones,
| | | | |   embedded systems on vehicles, or laptops and full-scale computers. The
| | | | |   protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of
| | | | |   Cirrus Link Solutions.
| | | | |
| | | | |   The protocol enables a publish/subscribe messaging model in an extremely
| | | | |   lightweight way. It is useful for connections with remote locations where
| | | | |   code footprint and network bandwidth are constraints. MQTT is one of the
| | | | |   most widely used protocols for the Internet of Things, and it is attracting
| | | | |   attention as more and more devices get connected to the internet and produce
| | | | |   data. Researchers and companies predict that some 25 billion devices will be
| | | | |   connected to the internet by 2015. Plugins/support for MQTT are available in
| | | | |   popular message queues like RabbitMQ, ActiveMQ, etc. Support for MQTT in
| | | | |   Spark will help people with Internet of Things (IoT) projects use Spark
| | | | |   Streaming for their real-time data processing needs (from sensors and other
| | | | |   embedded devices, etc.).
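The publish/subscribe model at the heart of MQTT decouples data producers from consumers through named topics. A minimal in-memory sketch of topic-based pub/sub in plain Java (illustrative only; this is not the MQTT wire protocol):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class TopicBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Register a callback that receives every message published on `topic`.
    public void subscribe(String topic, Consumer<String> callback) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(callback);
    }

    // Deliver `message` to all current subscribers of `topic`; the publisher
    // never needs to know who, if anyone, is listening.
    public void publish(String topic, String message) {
        for (Consumer<String> callback :
                subscribers.getOrDefault(topic, Collections.emptyList())) {
            callback.accept(message);
        }
    }
}
```

A streaming receiver like the MQTT adapter plays the subscriber role: it registers on a topic and feeds each delivered message into the stream for processing.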
| * | | | Update MQTTWordCount.scala  [Prabeesh K, 2013-10-22; 1 file, -6/+1]
| | | | |
| * | | | Update MQTTWordCount.scala  [Prabeesh K, 2013-10-22; 1 file, -3/+4]
| | | | |
| * | | | Update MQTTWordCount.scala  [Prabeesh K, 2013-10-18; 1 file, -15/+14]
| | | | |
| * | | | Update MQTTInputDStream.scala  [Prabeesh K, 2013-10-18; 1 file, -4/+11]
| | | | |
| * | | | modify code, use Spark Logging Class  [prabeesh, 2013-10-17; 1 file, -35/+26]
| | | | |
| * | | | remove unused dependency  [prabeesh, 2013-10-17; 1 file, -5/+0]
| | | | |
| * | | | remove unused dependency  [prabeesh, 2013-10-17; 1 file, -2/+0]
| | | | |
| * | | | add maven dependencies for mqtt  [prabeesh, 2013-10-16; 1 file, -0/+5]
| | | | |
| * | | | add maven dependencies for mqtt  [prabeesh, 2013-10-16; 1 file, -0/+5]
| | | | |
| * | | | added mqtt adapter wordcount example  [prabeesh, 2013-10-16; 1 file, -0/+112]
| | | | |