Commit message | Author | Date | Files | Lines (-/+)
* Some clean-up of tests | Patrick Wendell | 2013-10-24 | 3 | -7/+10
* Removing Java for now | Patrick Wendell | 2013-10-24 | 1 | -7/+0
* Adding tests | Patrick Wendell | 2013-10-24 | 3 | -18/+88
* Always use a shuffle | Patrick Wendell | 2013-10-24 | 1 | -15/+7
* Add a `repartition` operator. | Patrick Wendell | 2013-10-24 | 5 | -0/+72

  This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful:

  1. If a user wants to increase the number of partitions in an RDD. This is more common now with streaming, e.g. a user is ingesting data on one node but wants to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing.
  2. If a user has input data where the number of partitions is not known, e.g. sc.textFile("some file").coalesce(50)... This is both semantically vague (am I growing or shrinking this RDD?) and may not work correctly if the base RDD has fewer than 50 partitions.

  The new operator forces a shuffle every time, so it will always produce exactly the requested number of partitions. It also throws an exception rather than silently not working if a bad input is passed. I am currently adding streaming tests (this requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet, but feedback is welcome.
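
As a rough usage sketch of the semantics described above (hedged; the master string is made up, while the path and the partition count of 50 come from the commit message):

```scala
import org.apache.spark.SparkContext

// Contrast the old, confusing spelling with the new operator.
val sc = new SparkContext("local[4]", "repartition-example")
val lines = sc.textFile("some file")                     // partition count not known up front

val viaCoalesce    = lines.coalesce(50, shuffle = true)  // must remember to force the shuffle
val viaRepartition = lines.repartition(50)               // always shuffles, always exactly 50 partitions
```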
* Merge pull request #93 from kayousterhout/ui_new_state | Matei Zaharia | 2013-10-23 | 10 | -8/+130

  Show "GETTING_RESULTS" state in UI.

  This commit adds a set of calls using the SparkListener interface that indicate when a task is remotely fetching results, so that we can display this (potentially time-consuming) phase of execution to users through the UI.

  * Clear akka frame size property in tests | Kay Ousterhout | 2013-10-23 | 1 | -2/+6
  * Fixed broken tests | Kay Ousterhout | 2013-10-23 | 1 | -4/+3
  * Merge remote-tracking branch 'upstream/master' into ui_new_state | Kay Ousterhout | 2013-10-23 | 70 | -810/+1426

    Conflicts:
      core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala

  * Shorten GETTING_RESULT to GET_RESULT | Kay Ousterhout | 2013-10-22 | 1 | -1/+1
  * Show "GETTING_RESULTS" state in UI. | Kay Ousterhout | 2013-10-21 | 10 | -5/+124

    This commit adds a set of calls using the SparkListener interface that indicate when a task is remotely fetching results, so that we can display this (potentially time-consuming) phase of execution to users through the UI.
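
A minimal sketch of how an application-side listener could observe this phase. The event and field names below are assumptions based on the description above and on later Spark releases, not necessarily the exact API introduced by this pull request:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskGettingResult}

// Assumed callback and event names; the ones added by this PR may differ.
class ResultFetchLogger extends SparkListener {
  override def onTaskGettingResult(event: SparkListenerTaskGettingResult): Unit = {
    // Fired when the driver starts remotely fetching a task's result, i.e. the
    // phase the UI now reports as GETTING_RESULT.
    println("Fetching result of task " + event.taskInfo.taskId)
  }
}
```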
* Merge pull request #105 from pwendell/doc-fix | Reynold Xin | 2013-10-23 | 0 | -0/+0

  Fixing broken links in programming guide

  Unfortunately these are broken in 0.8.0.

  * Fixing broken links in programming guide | Patrick Wendell | 2013-10-23 | 1 | -3/+3
* Merge pull request #103 from JoshRosen/unpersist-fix | Reynold Xin | 2013-10-23 | 3 | -0/+34

  Add unpersist() to JavaDoubleRDD and JavaPairRDD.

  This fixes a minor inconsistency where [unpersist() was only available on JavaRDD](https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3CCE8D8748.68C0%25YannLuppo%40livenation.com%3E) and not JavaPairRDD / JavaDoubleRDD. I also added support for the new optional `blocking` argument added in 0.8. Please merge this into branch-0.8, too.

  * Add unpersist() to JavaDoubleRDD and JavaPairRDD. | Josh Rosen | 2013-10-23 | 3 | -0/+34

    Also add support for new optional `blocking` argument.
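
A short sketch of the now-consistent surface; the RDD is assumed to come from a JavaSparkContext elsewhere, and only the unpersist() call is the point:

```scala
import org.apache.spark.api.java.JavaPairRDD

// With this change, unpersist() is available on JavaPairRDD and JavaDoubleRDD too,
// including the optional `blocking` flag introduced in 0.8.
def releaseCache(pairs: JavaPairRDD[String, Int]): Unit = {
  pairs.cache()
  // ... run some jobs against the cached data ...
  pairs.unpersist(false) // blocking = false: return without waiting for block removal
}
```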
* Fix Maven build to use MQTT repository | Matei Zaharia | 2013-10-23 | 2 | -3/+14
* Merge pull request #64 from prabeesh/master | Matei Zaharia | 2013-10-23 | 5 | -1/+241

  MQTT Adapter for Spark Streaming

  MQTT is a machine-to-machine (M2M) / Internet of Things connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it at http://mqtt.org/

  Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators to mobile phones, embedded systems on vehicles, or laptops and full-scale computers. The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions.

  This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code footprint and network bandwidth are constraints. MQTT is one of the widely used protocols for the Internet of Things, and it is getting much attention as anything and everything gets connected to the internet and produces data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015. Plugins/support for MQTT are available in popular message queues like RabbitMQ, ActiveMQ, etc.

  Support for MQTT in Spark will help people with Internet of Things (IoT) projects use Spark Streaming for their real-time data processing needs (from sensors and other embedded devices, etc.).
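
A hedged sketch of how such an adapter is typically used from Spark Streaming. The helper mqttTextStream below is only a placeholder for whatever entry point the adapter actually exposes (the PR's MQTTWordCount example shows the real call), and the broker URL and topic are made up:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object MQTTWordCountSketch {
  // Placeholder for the real adapter: one String per received MQTT message payload.
  // It falls back to a socket source here only so that the sketch is self-contained.
  def mqttTextStream(ssc: StreamingContext, brokerUrl: String, topic: String) =
    ssc.socketTextStream("localhost", 9999)

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext("local[2]", "MQTTWordCount", Seconds(2))
    val lines = mqttTextStream(ssc, "tcp://localhost:1883", "sensor/readings")
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()
    ssc.start()
  }
}
```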
  * Update MQTTWordCount.scala | Prabeesh K | 2013-10-22 | 1 | -6/+1
  * Update MQTTWordCount.scala | Prabeesh K | 2013-10-22 | 1 | -3/+4
  * Update MQTTWordCount.scala | Prabeesh K | 2013-10-18 | 1 | -15/+14
  * Update MQTTInputDStream.scala | Prabeesh K | 2013-10-18 | 1 | -4/+11
  * modify code, use Spark Logging Class | prabeesh | 2013-10-17 | 1 | -35/+26
  * remove unused dependency | prabeesh | 2013-10-17 | 1 | -5/+0
  * remove unused dependency | prabeesh | 2013-10-17 | 1 | -2/+0
  * add maven dependencies for mqtt | prabeesh | 2013-10-16 | 1 | -0/+5
  * add maven dependencies for mqtt | prabeesh | 2013-10-16 | 1 | -0/+5
  * added mqtt adapter wordcount example | prabeesh | 2013-10-16 | 1 | -0/+112
  * added mqtt adapter library dependencies | prabeesh | 2013-10-16 | 1 | -1/+7
  * added mqtt adapter | prabeesh | 2013-10-16 | 1 | -0/+15
  * mqttinputdstream for mqttstreaming adapter | prabeesh | 2013-10-16 | 1 | -0/+111
* Merge pull request #97 from ewencp/pyspark-system-properties | Matei Zaharia | 2013-10-22 | 2 | -12/+49

  Add classmethod to SparkContext to set system properties.

  Add a new classmethod to SparkContext to set system properties, as is possible in Scala/Java. Unlike the Java/Scala implementations, there's no access to System until the JVM bridge is created. Since SparkContext handles that, move the initialization of the JVM connection to a separate classmethod that can safely be called repeatedly as long as the same instance (or no instance) is provided.

  * Add notes to python documentation about using SparkContext.setSystemProperty. | Ewen Cheslack-Postava | 2013-10-22 | 1 | -0/+11
  * Pass self to SparkContext._ensure_initialized. | Ewen Cheslack-Postava | 2013-10-22 | 1 | -1/+10

    The constructor for SparkContext should pass in self so that we track the current context and produce errors if another one is created. Add a doctest to make sure creating multiple contexts triggers the exception.

  * Add classmethod to SparkContext to set system properties. | Ewen Cheslack-Postava | 2013-10-22 | 1 | -12/+29

    Add a new classmethod to SparkContext to set system properties, as is possible in Scala/Java. Unlike the Java/Scala implementations, there's no access to System until the JVM bridge is created. Since SparkContext handles that, move the initialization of the JVM connection to a separate classmethod that can safely be called repeatedly as long as the same instance (or no instance) is provided.
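
For reference, this is the Scala/Java pattern the new PySpark classmethod mirrors (property name and value are only illustrative):

```scala
import org.apache.spark.SparkContext

// Configuration read from system properties is picked up when the
// SparkContext is constructed, so the properties must be set first.
System.setProperty("spark.executor.memory", "2g") // illustrative property and value
val sc = new SparkContext("local[2]", "system-properties-example")
```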
* Merge pull request #100 from JoshRosen/spark-902 | Reynold Xin | 2013-10-22 | 8 | -32/+4

  Remove redundant Java Function call() definitions

  This should fix [SPARK-902](https://spark-project.atlassian.net/browse/SPARK-902), an issue where some Java API Function classes could cause AbstractMethodErrors when user code is compiled using the Eclipse compiler. Thanks to @MartinWeindel for diagnosing this problem. (This PR subsumes #30).

  * Remove redundant Java Function call() definitions | Josh Rosen | 2013-10-22 | 8 | -32/+4

    This should fix SPARK-902, an issue where some Java API Function classes could cause AbstractMethodErrors when user code is compiled using the Eclipse compiler. Thanks to @MartinWeindel for diagnosing this problem. (This PR subsumes / closes #30)

* Merge pull request #99 from pwendell/master | Patrick Wendell | 2013-10-22 | 1 | -3/+5

  Use correct formatting for comments in StoragePerfTester

  * Formatting cleanup | Patrick Wendell | 2013-10-22 | 1 | -3/+5
* Merge pull request #90 from pwendell/master | Patrick Wendell | 2013-10-22 | 12 | -117/+165

  SPARK-940: Do not directly pass Stage objects to SparkListener.

  This patch updates the SparkListener interface to pass StageInfo objects rather than Spark Stage objects directly. The reason for this patch is explained in detail in SPARK-940.

  * Minor clean-up in review | Patrick Wendell | 2013-10-22 | 3 | -8/+3
  * Response to code review and adding some more tests | Patrick Wendell | 2013-10-22 | 10 | -74/+107
  * Fix for Spark-870. | Patrick Wendell | 2013-10-22 | 7 | -13/+17

    This patch fixes a bug where the Spark UI didn't display the correct number of total tasks if the number of tasks in a Stage doesn't equal the number of RDD partitions. It also cleans up the listener API a bit by embedding this information in the StageInfo class rather than passing it separately.

  * SPARK-940: Do not directly pass Stage objects to SparkListener. | Patrick Wendell | 2013-10-22 | 9 | -59/+75
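
A sketch of what a listener sees after this change: a StageInfo summary rather than the scheduler-internal Stage. The callback and field names below follow later Spark releases and are assumptions about this patch's exact API:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Assumed names; the event and field names in this patch may differ slightly.
class StageSummaryLogger extends SparkListener {
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    // StageInfo carries display-level facts such as the stage's total task count
    // (the Spark-870 fix above), without exposing the Stage object itself.
    println("Stage " + info.stageId + " (" + info.name + ") ran " + info.numTasks + " tasks")
  }
}
```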
* Merge pull request #98 from aarondav/docs | Matei Zaharia | 2013-10-22 | 1 | -3/+3

  Docs: Fix links to RDD API documentation

  * Docs: Fix links to RDD API documentation | Aaron Davidson | 2013-10-22 | 1 | -3/+3
* Merge pull request #82 from JoshRosen/map-output-tracker-refactoring | Matei Zaharia | 2013-10-22 | 5 | -98/+109

  Split MapOutputTracker into Master/Worker classes

  Previously, MapOutputTracker contained fields and methods that were only applicable to the master or worker instances. This commit introduces a MasterMapOutputTracker class to prevent the master-specific methods from being accessed on workers. I also renamed a few methods and made others protected/private.

  * Unwrap a long line that actually fits. | Josh Rosen | 2013-10-20 | 1 | -2/+1
  * Fix test failures in local mode due to updateEpoch | Josh Rosen | 2013-10-20 | 1 | -4/+1
  * Split MapOutputTracker into Master/Worker classes. | Josh Rosen | 2013-10-19 | 5 | -98/+113

    Previously, MapOutputTracker contained fields and methods that were only applicable to the master or worker instances. This commit introduces a MasterMapOutputTracker class to prevent the master-specific methods from being accessed on workers. I also renamed a few methods and made others protected/private.
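
A heavily simplified structural sketch of the split described above; the class and member names are illustrative, not the actual MapOutputTracker API:

```scala
import scala.collection.mutable

// Workers only need to query map output locations; registration is master-only.
class WorkerSideTracker {
  protected val mapOutputs = new mutable.HashMap[Int, Seq[String]]

  // Safe to call on any node: where do the map outputs for a shuffle live?
  def getMapOutputLocations(shuffleId: Int): Seq[String] =
    mapOutputs.getOrElse(shuffleId, Seq.empty)
}

class MasterSideTracker extends WorkerSideTracker {
  // Master-only: record the location of a completed map task's output.
  def registerMapOutput(shuffleId: Int, location: String): Unit = {
    mapOutputs(shuffleId) = getMapOutputLocations(shuffleId) :+ location
  }
}
```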
* Merge pull request #92 from tgravescs/sparkYarnFixClasspath | Matei Zaharia | 2013-10-21 | 2 | -10/+29

  Fix the Worker to use CoarseGrainedExecutorBackend and modify the classpath to be explicit about inclusion of spark.jar and app.jar.

  Be explicit so that if there are any conflicts in packaging between spark.jar and app.jar we don't get random results due to the classpath having /*, which can include things in a different order.