Commit message | Author | Date | Files | Lines (-/+)
* Some clean-up of tests | Patrick Wendell | 2013-10-24 | 3 | -7/+10
* Removing Java for now | Patrick Wendell | 2013-10-24 | 1 | -7/+0
* Adding tests | Patrick Wendell | 2013-10-24 | 3 | -18/+88
* Always use a shuffle | Patrick Wendell | 2013-10-24 | 1 | -15/+7
* Add a `repartition` operator. | Patrick Wendell | 2013-10-24 | 5 | -0/+72

  This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful:

  1. If a user wants to increase the number of partitions in an RDD. This is more common now with streaming, e.g. a user is ingesting data on one node but wants to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing.
  2. If a user has input data where the number of partitions is not known, e.g. sc.textFile("some file").coalesce(50)... This is both semantically vague (am I growing or shrinking this RDD?) and may not work correctly if the base RDD has fewer than 50 partitions.

  The new operator forces a shuffle every time, so it will always produce exactly the requested number of partitions. It also throws an exception rather than silently not working if a bad input is passed. I am currently adding streaming tests (this requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet, but feedback is welcome.
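
As a rough usage sketch of the semantics described above (hedged; the master string is made up, while the path and the partition count of 50 come from the commit message):

```scala
import org.apache.spark.SparkContext

// Contrast the old, confusing spelling with the new operator.
val sc = new SparkContext("local[4]", "repartition-example")
val lines = sc.textFile("some file")                     // partition count not known up front

val viaCoalesce    = lines.coalesce(50, shuffle = true)  // must remember to force the shuffle
val viaRepartition = lines.repartition(50)               // always shuffles, always exactly 50 partitions
```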
* Merge pull request #93 from kayousterhout/ui_new_state | Matei Zaharia | 2013-10-23 | 10 | -8/+130

  Show "GETTING_RESULTS" state in UI.

  This commit adds a set of calls using the SparkListener interface that indicate when a task is remotely fetching results, so that we can display this (potentially time-consuming) phase of execution to users through the UI.

  * Clear akka frame size property in tests | Kay Ousterhout | 2013-10-23 | 1 | -2/+6
  * Fixed broken tests | Kay Ousterhout | 2013-10-23 | 1 | -4/+3
  * Merge remote-tracking branch 'upstream/master' into ui_new_state | Kay Ousterhout | 2013-10-23 | 70 | -810/+1426

    Conflicts:
      core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala

  * Shorten GETTING_RESULT to GET_RESULT | Kay Ousterhout | 2013-10-22 | 1 | -1/+1
  * Show "GETTING_RESULTS" state in UI. | Kay Ousterhout | 2013-10-21 | 10 | -5/+124

    This commit adds a set of calls using the SparkListener interface that indicate when a task is remotely fetching results, so that we can display this (potentially time-consuming) phase of execution to users through the UI.
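
A minimal sketch of how an application-side listener could observe this phase. The event and field names below are assumptions based on the description above and on later Spark releases, not necessarily the exact API introduced by this pull request:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskGettingResult}

// Assumed callback and event names; the ones added by this PR may differ.
class ResultFetchLogger extends SparkListener {
  override def onTaskGettingResult(event: SparkListenerTaskGettingResult): Unit = {
    // Fired when the driver starts remotely fetching a task's result, i.e. the
    // phase the UI now reports as GETTING_RESULT.
    println("Fetching result of task " + event.taskInfo.taskId)
  }
}
```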
* Merge pull request #105 from pwendell/doc-fix | Reynold Xin | 2013-10-23 | 0 | -0/+0

  Fixing broken links in programming guide

  Unfortunately these are broken in 0.8.0.

  * Fixing broken links in programming guide | Patrick Wendell | 2013-10-23 | 1 | -3/+3
* Merge pull request #103 from JoshRosen/unpersist-fix | Reynold Xin | 2013-10-23 | 3 | -0/+34

  Add unpersist() to JavaDoubleRDD and JavaPairRDD.

  This fixes a minor inconsistency where [unpersist() was only available on JavaRDD](https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3CCE8D8748.68C0%25YannLuppo%40livenation.com%3E) and not JavaPairRDD / JavaDoubleRDD. I also added support for the new optional `blocking` argument added in 0.8. Please merge this into branch-0.8, too.

  * Add unpersist() to JavaDoubleRDD and JavaPairRDD. | Josh Rosen | 2013-10-23 | 3 | -0/+34

    Also add support for new optional `blocking` argument.
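
A short sketch of the now-consistent surface; the RDD is assumed to come from a JavaSparkContext elsewhere, and only the unpersist() call is the point:

```scala
import org.apache.spark.api.java.JavaPairRDD

// With this change, unpersist() is available on JavaPairRDD and JavaDoubleRDD too,
// including the optional `blocking` flag introduced in 0.8.
def releaseCache(pairs: JavaPairRDD[String, Int]): Unit = {
  pairs.cache()
  // ... run some jobs against the cached data ...
  pairs.unpersist(false) // blocking = false: return without waiting for block removal
}
```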
* Fix Maven build to use MQTT repository | Matei Zaharia | 2013-10-23 | 2 | -3/+14
* Merge pull request #64 from prabeesh/master | Matei Zaharia | 2013-10-23 | 5 | -1/+241

  MQTT Adapter for Spark Streaming

  MQTT is a machine-to-machine (M2M) / Internet of Things connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it at http://mqtt.org/

  Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators to mobile phones, embedded systems on vehicles, or laptops and full-scale computers. The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions.

  This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code footprint and network bandwidth are constraints. MQTT is one of the widely used protocols for the Internet of Things, and it is getting much attention as anything and everything gets connected to the internet and produces data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015. Plugins/support for MQTT are available in popular message queues like RabbitMQ, ActiveMQ, etc.

  Support for MQTT in Spark will help people with Internet of Things (IoT) projects use Spark Streaming for their real-time data processing needs (from sensors and other embedded devices, etc.).
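
A hedged sketch of how such an adapter is typically used from Spark Streaming. The helper mqttTextStream below is only a placeholder for whatever entry point the adapter actually exposes (the PR's MQTTWordCount example shows the real call), and the broker URL and topic are made up:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object MQTTWordCountSketch {
  // Placeholder for the real adapter: one String per received MQTT message payload.
  // It falls back to a socket source here only so that the sketch is self-contained.
  def mqttTextStream(ssc: StreamingContext, brokerUrl: String, topic: String) =
    ssc.socketTextStream("localhost", 9999)

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext("local[2]", "MQTTWordCount", Seconds(2))
    val lines = mqttTextStream(ssc, "tcp://localhost:1883", "sensor/readings")
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()
    ssc.start()
  }
}
```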
  * Update MQTTWordCount.scala | Prabeesh K | 2013-10-22 | 1 | -6/+1
  * Update MQTTWordCount.scala | Prabeesh K | 2013-10-22 | 1 | -3/+4
  * Update MQTTWordCount.scala | Prabeesh K | 2013-10-18 | 1 | -15/+14
  * Update MQTTInputDStream.scala | Prabeesh K | 2013-10-18 | 1 | -4/+11
  * modify code, use Spark Logging Class | prabeesh | 2013-10-17 | 1 | -35/+26
  * remove unused dependency | prabeesh | 2013-10-17 | 1 | -5/+0
  * remove unused dependency | prabeesh | 2013-10-17 | 1 | -2/+0
  * add maven dependencies for mqtt | prabeesh | 2013-10-16 | 1 | -0/+5
  * add maven dependencies for mqtt | prabeesh | 2013-10-16 | 1 | -0/+5
  * added mqtt adapter wordcount example | prabeesh | 2013-10-16 | 1 | -0/+112
  * added mqtt adapter library dependencies | prabeesh | 2013-10-16 | 1 | -1/+7
  * added mqtt adapter | prabeesh | 2013-10-16 | 1 | -0/+15
  * mqttinputdstream for mqttstreaming adapter | prabeesh | 2013-10-16 | 1 | -0/+111
* Merge pull request #97 from ewencp/pyspark-system-properties | Matei Zaharia | 2013-10-22 | 2 | -12/+49

  Add classmethod to SparkContext to set system properties.

  Add a new classmethod to SparkContext to set system properties, as is possible in Scala/Java. Unlike the Java/Scala implementations, there's no access to System until the JVM bridge is created. Since SparkContext handles that, move the initialization of the JVM connection to a separate classmethod that can safely be called repeatedly as long as the same instance (or no instance) is provided.

  * Add notes to python documentation about using SparkContext.setSystemProperty. | Ewen Cheslack-Postava | 2013-10-22 | 1 | -0/+11
  * Pass self to SparkContext._ensure_initialized. | Ewen Cheslack-Postava | 2013-10-22 | 1 | -1/+10

    The constructor for SparkContext should pass in self so that we track the current context and produce errors if another one is created. Add a doctest to make sure creating multiple contexts triggers the exception.

  * Add classmethod to SparkContext to set system properties. | Ewen Cheslack-Postava | 2013-10-22 | 1 | -12/+29

    Add a new classmethod to SparkContext to set system properties, as is possible in Scala/Java. Unlike the Java/Scala implementations, there's no access to System until the JVM bridge is created. Since SparkContext handles that, move the initialization of the JVM connection to a separate classmethod that can safely be called repeatedly as long as the same instance (or no instance) is provided.
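
For reference, this is the Scala/Java pattern the new PySpark classmethod mirrors (property name and value are only illustrative):

```scala
import org.apache.spark.SparkContext

// Configuration read from system properties is picked up when the
// SparkContext is constructed, so the properties must be set first.
System.setProperty("spark.executor.memory", "2g") // illustrative property and value
val sc = new SparkContext("local[2]", "system-properties-example")
```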
* Merge pull request #100 from JoshRosen/spark-902 | Reynold Xin | 2013-10-22 | 8 | -32/+4

  Remove redundant Java Function call() definitions

  This should fix [SPARK-902](https://spark-project.atlassian.net/browse/SPARK-902), an issue where some Java API Function classes could cause AbstractMethodErrors when user code is compiled using the Eclipse compiler. Thanks to @MartinWeindel for diagnosing this problem. (This PR subsumes #30).

  * Remove redundant Java Function call() definitions | Josh Rosen | 2013-10-22 | 8 | -32/+4

    This should fix SPARK-902, an issue where some Java API Function classes could cause AbstractMethodErrors when user code is compiled using the Eclipse compiler. Thanks to @MartinWeindel for diagnosing this problem. (This PR subsumes / closes #30)

* Merge pull request #99 from pwendell/master | Patrick Wendell | 2013-10-22 | 1 | -3/+5

  Use correct formatting for comments in StoragePerfTester

  * Formatting cleanup | Patrick Wendell | 2013-10-22 | 1 | -3/+5
* Merge pull request #90 from pwendell/master | Patrick Wendell | 2013-10-22 | 12 | -117/+165

  SPARK-940: Do not directly pass Stage objects to SparkListener.

  This patch updates the SparkListener interface to pass StageInfo objects rather than Spark Stage objects directly. The reason for this patch is explained in detail in SPARK-940.

  * Minor clean-up in review | Patrick Wendell | 2013-10-22 | 3 | -8/+3
  * Response to code review and adding some more tests | Patrick Wendell | 2013-10-22 | 10 | -74/+107
  * Fix for Spark-870. | Patrick Wendell | 2013-10-22 | 7 | -13/+17

    This patch fixes a bug where the Spark UI didn't display the correct number of total tasks if the number of tasks in a Stage doesn't equal the number of RDD partitions. It also cleans up the listener API a bit by embedding this information in the StageInfo class rather than passing it separately.

  * SPARK-940: Do not directly pass Stage objects to SparkListener. | Patrick Wendell | 2013-10-22 | 9 | -59/+75
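
A sketch of what a listener sees after this change: a StageInfo summary rather than the scheduler-internal Stage. The callback and field names below follow later Spark releases and are assumptions about this patch's exact API:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Assumed names; the event and field names in this patch may differ slightly.
class StageSummaryLogger extends SparkListener {
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    // StageInfo carries display-level facts such as the stage's total task count
    // (the Spark-870 fix above), without exposing the Stage object itself.
    println("Stage " + info.stageId + " (" + info.name + ") ran " + info.numTasks + " tasks")
  }
}
```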
* Merge pull request #98 from aarondav/docs | Matei Zaharia | 2013-10-22 | 1 | -3/+3

  Docs: Fix links to RDD API documentation

  * Docs: Fix links to RDD API documentation | Aaron Davidson | 2013-10-22 | 1 | -3/+3
* Merge pull request #82 from JoshRosen/map-output-tracker-refactoring | Matei Zaharia | 2013-10-22 | 5 | -98/+109

  Split MapOutputTracker into Master/Worker classes

  Previously, MapOutputTracker contained fields and methods that were only applicable to the master or worker instances. This commit introduces a MasterMapOutputTracker class to prevent the master-specific methods from being accessed on workers. I also renamed a few methods and made others protected/private.

  * Unwrap a long line that actually fits. | Josh Rosen | 2013-10-20 | 1 | -2/+1
  * Fix test failures in local mode due to updateEpoch | Josh Rosen | 2013-10-20 | 1 | -4/+1
  * Split MapOutputTracker into Master/Worker classes. | Josh Rosen | 2013-10-19 | 5 | -98/+113

    Previously, MapOutputTracker contained fields and methods that were only applicable to the master or worker instances. This commit introduces a MasterMapOutputTracker class to prevent the master-specific methods from being accessed on workers. I also renamed a few methods and made others protected/private.
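
A heavily simplified structural sketch of the split described above; the class and member names are illustrative, not the actual MapOutputTracker API:

```scala
import scala.collection.mutable

// Workers only need to query map output locations; registration is master-only.
class WorkerSideTracker {
  protected val mapOutputs = new mutable.HashMap[Int, Seq[String]]

  // Safe to call on any node: where do the map outputs for a shuffle live?
  def getMapOutputLocations(shuffleId: Int): Seq[String] =
    mapOutputs.getOrElse(shuffleId, Seq.empty)
}

class MasterSideTracker extends WorkerSideTracker {
  // Master-only: record the location of a completed map task's output.
  def registerMapOutput(shuffleId: Int, location: String): Unit = {
    mapOutputs(shuffleId) = getMapOutputLocations(shuffleId) :+ location
  }
}
```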
* Merge pull request #92 from tgravescs/sparkYarnFixClasspath | Matei Zaharia | 2013-10-21 | 2 | -10/+29

  Fix the Worker to use CoarseGrainedExecutorBackend and modify the classpath to be explicit about inclusion of spark.jar and app.jar.

  Be explicit so that if there are any conflicts in packaging between spark.jar and app.jar we don't get random results due to the classpath having /*, which can include things in a different order.