* A little revise for the document  [soulmachine, 2013-10-26; 1 file, -3/+4]
|
* Merge pull request #108 from alig/master  [Matei Zaharia, 2013-10-25; 7 files, -8/+123]
|\
| |   Changes to enable executing by using HDFS as a synchronization point
| |   between driver and executors, as well as ensuring executors exit properly.
| * fixing comments on PR  [Ali Ghodsi, 2013-10-25; 3 files, -29/+18]
| |
| * Makes Spark SIMR ready.  [Ali Ghodsi, 2013-10-24; 7 files, -5/+131]
| |
* | Merge pull request #102 from tdas/transform  [Matei Zaharia, 2013-10-25; 13 files, -162/+1037]
|\ \
| | |   Added new Spark Streaming operations:
| | |   - transformWith, which allows an arbitrary 2-to-1 DStream transform; added to the Scala and Java APIs
| | |   - StreamingContext.transform, to allow an arbitrary n-to-1 DStream transform
| | |   - leftOuterJoin and rightOuterJoin between 2 DStreams; added to the Scala and Java APIs
| | |   - missing variations of join and cogroup, added to the Scala and Java APIs
| | |   - the missing JavaStreamingContext.union
| | |   Also updated a number of Java and Scala API docs.
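The leftOuterJoin listed above keeps every key of the left stream and attaches the right stream's value for that key when one exists. A rough sketch of those per-key semantics in plain Java (no Spark dependency; the class and method names here are illustrative, not Spark's API):

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LeftOuterJoin {
    // For every key on the left side, emit (leftValue, Optional(rightValue));
    // keys that appear only on the right side are dropped.
    public static <K, V, W> Map<K, Map.Entry<V, Optional<W>>> leftOuterJoin(
            Map<K, V> left, Map<K, W> right) {
        Map<K, Map.Entry<V, Optional<W>>> joined = new HashMap<>();
        for (Map.Entry<K, V> entry : left.entrySet()) {
            Optional<W> match = Optional.ofNullable(right.get(entry.getKey()));
            joined.put(entry.getKey(),
                    new AbstractMap.SimpleEntry<>(entry.getValue(), match));
        }
        return joined;
    }
}
```

A rightOuterJoin is the mirror image: every right-side key survives and the left value becomes optional.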
| * \   Merge branch 'apache-master' into transform  [Tathagata Das, 2013-10-25; 17 files, -15/+272]
| |\ \
| * | | Fixed accidental bug.  [Tathagata Das, 2013-10-24; 1 file, -1/+1]
| | | |
| * | | Merge branch 'apache-master' into transform  [Tathagata Das, 2013-10-24; 19 files, -10/+417]
| |\ \ \
| * | | | Added JavaStreamingContext.transform  [Tathagata Das, 2013-10-24; 5 files, -33/+169]
| | | | |
| * | | | Removed Function3.call() based on Josh's comment.  [Tathagata Das, 2013-10-23; 1 file, -2/+0]
| | | | |
| * | | | Merge branch 'apache-master' into transform  [Tathagata Das, 2013-10-22; 90 files, -3300/+2058]
| |\ \ \ \
| * | | | | Fixed bug in Java transformWith, added more Java testcases for transform and transformWith, added missing variations of Java join and cogroup, updated various Scala and Java API docs.  [Tathagata Das, 2013-10-22; 8 files, -179/+424]
| | | | | |
| * | | | | Updated TransformDStream to allow n-ary DStream transform. Added transformWith, leftOuterJoin and rightOuterJoin operations to DStream for the Scala and Java APIs. Also added n-ary union and n-ary transform operations to StreamingContext for the Scala and Java APIs.  [Tathagata Das, 2013-10-21; 11 files, -33/+529]
| | | | | |
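The n-ary transform added here boils down to handing the current batch of each input stream to one user-supplied function that produces a single output batch. A minimal plain-Java sketch of that n-to-1 shape (illustrative names, not Spark's API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class NaryTransform {
    // The n-to-1 shape: one user function consumes the current batch of each
    // of the n inputs and produces a single output batch.
    public static <T, R> List<R> transform(List<List<T>> batches,
                                           Function<List<List<T>>, List<R>> f) {
        return f.apply(batches);
    }
}
```

The 2-to-1 transformWith and the n-ary union are special cases of this shape with a fixed user function.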
* | | | | | Merge pull request #111 from kayousterhout/ui_name  [Matei Zaharia, 2013-10-25; 2 files, -3/+1]
|\ \ \ \ \ \
| | | | | |   Properly display the name of a stage in the UI.
| | | | | |
| | | | | |   This fixes a bug introduced by the fix for SPARK-940, which changed the UI to
| | | | | |   display the RDD name rather than the stage name. As a result, no name was
| | | | | |   shown for the stage when using the Spark shell, which meant that there was no
| | | | | |   way to click on the stage to see more details (e.g., the running tasks). This
| | | | | |   commit changes the UI back to using the stage name.
| | | | | |
| | | | | |   @pwendell -- let me know if this change was intentional
| * | | | | Properly display the name of a stage in the UI.  [Kay Ousterhout, 2013-10-25; 2 files, -3/+1]
| | | | | |
* | | | | Merge pull request #110 from pwendell/master  [Reynold Xin, 2013-10-25; 2 files, -0/+5]
|\ \ \ \ \
| | | | |   Exclude jopt from kafka dependency.
| | | | |
| | | | |   Kafka uses an older version of jopt that causes bad conflicts with the version
| | | | |   used by spark-perf. It's not easy to remove this downstream because of the way
| | | | |   that spark-perf uses Spark (by including a spark assembly as an unmanaged
| | | | |   jar). This fixes the problem at its source by just never including it.
| * | | | | Exclude jopt from kafka dependency.  [Patrick Wendell, 2013-10-25; 2 files, -0/+5]
| | | | | |
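Excluding a conflicting transitive dependency at its source, as this patch does for jopt, is done with an exclusion on the kafka dependency itself. In a Maven POM the shape is roughly as follows (the group and artifact ids shown are assumptions for illustration, not copied from the patch):

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka</artifactId>
  <version>0.8.0</version>
  <exclusions>
    <!-- Keep the older jopt pulled in by Kafka off the classpath entirely -->
    <exclusion>
      <groupId>net.sf.jopt-simple</groupId>
      <artifactId>jopt-simple</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With the exclusion in place, downstream consumers of the spark assembly never see the conflicting jar at all.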
* | | | | | Merge pull request #109 from pwendell/master  [Reynold Xin, 2013-10-24; 9 files, -6/+128]
|\| | | | |
| | | | | |   Adding Java/Java Streaming versions of `repartition` with associated tests
| * | | | | Style fixes  [Patrick Wendell, 2013-10-24; 1 file, -9/+9]
| | | | | |
| * | | | | Spacing fix  [Patrick Wendell, 2013-10-24; 2 files, -6/+6]
| | | | | |
| * | | | | Small spacing fix  [Patrick Wendell, 2013-10-24; 1 file, -2/+2]
| | | | | |
| * | | | | Adding Java versions and associated tests  [Patrick Wendell, 2013-10-24; 9 files, -1/+123]
| | | | | |
* | | | | | Merge pull request #106 from pwendell/master  [Reynold Xin, 2013-10-24; 8 files, -10/+140]
|\| | | | |
| | | | |   Add a `repartition` operator.
| | | | |
| | | | |   This patch adds an operator called repartition with more straightforward
| | | | |   semantics than the current `coalesce` operator. There are a few use cases
| | | | |   where this operator is useful:
| | | | |
| | | | |   1. If a user wants to increase the number of partitions in the RDD. This is
| | | | |      more common now with streaming. E.g. a user is ingesting data on one node
| | | | |      but wants to add more partitions to ensure parallelism of subsequent
| | | | |      operations across threads or the cluster. Right now they have to call
| | | | |      rdd.coalesce(numSplits, shuffle=true), which is super confusing.
| | | | |
| | | | |   2. If a user has input data where the number of partitions is not known.
| | | | |      E.g. sc.textFile("some file").coalesce(50)... This is both semantically
| | | | |      vague (am I growing or shrinking this RDD?) and may not work correctly
| | | | |      if the base RDD has fewer than 50 partitions.
| | | | |
| | | | |   The new operator forces a shuffle every time, so it will always produce
| | | | |   exactly the requested number of partitions. It also throws an exception
| | | | |   rather than silently not working if a bad input is passed.
| | | | |
| | | | |   I am currently adding streaming tests (this requires refactoring some of
| | | | |   the test suite to allow testing at partition granularity), so this is not
| | | | |   ready for merge yet. But feedback is welcome.
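The contract described above, that repartition(n) always shuffles and therefore always yields exactly n partitions while rejecting bad input loudly, can be sketched without Spark as a simple round-robin redistribution (plain Java, illustrative only):

```java
import java.util.ArrayList;
import java.util.List;

public class Repartition {
    // Redistribute all elements round-robin into exactly numPartitions buckets.
    // Unlike coalesce, this works whether we grow or shrink the partition count,
    // and it fails loudly on a nonsensical request instead of silently doing
    // the wrong thing.
    public static <T> List<List<T>> repartition(List<T> elements, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException(
                    "Number of partitions must be positive: " + numPartitions);
        }
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        int index = 0;
        for (T element : elements) {
            partitions.get(index % numPartitions).add(element);
            index++;
        }
        return partitions;
    }
}
```

The real operator redistributes data across the cluster via a shuffle, but the invariant is the same: the output partition count is exactly what the caller asked for.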
| * | | | Some clean-up of tests  [Patrick Wendell, 2013-10-24; 3 files, -7/+10]
| | | | |
| * | | | Removing Java for now  [Patrick Wendell, 2013-10-24; 1 file, -7/+0]
| | | | |
| * | | | Adding tests  [Patrick Wendell, 2013-10-24; 3 files, -18/+88]
| | | | |
| * | | | Always use a shuffle  [Patrick Wendell, 2013-10-24; 1 file, -15/+7]
| | | | |
| * | | | Add a `repartition` operator.  [Patrick Wendell, 2013-10-24; 5 files, -0/+72]
|/ / / /
* | | | Merge pull request #93 from kayousterhout/ui_new_state  [Matei Zaharia, 2013-10-23; 10 files, -8/+130]
|\ \ \ \
| | | |   Show "GETTING_RESULTS" state in UI.
| | | |
| | | |   This commit adds a set of calls using the SparkListener interface that
| | | |   indicate when a task is remotely fetching results, so that we can display
| | | |   this (potentially time-consuming) phase of execution to users through the UI.
| * | | | Clear akka frame size property in tests  [Kay Ousterhout, 2013-10-23; 1 file, -2/+6]
| | | | |
| * | | | Fixed broken tests  [Kay Ousterhout, 2013-10-23; 1 file, -4/+3]
| | | | |
| * | | | Merge remote-tracking branch 'upstream/master' into ui_new_state  [Kay Ousterhout, 2013-10-23; 70 files, -810/+1426]
| |\ \ \ \
| | | | | |   Conflicts:
| | | | | |       core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala
| * | | | | Shorten GETTING_RESULT to GET_RESULT  [Kay Ousterhout, 2013-10-22; 1 file, -1/+1]
| | | | | |
| * | | | | Show "GETTING_RESULTS" state in UI.  [Kay Ousterhout, 2013-10-21; 10 files, -5/+124]
| | | | | |
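The SparkListener-style calls described above follow the observer pattern: the scheduler posts an event when a task starts fetching its result, and every registered listener (the UI among them) can react. A minimal plain-Java sketch of that pattern (the names are illustrative, not Spark's actual interface):

```java
import java.util.ArrayList;
import java.util.List;

public class TaskEventBus {
    // Callback interface in the spirit of SparkListener (names illustrative).
    public interface Listener {
        void onTaskGettingResult(long taskId);
    }

    private final List<Listener> listeners = new ArrayList<>();

    public void addListener(Listener listener) {
        listeners.add(listener);
    }

    // Called by the scheduler when a task enters the GETTING_RESULTS state,
    // i.e. it has finished computing and is now fetching its (possibly large)
    // result from the remote executor.
    public void postGettingResult(long taskId) {
        for (Listener listener : listeners) {
            listener.onTaskGettingResult(taskId);
        }
    }
}
```

A UI component would register a listener and timestamp the event, which is what lets the result-fetching phase show up as its own state.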
* | | | | | Merge pull request #105 from pwendell/doc-fix  [Reynold Xin, 2013-10-23; 0 files, -0/+0]
|\ \ \ \ \ \
| | | | | |   Fixing broken links in programming guide. Unfortunately these are broken in 0.8.0.
| * | | | | | Fixing broken links in programming guide  [Patrick Wendell, 2013-10-23; 1 file, -3/+3]
| | | | | | |
* | | | | | | Merge pull request #103 from JoshRosen/unpersist-fix  [Reynold Xin, 2013-10-23; 3 files, -0/+34]
|\ \ \ \ \ \ \
| | | | | | |   Add unpersist() to JavaDoubleRDD and JavaPairRDD.
| | | | | | |
| | | | | | |   This fixes a minor inconsistency where unpersist() was only available on
| | | | | | |   JavaRDD (https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3CCE8D8748.68C0%25YannLuppo%40livenation.com%3E)
| | | | | | |   and not on JavaPairRDD / JavaDoubleRDD. I also added support for the new
| | | | | | |   optional `blocking` argument added in 0.8. Please merge this into
| | | | | | |   branch-0.8, too.
| * | | | | | Add unpersist() to JavaDoubleRDD and JavaPairRDD.  [Josh Rosen, 2013-10-23; 3 files, -0/+34]
| | | | | | |
| | | | | | |   Also add support for the new optional `blocking` argument.
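The optional `blocking` argument mentioned above controls whether unpersist() waits for the cached blocks to actually be dropped or merely schedules the removal and returns. A plain-Java sketch of that contract (illustrative, not Spark's implementation):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BlockCache {
    private final Set<String> cached = ConcurrentHashMap.newKeySet();

    public void persist(String blockId) {
        cached.add(blockId);
    }

    public boolean isCached(String blockId) {
        return cached.contains(blockId);
    }

    // blocking = true: wait until the block is actually gone before returning.
    // blocking = false: hand the removal to a background thread and return at once.
    public void unpersist(String blockId, boolean blocking) {
        Thread cleaner = new Thread(() -> cached.remove(blockId));
        cleaner.setDaemon(true);  // cleanup work should not keep the JVM alive
        cleaner.start();
        if (blocking) {
            try {
                cleaner.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Non-blocking unpersist is useful when the caller does not care exactly when memory is reclaimed; blocking unpersist gives a hard guarantee before the next action runs.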
* | | | | | Fix Maven build to use MQTT repository  [Matei Zaharia, 2013-10-23; 2 files, -3/+14]
| | | | | |
* | | | | Merge pull request #64 from prabeesh/master  [Matei Zaharia, 2013-10-23; 5 files, -1/+241]
|\ \ \ \ \
| | | | |   MQTT Adapter for Spark Streaming
| | | | |
| | | | |   MQTT is a machine-to-machine (M2M) / Internet of Things connectivity
| | | | |   protocol. It was designed as an extremely lightweight publish/subscribe
| | | | |   messaging transport. You can read more about it at http://mqtt.org/.
| | | | |
| | | | |   Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M
| | | | |   communication. It enables the transfer of telemetry-style data, in the form
| | | | |   of messages, from devices like sensors and actuators to mobile phones,
| | | | |   embedded systems on vehicles, or laptops and full-scale computers. The
| | | | |   protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of
| | | | |   Cirrus Link Solutions.
| | | | |
| | | | |   The protocol enables a publish/subscribe messaging model in an extremely
| | | | |   lightweight way. It is useful for connections with remote locations where
| | | | |   code footprint and network bandwidth are constraints. MQTT is one of the
| | | | |   most widely used protocols for the Internet of Things, and it is attracting
| | | | |   attention as more and more devices get connected to the internet and produce
| | | | |   data. Researchers and companies predict that some 25 billion devices will be
| | | | |   connected to the internet by 2015. Plugins/support for MQTT are available in
| | | | |   popular message queues like RabbitMQ, ActiveMQ, etc. Support for MQTT in
| | | | |   Spark will help people with Internet of Things (IoT) projects use Spark
| | | | |   Streaming for their real-time data processing needs (from sensors and other
| | | | |   embedded devices, etc.).
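The publish/subscribe model at the heart of MQTT decouples data producers from consumers through named topics. A minimal in-memory sketch of topic-based pub/sub in plain Java (illustrative only; this is not the MQTT wire protocol):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class TopicBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Register a callback that receives every message published on `topic`.
    public void subscribe(String topic, Consumer<String> callback) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(callback);
    }

    // Deliver `message` to all current subscribers of `topic`; the publisher
    // never needs to know who, if anyone, is listening.
    public void publish(String topic, String message) {
        for (Consumer<String> callback :
                subscribers.getOrDefault(topic, Collections.emptyList())) {
            callback.accept(message);
        }
    }
}
```

A streaming receiver like the MQTT adapter plays the subscriber role: it registers on a topic and feeds each delivered message into the stream for processing.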
| * | | | Update MQTTWordCount.scala  [Prabeesh K, 2013-10-22; 1 file, -6/+1]
| | | | |
| * | | | Update MQTTWordCount.scala  [Prabeesh K, 2013-10-22; 1 file, -3/+4]
| | | | |
| * | | | Update MQTTWordCount.scala  [Prabeesh K, 2013-10-18; 1 file, -15/+14]
| | | | |
| * | | | Update MQTTInputDStream.scala  [Prabeesh K, 2013-10-18; 1 file, -4/+11]
| | | | |
| * | | | modify code, use Spark Logging Class  [prabeesh, 2013-10-17; 1 file, -35/+26]
| | | | |
| * | | | remove unused dependency  [prabeesh, 2013-10-17; 1 file, -5/+0]
| | | | |
| * | | | remove unused dependency  [prabeesh, 2013-10-17; 1 file, -2/+0]
| | | | |
| * | | | add maven dependencies for mqtt  [prabeesh, 2013-10-16; 1 file, -0/+5]
| | | | |
| * | | | add maven dependencies for mqtt  [prabeesh, 2013-10-16; 1 file, -0/+5]
| | | | |
| * | | | added mqtt adapter wordcount example  [prabeesh, 2013-10-16; 1 file, -0/+112]
| | | | |