Commit message | Author | Age | Files | Lines
* Merge pull request #394 from tdas/error-handling | Patrick Wendell | 2014-01-12 | 23 | -231/+591
|\ Better error handling in Spark Streaming and more API cleanup Earlier, errors in jobs generated by Spark Streaming (or in the generation of jobs) could not be caught from the main driver thread (i.e. the thread that called StreamingContext.start()), as they would be thrown in different threads. With this change, after `ssc.start()`, one can call `ssc.awaitTermination()`, which will block until the ssc is closed or there is an exception. This makes it easier to debug. This change also adds ssc.stop(<stop-spark-context>), with which you can stop the StreamingContext without stopping the SparkContext. Also fixes the bug that came up with PRs #393 and #381. The MetadataCleaner default value has been changed from 3500 to -1 for a normal SparkContext and to 3600 when creating a StreamingContext. Also, updated StreamingListenerBus with changes similar to SparkListenerBus in #392, and changed a lot of protected[streaming] to private[streaming].
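A minimal sketch of the new lifecycle calls described above; the app name, batch interval, and socket source below are hypothetical:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object AwaitTerminationSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ErrorHandlingSketch") // hypothetical app name
        val ssc = new StreamingContext(conf, Seconds(1))             // hypothetical batch interval
        ssc.socketTextStream("localhost", 9999).print()              // hypothetical source

        ssc.start()
        // Blocks the driver thread until the context stops; exceptions thrown
        // in the job generation/execution threads are rethrown here, so they
        // can finally be caught from the main driver thread.
        ssc.awaitTermination()

        // Alternatively, stop only the StreamingContext and keep the
        // underlying SparkContext alive for reuse:
        // ssc.stop(false)
      }
    }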
| * Merge remote-tracking branch 'apache/master' into error-handling | Tathagata Das | 2014-01-12 | 4 | -6/+18
| |\
| * | Changed StreamingContext.stopForWait to awaitTermination. | Tathagata Das | 2014-01-12 | 4 | -15/+15
| | |
| * | Fixed bugs to ensure better cleanup of JobScheduler, JobGenerator and NetworkInputTracker upon close. | Tathagata Das | 2014-01-12 | 9 | -56/+96
| | |
| * | Fixed bugs. | Tathagata Das | 2014-01-12 | 3 | -3/+3
| | |
| * | Merge remote-tracking branch 'apache/master' into error-handling | Tathagata Das | 2014-01-11 | 30 | -174/+1347
| |\ \
| * | | Added waitForStop and stop to JavaStreamingContext. | Tathagata Das | 2014-01-11 | 2 | -3/+23
| | | |
| * | | Converted JobScheduler to use actors for event handling. Changed protected[streaming] to private[streaming] in StreamingContext and DStream. Added waitForStop to StreamingContext, and StreamingContextSuite. | Tathagata Das | 2014-01-11 | 17 | -185/+485
| | | |
* | | | Merge pull request #398 from pwendell/streaming-api | Patrick Wendell | 2014-01-12 | 10 | -28/+65
|\ \ \ \ | |_|_|/ |/| | | Rename DStream.foreach to DStream.foreachRDD `foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators, which get pushed down to individual records within each RDD.
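A hedged sketch of the renamed operator; `wordCounts` stands in for any DStream in the application:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    def dumpBatchCounts(wordCounts: DStream[(String, Long)]): Unit = {
      // foreachRDD runs once per batch interval and receives the whole RDD
      // for that batch, unlike map/filter, which apply to individual records.
      wordCounts.foreachRDD { (rdd: RDD[(String, Long)]) =>
        println("Records in this batch: " + rdd.count())
      }
    }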
| * | | Adding deprecated versions of old code | Patrick Wendell | 2014-01-12 | 2 | -8/+45
| | | |
| * | | Rename DStream.foreach to DStream.foreachRDD | Patrick Wendell | 2014-01-12 | 10 | -22/+22
| | |/ | |/| `foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with `map`, `filter`, and the other DStream operators, which get pushed down to individual records within each RDD.
* | | Merge pull request #396 from pwendell/executor-env | Patrick Wendell | 2014-01-12 | 1 | -1/+1
|\ \ \ Setting load defaults to true in executor This preserves the behavior in earlier releases. If properties are set for the executors via `spark-env.sh` on the slaves, then they should take precedence over Spark defaults. This is useful if system administrators are setting properties for a standalone cluster, such as shuffle locations. /cc @andrewor14, who initially reported this issue.
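The distinction comes down to the boolean passed to SparkConf's constructor; a small sketch (the property below is just an example of an administrator-set value):

    import org.apache.spark.SparkConf

    // loadDefaults = true: picks up spark.* Java system properties, e.g. ones
    // exported on a worker through spark-env.sh, so administrator settings
    // take precedence over hard-coded defaults.
    val withDefaults = new SparkConf(true)

    // loadDefaults = false: starts empty and sees only explicit settings.
    val explicitOnly = new SparkConf(false).set("spark.local.dir", "/tmp/spark")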
| * | | Setting load defaults to true in executor | Patrick Wendell | 2014-01-12 | 1 | -1/+1
| |/ /
* | | Merge pull request #392 from rxin/listenerbus | Reynold Xin | 2014-01-12 | 3 | -5/+17
|\ \ \ | |/ / |/| | Stop SparkListenerBus daemon thread when DAGScheduler is stopped. Otherwise this leads to hundreds of SparkListenerBus daemon threads in our unit tests (and is also problematic if user applications launch multiple SparkContexts).
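A simplified, hypothetical sketch of the pattern involved (not Spark's actual code): a queue-draining daemon thread that its owner must shut down, or every context leaks one thread:

    import java.util.concurrent.LinkedBlockingQueue

    class SimpleListenerBus {
      // None is the shutdown sentinel; Some(event) is a real event.
      private val queue = new LinkedBlockingQueue[Option[String]]()

      private val thread = new Thread("simple-listener-bus") {
        setDaemon(true)
        override def run(): Unit = {
          var running = true
          while (running) {
            queue.take() match {
              case Some(event) => println("handling " + event)
              case None        => running = false // exit the drain loop
            }
          }
        }
      }
      thread.start()

      def post(event: String): Unit = queue.put(Some(event))

      // Called when the owner (e.g. the scheduler) stops; without this, each
      // unit test or application that creates a context leaks this thread.
      def stop(): Unit = queue.put(None)
    }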
| * | Stop SparkListenerBus daemon thread when DAGScheduler is stopped. | Reynold Xin | 2014-01-11 | 3 | -5/+17
| | |
* | | Merge pull request #389 from rxin/clone-writables | Reynold Xin | 2014-01-11 | 3 | -41/+71
|\ \ \ | | | | | | | | | | | | Minor update for clone writables and more documentation.
| * | | Renamed cloneKeyValues to cloneRecords; updated docs. | Reynold Xin | 2014-01-11 | 3 | -44/+45
| | | |
| * | | Minor update for clone writables and more documentation. | Reynold Xin | 2014-01-11 | 3 | -12/+41
| |/ /
* | | Merge pull request #388 from pwendell/master | Reynold Xin | 2014-01-11 | 1 | -1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Fix UI bug introduced in #244. The 'duration' field was incorrectly renamed to 'task time' in the table that lists stages.
| * | | Fix UI bug introduced in #244. | Patrick Wendell | 2014-01-11 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | The 'duration' field was incorrectly renamed to 'task time' in the table that lists stages.
* | | | Merge pull request #393 from pwendell/revert-381 | Patrick Wendell | 2014-01-11 | 2 | -2/+2
|\ \ \ \ Revert PR 381 This PR missed a bunch of test cases that require "spark.cleaner.ttl". I think it is what is causing test failures on Jenkins right now (though it's a bit hard to tell because the DNS for cs.berkeley.edu is down). I'm submitting this to see if it fixes Jenkins. I did try just patching various tests, but it was taking a really long time because there are a bunch of them, so for now I'm just seeing if a revert works.
| * | | | Revert "Fix default TTL for metadata cleaner"Patrick Wendell2014-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | This reverts commit 669ba4caa95014f4511f842206c3e506f1a41a7a.
| * | | | Revert "Fix one unit test that was not setting spark.cleaner.ttl"Patrick Wendell2014-01-111-1/+1
|/ / / / | | | | | | | | | | | | This reverts commit 942c80b34c4642de3b0237761bc1aaeb8cbdd546.
* | | | Merge pull request #387 from jerryshao/conf-fix | Reynold Xin | 2014-01-11 | 1 | -7/+8
|\ \ \ \ | |_|/ / |/| | | Fix a small problem where configuration didn't work in ALS
| * | | Fix a small problem where configuration didn't work in ALS | jerryshao | 2014-01-11 | 1 | -7/+8
| | | |
* | | | Merge pull request #359 from ScrapCodes/clone-writables | Reynold Xin | 2014-01-11 | 4 | -49/+106
|\ \ \ \ We clone Hadoop keys and values by default and reuse objects if asked to. We try to clone the most common types of Writables, and we call WritableUtils.clone otherwise. The intention is to optimize: for example, for NullWritable there is no need to clone, and for Long, Int, and String, creating a new object with the value set would hopefully be faster than copying the object. There is another way to do this PR where we ask, for both keys and values, whether to clone them or not, but I could not think of a use case for it except when either of them is actually a NullWritable, which I have already worked around. So I thought that would be unnecessary.
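A small sketch of why the cloning matters and of the generic fallback named above; Text serves as the example Writable here:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{Text, WritableUtils}

    // Hadoop RecordReaders reuse one key object and one value object across
    // records, so a reference held past the next record silently changes.
    val conf = new Configuration()
    val reused = new Text("first record")

    // Generic fallback: a serialize/deserialize copy. For NullWritable no copy
    // is needed at all, and for simple types constructing a fresh object with
    // the value set is hopefully cheaper than this round trip.
    val safeCopy: Text = WritableUtils.clone(reused, conf)

    reused.set("second record") // mutating the reused object...
    println(safeCopy)           // ...leaves the clone at "first record"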
| * | | | Fixes corresponding to Reynold's feedback comments | Prashant Sharma | 2014-01-09 | 4 | -32/+43
| | | | |
| * | | | We clone Hadoop keys and values by default and reuse them if specified. | Prashant Sharma | 2014-01-08 | 4 | -41/+87
| | | | |
* | | | | Merge pull request #373 from jerryshao/kafka-upgrade | Patrick Wendell | 2014-01-11 | 2 | -11/+11
|\ \ \ \ \ Upgrade Kafka dependency to 0.8.0 release version
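In build terms this is just a version bump; a hypothetical sbt fragment of the shape involved (the exact coordinates in Spark's build are an assumption here):

    // Hypothetical dependency line; Kafka 0.8.0 artifacts are published per
    // Scala version, hence the explicit _2.10 suffix.
    libraryDependencies += "org.apache.kafka" % "kafka_2.10" % "0.8.0"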
| * | | | | Upgrade Kafka dependency to 0.8.0 release version | jerryshao | 2014-01-10 | 2 | -11/+11
| | | | | |
* | | | | | Merge pull request #376 from prabeesh/master | Reynold Xin | 2014-01-10 | 1 | -1/+1
|\ \ \ \ \ \ Change clientId to random clientId The client identifier should be unique across all clients connecting to the same server. A convenience method, generateClientId(), is provided to generate a random client id that should satisfy this criterion; it returns a randomly generated client identifier based on the current user's login name and the system time. As the client identifier is used by the server to identify a client when it reconnects, the client must use the same identifier between connections if durable subscriptions are to be used.
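A hedged sketch using Eclipse Paho, which provides the generateClientId() convenience method described above; the broker URL is hypothetical:

    import org.eclipse.paho.client.mqttv3.MqttClient

    // A fixed clientId collides when several clients connect to the same
    // broker; generateClientId() derives a random id from the current user's
    // login name and the system time.
    val brokerUrl = "tcp://localhost:1883" // hypothetical broker
    val clientId = MqttClient.generateClientId()
    val client = new MqttClient(brokerUrl, clientId)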
| * | | | | | Change clientId to random clientId | Prabeesh K | 2014-01-10 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | Returns a randomly generated client identifier based on the current user's login name and the system time.
* | | | | | | Merge pull request #386 from pwendell/typo-fix | Reynold Xin | 2014-01-10 | 1 | -1/+1
|\ \ \ \ \ \ \ | |_|_|_|/ / / |/| | | | | | | | | | | | | Small typo fix
| * | | | | | Small typo fix | Patrick Wendell | 2014-01-10 | 1 | -1/+1
| | |_|_|/ / | |/| | | |
* | | | | | Merge pull request #381 from mateiz/default-ttl | Matei Zaharia | 2014-01-10 | 2 | -2/+2
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix default TTL for metadata cleaner It seems to have been set to 3500 in a previous commit for debugging, but it should be off by default.
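With the fix, periodic metadata cleanup is opt-in; a sketch of enabling it explicitly (the app name is hypothetical, and the value is in seconds):

    import org.apache.spark.SparkConf

    // -1 (the restored default) disables the MetadataCleaner; a long-running
    // application that wants periodic cleanup opts in explicitly.
    val conf = new SparkConf()
      .setAppName("TtlSketch")
      .set("spark.cleaner.ttl", "3600")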
| * | | | | | Fix one unit test that was not setting spark.cleaner.ttl | Matei Zaharia | 2014-01-10 | 1 | -1/+1
| | | | | | |
| * | | | | | Fix default TTL for metadata cleaner | Matei Zaharia | 2014-01-10 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems to have been set to 3500 in a previous commit for debugging, but it should be off by default
* | | | | | | Merge pull request #382 from RongGu/master | Patrick Wendell | 2014-01-10 | 1 | -1/+1
|\ \ \ \ \ \ \ Fix a typo in comment lines
| * | | | | | | Fix a typo in comment lines | RongGu | 2014-01-11 | 1 | -1/+1
| |/ / / / / /
* | | | | | | Merge pull request #385 from shivaram/add-i2-instances | Patrick Wendell | 2014-01-10 | 1 | -2/+10
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add i2 instance types to Spark EC2. Using data from http://aws.amazon.com/amazon-linux-ami/instance-type-matrix/ and http://www.ec2instances.info/
| * | | | | | | Add i2 instance types to Spark EC2. | Shivaram Venkataraman | 2014-01-10 | 1 | -2/+10
| |/ / / / / /
* | | | | | | Merge pull request #383 from tdas/driver-test | Patrick Wendell | 2014-01-10 | 15 | -205/+600
|\ \ \ \ \ \ \ | | |_|_|_|_|/ | |/| | | | | API for automatic driver recovery for streaming programs and other bug fixes
| | | | | | | 1. Added Scala and Java APIs for automatically loading a checkpoint if one exists in the provided checkpoint directory. Scala API: `StreamingContext.getOrCreate(<checkpoint dir>, <function to create new StreamingContext>)` returns a StreamingContext. Java API: `JavaStreamingContext.getOrCreate(<checkpoint dir>, <factory obj of type JavaStreamingContextFactory>)` returns a JavaStreamingContext. See the RecoverableNetworkWordCount below as an example of how to use it.
| | | | | | | 2. Refactored the streaming.Checkpoint*** code to fix bugs and make DStream metadata checkpoint writing and reading more robust. Specifically, it fixes and improves the logic behind backing up and writing metadata checkpoint files. Also, it ensures that spark.driver.* and spark.hostPort are cleared from the SparkConf before being written to the checkpoint.
| | | | | | | 3. Fixed a bug in the cleanup of checkpointed RDDs created by DStreams. Specifically, this fix ensures that a checkpointed RDD's files are not prematurely cleaned up, thus ensuring reliable recovery.
| | | | | | | 4. TimeStampedHashMap is upgraded to optionally update the timestamp on map.get(key). This allows clearing of data based on access time (i.e., clearing records that were last accessed before a threshold timestamp).
| | | | | | | 5. Added caching of file modification times in FileInputDStream using the updated TimeStampedHashMap. Without the caching, enumerating the mod times to find new files can take seconds if there are thousands of files. This cache is automatically cleared.
| | | | | | | This PR is not entirely final, as I may make some minor additions: a Java example, and adding StreamingContext.getOrCreate to a unit test. Edit: Java example to be added later; unit test added.
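A sketch of the recovery API from point 1 above; the checkpoint path, batch interval, and source are hypothetical:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///tmp/streaming-checkpoint" // hypothetical path

    // Runs only on a fresh start; after a driver failure, getOrCreate rebuilds
    // the context and its DStream graph from the checkpoint instead, and this
    // function is never called.
    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("RecoverableSketch")
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint(checkpointDir)
      ssc.socketTextStream("localhost", 9999).print()
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()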
| * | | | | | Merge remote-tracking branch 'apache/master' into driver-test | Tathagata Das | 2014-01-10 | 37 | -105/+205
| |\ \ \ \ \ \ | | | |/ / / / | | |/| | | | | | | | | | | | | | | | | | Conflicts: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala
| * | | | | | Modified streaming.FailureSuite tests to test StreamingContext.getOrCreate. | Tathagata Das | 2014-01-10 | 1 | -21/+34
| | | | | | |
| * | | | | | Updated docs based on Patrick's comments in PR 383. | Tathagata Das | 2014-01-10 | 5 | -25/+58
| | | | | | |
| * | | | | | Merge branch 'driver-test' of github.com:tdas/incubator-spark into driver-test | Tathagata Das | 2014-01-10 | 2 | -55/+96
| |\ \ \ \ \ \
| | * | | | | | Removed spark.hostPort and other settings from SparkConf before saving to checkpoint. | Tathagata Das | 2014-01-10 | 1 | -18/+6
| | | | | | | |
| | * | | | | | Merge branch 'driver-test' of github.com:tdas/incubator-spark into driver-test | Tathagata Das | 2014-01-10 | 55 | -186/+282
| | |\ \ \ \ \ \
| | * | | | | | | Refactored graph checkpoint file reading and writing code to make it cleaner and easily debuggable. | Tathagata Das | 2014-01-10 | 2 | -49/+102
| | | | | | | | |
| * | | | | | | | Fixed conf/slaves and updated docs. | Tathagata Das | 2014-01-10 | 4 | -9/+23
| | |/ / / / / / | |/| | | | | |