Found while doing code review
Author: Jacek Laskowski <jacek@japila.pl>
Closes #10878 from jaceklaskowski/streaming-scaladoc-logs-tiny-fixes.
|
Make StreamingContext.stop() exception-safe
Author: jayadevanmurali <jayadevan.m@tcs.com>
Closes #10807 from jayadevanmurali/branch-0.1-SPARK-11137.
|
Added CSS style to force names of input streams with receivers to wrap so they fit in the Streaming page.
Author: Alex Bozarth <ajbozart@us.ibm.com>
Closes #10873 from ajbozarth/spark12859.
|
- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it. (A config sketch follows this entry.)
- Update comments and docs
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10854 from zsxwing/remove-akka.
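
A minimal sketch of the rename's effect on user configuration (key names are from the change above; the value shown is an arbitrary example, not a recommended setting):
```scala
import org.apache.spark.SparkConf

// Illustrative migration only: the Akka-era key is gone, the RPC key replaces it.
val conf = new SparkConf()
  .setAppName("frame-size-example")
  // .set("spark.akka.frameSize", "128")      // removed by this change
  .set("spark.rpc.message.maxSize", "128")    // renamed key; "128" is just an example value
```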
|
Include the following changes:
1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream (a hedged sketch follows this entry)
2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
3. Update the ActorWordCount example and add the JavaActorWordCount example
4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10744 from zsxwing/streaming-akka-2.
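
A hedged sketch of what an actor-based stream looks like after the move; the `AkkaUtils.createStream` name and signature are assumptions based on the description above, and `WordsActor` is a hypothetical actor:
```scala
import akka.actor.{Actor, Props}
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.akka.AkkaUtils  // new home of the actor-stream helper

// Hypothetical actor that would feed records into the stream.
class WordsActor extends Actor {
  def receive: Receive = {
    case s: String => ()  // in a real receiver, the record would be stored into Spark here
  }
}

def wireUp(ssc: StreamingContext) = {
  // Replaces the removed StreamingContext.actorStream entry point (assumed API shape).
  AkkaUtils.createStream[String](ssc, Props[WordsActor], "words-receiver")
}
```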
|
Post Streaming events to the same thread as Spark events.
Including the following changes:
1. Add StreamingListenerForwardingBus, which wraps Streaming events as WrappedStreamingListenerEvent and processes the events received in `onOtherEvent` by forwarding them to StreamingListener
2. Remove StreamingListenerBus
3. Merge AsynchronousListenerBus and LiveListenerBus to the same class LiveListenerBus
4. Add `logEvent` method to SparkListenerEvent so that EventLoggingListener can use it to ignore WrappedStreamingListenerEvents
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10779 from zsxwing/streaming-listener.
|
This patch refactors portions of the BlockManager and CacheManager in order to avoid having to pass `evictedBlocks` lists throughout the code. It appears that these lists were only consumed by `TaskContext.taskMetrics`, so the new code now directly updates the metrics from the lower-level BlockManager methods.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10776 from JoshRosen/SPARK-10985.
|
- [x] Upgrade Py4J to 0.9.1
- [x] SPARK-12657: Revert SPARK-12617
- [x] SPARK-12658: Revert SPARK-12511
- Still keep the change that reads the checkpoint only once. This is a manual change and worth a careful look. https://github.com/zsxwing/spark/commit/bfd4b5c040eb29394c3132af3c670b1a7272457c
- [x] Verify there are no more leaks after reverting our workarounds
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10692 from zsxwing/py4j-0.9.1.
|
Fix the style violation (space before "," or ":").
This PR is a followup for #10643.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #10685 from sarutak/SPARK-12692-followup-streaming.
|
Turn import ordering violations into build errors, plus a few adjustments
to account for how the checker behaves. I'm a little on the fence about
whether the existing code is right, but it's easier to appease the checker
than to discuss what's the more correct order here.
Plus a few fixes to imports that cropped in since my recent cleanups.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10612 from vanzin/SPARK-3873-enable.
|
Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)
See also https://github.com/apache/spark/pull/10512
Author: Sean Owen <sowen@cloudera.com>
Closes #10513 from srowen/SPARK-4819.
|
Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs.
Author: Sean Owen <sowen@cloudera.com>
Closes #10570 from srowen/SPARK-12618.
|
The default serializer in Kryo is FieldSerializer and it ignores transient fields and never calls `writeObject` or `readObject`. So we should register OpenHashMapBasedStateMap using `DefaultSerializer` to make it work with Kryo.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10609 from zsxwing/SPARK-12591.
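
The failure mode is easy to reproduce with any class that relies on Java-serialization hooks; a minimal, self-contained sketch (the class below is a stand-in, not Spark code):
```scala
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.serializers.JavaSerializer

// Toy stand-in for OpenHashMapBasedStateMap: a transient field rebuilt in readObject.
// Kryo's default FieldSerializer skips transient fields and never calls readObject,
// so the field would stay null after a Kryo round trip.
class CachedCounts extends Serializable {
  @transient var scratch: Array[Int] = Array.fill(4)(0)
  private def readObject(in: java.io.ObjectInputStream): Unit = {
    in.defaultReadObject()
    scratch = Array.fill(4)(0)  // only runs if Java-serialization hooks are honoured
  }
}

val kryo = new Kryo()
// Fall back to Java serialization for this class so readObject is honoured.
kryo.register(classOf[CachedCounts], new JavaSerializer())
```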
|
This PR includes the following changes:
1. Rename `ActorReceiver` to `ActorReceiverSupervisor`
2. Remove `ActorHelper`
3. Add a new `ActorReceiver` for Scala and `JavaActorReceiver` for Java
4. Add `JavaActorWordCount` example
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10457 from zsxwing/java-actor-stream.
|
…mprovements
Please review and merge at your convenience. Thanks!
Author: Jacek Laskowski <jacek@japila.pl>
Closes #10595 from jaceklaskowski/streaming-minor-fixes.
|
This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code.
Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data which is still being used may be evicted / deleted, leading to hard-to-diagnose bugs.
For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timestamps in hashmaps, and a handful fewer threads.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10534 from JoshRosen/remove-ttl-based-cleaning.
|
Change Java countByKey, countApproxDistinctByKey return types to use Java Long, not Scala Long; update similar methods for consistency on java.lang.Long.valueOf with no API change.
Author: Sean Owen <sowen@cloudera.com>
Closes #10554 from srowen/SPARK-12604.
|
default root path to gain the streaming batch url."
This reverts commit 19e4e9febf9bb4fd69f6d7bc13a54844e4e096f1. Will merge #10618 instead.
|
path to gain the streaming batch url.
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #10617 from SaintBacchus/SPARK-12672.
|
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10582 from vanzin/SPARK-3873-tests.
|
PythonDStream.registerSerializer is called only once
There is an issue that Py4J's PythonProxyHandler.finalize blocks forever. (https://github.com/bartdag/py4j/pull/184)
Py4j will create a PythonProxyHandler in Java for "transformer_serializer" when calling "registerSerializer". If we call "registerSerializer" twice, the second PythonProxyHandler will override the first one; the first one will then be GCed and trigger "PythonProxyHandler.finalize". To avoid that, we should not call "registerSerializer" more than once, so that the "PythonProxyHandler" on the Java side won't be GCed.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10514 from zsxwing/SPARK-12511.
|
create a separate thread to wait for the job result
Before #9264, submitJob would create a separate thread to wait for the job result. `submitJobThreadPool` was a workaround in `ReceiverTracker` to run these waiting-for-job-result threads. Now that #9264 has been merged to master and resolves this blocking issue, `submitJobThreadPool` can be removed.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10560 from zsxwing/remove-submitJobThreadPool.
|
Explicitly close the client-side socket connection before restarting the socket receiver.
Author: guoxu1231 <guoxu1231@gmail.com>
Author: Shawn Guo <guoxu1231@gmail.com>
Closes #10464 from guoxu1231/SPARK-12513.
|
Remove use of deprecated Hadoop APIs, and the reflection that supported 1.x, now that 2.2+ is required.
Author: Sean Owen <sowen@cloudera.com>
Closes #10446 from srowen/SPARK-12481.
|
Also included a few miscellaneous other modules that had very few violations.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10532 from vanzin/SPARK-3873-streaming.
|
Restore the original value of the os.arch property after each test.
Since some tests force a specific value for the os.arch property, we need to restore the original value afterwards.
Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Closes #10289 from kiszk/SPARK-12311.
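
A generic save/restore pattern for this kind of test (a sketch, not the actual test code from this change):
```scala
// Run a block with "os.arch" forced to a value, then restore whatever was there before.
def withOsArch[A](forced: String)(body: => A): A = {
  val original = System.getProperty("os.arch")
  System.setProperty("os.arch", forced)
  try body
  finally {
    if (original != null) System.setProperty("os.arch", original)
    else System.clearProperty("os.arch")
  }
}

// withOsArch("amd64") { /* assertions that depend on os.arch */ }
```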
|
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10424 from zsxwing/typo.
|
Author: Reynold Xin <rxin@databricks.com>
Closes #10387 from rxin/version-bump.
|
recovering from checkpoint data
Add a transient flag `DStream.restoredFromCheckpointData` to control the restore processing in DStream and avoid duplicate work: `DStream.restoreCheckpointData` checks this flag first, and the restore process is executed only when it is `false`.
Author: jhu-chang <gt.hu.chang@gmail.com>
Closes #9765 from jhu-chang/SPARK-11749.
|
String.split accepts a regular expression, so we should escape "." and "|".
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10361 from zsxwing/reg-bug.
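
A small illustration of the pitfall (plain Scala, independent of the change itself):
```scala
// split() takes a regex: an unescaped "." matches every character, so nothing survives.
val broken = "a.b.c".split(".")    // Array() -- trailing empty strings are dropped
val fixed  = "a.b.c".split("\\.")  // Array(a, b, c)
val piped  = "x|y|z".split("\\|")  // Array(x, y, z) -- "|" alone is regex alternation
```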
|
…endly Receiver graphs
Currently, the Spark Streaming web UI uses the same maxY when it displays 'Input Rate Times & Histograms' and 'Per-Receiver Times & Histograms'.
This may lead to somewhat unfriendly graphs: once we have tens of Receivers or more, every 'Per-Receiver Times' line almost hits the ground.
This issue proposes to calculate a new maxY against the original one, which is shared among all the 'Per-Receiver Times & Histograms' graphs.
Before:
![before-5](https://cloud.githubusercontent.com/assets/15843379/11761362/d790c356-a0fa-11e5-860e-4b834603de1d.png)
After:
![after-5](https://cloud.githubusercontent.com/assets/15843379/11761361/cfabf692-a0fa-11e5-97d0-4ad124aaca2a.png)
Author: proflin <proflin.me@gmail.com>
Closes #10318 from proflin/SPARK-12304.
|
cc tdas zsxwing, please review. Thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #10305 from jerryshao/fix-typo-state-impl.
|
Currently the Streaming web UI does NOT list Receivers in order; however, it seems more convenient for the users if Receivers are listed in order.
![spark-12273](https://cloud.githubusercontent.com/assets/15843379/11736602/0bb7f7a8-a00b-11e5-8e86-96ba9297fb12.png)
Author: proflin <proflin.me@gmail.com>
Closes #10264 from proflin/Spark-12273.
|
Adding the ability to define an initial state RDD for use with updateStateByKey in PySpark. Added a unit test and changed the stateful_network_wordcount example to use an initial RDD.
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #10082 from BryanCutler/initial-rdd-updateStateByKey-SPARK-11713.
|
suffix parameters
The original code does not properly handle the case where the prefix is null but the suffix is not null: the suffix should be used but is not.
The fix uses a StringBuilder to construct the proper file name.
Author: bomeng <bmeng@us.ibm.com>
Author: Bo Meng <mengbo@bos-macbook-pro.usca.ibm.com>
Closes #10185 from bomeng/SPARK-12136.
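
A minimal sketch of the fixed behaviour (the helper name and exact formatting are illustrative, not the actual Spark code):
```scala
// Both prefix and suffix are optional; a null prefix must not swallow a non-null suffix.
def fileName(prefix: String, time: Long, suffix: String): String = {
  val sb = new StringBuilder
  if (prefix != null && prefix.nonEmpty) sb.append(prefix).append("-")
  sb.append(time)
  if (suffix != null && suffix.nonEmpty) sb.append(".").append(suffix)
  sb.toString
}

// fileName(null, 1000L, "txt")  -> "1000.txt"   (suffix is no longer dropped)
// fileName("out", 1000L, null)  -> "out-1000"
```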
|
and change tracking function signature
SPARK-12244:
Based on feedback from early users and personal experience attempting to explain it, the name trackStateByKey had two problems:
"trackState" is a completely new term which really does not give any intuition about what the operation is;
the resultant data stream of objects returned by the function is referred to in the docs as the "emitted" data, for lack of a better term.
"mapWithState" makes sense because the API is like a mapping function (Key, Value) => T with State as an additional parameter. The resultant data stream is "mapped data", so both problems are solved.
SPARK-12245:
From initial experience, not having the key in the function makes it hard to return mapped records, as the full information of the record is not available. Basically the user is restricted to doing something like mapValue() instead of map(), so the key is added as a parameter.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #10224 from tdas/rename.
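
A short sketch of the renamed API from the user's side (the word-count shapes are illustrative; `wordCounts` is an assumed upstream DStream):
```scala
import org.apache.spark.streaming.{State, StateSpec}

// The mapping function now receives the key too, so the returned ("mapped") record
// can combine key, new value, and running state.
val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
  val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
  state.update(sum)
  (word, sum)  // emitted downstream as the mapped data
}

// Assuming wordCounts: DStream[(String, Int)] exists earlier in the job:
// val stateStream = wordCounts.mapWithState(StateSpec.function(mappingFunc))
```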
|
present
The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached to them. This is because RDD checkpoints do not preserve the partitioner (SPARK-12004).
While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there may be a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #9988 from tdas/SPARK-11932.
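
The idea behind the repartitioning, expressed over a plain pair RDD (a sketch under assumed types, not the TrackStateRDD code):
```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// An RDD recovered from an RDD checkpoint has lost its partitioner,
// so re-impose the expected one before it is reused.
def ensurePartitioned(rdd: RDD[(String, Int)], partitions: Int): RDD[(String, Int)] = {
  val expected = new HashPartitioner(partitions)
  if (rdd.partitioner.contains(expected)) rdd else rdd.partitionBy(expected)
}
```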
|
Jenkins load is high
We need to make sure that the last entry is indeed the last entry in the queue.
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #10110 from brkyvz/batch-wal-test-fix.
|
`ByteBuffer` doesn't guarantee all contents in `ByteBuffer.array` are valid, e.g., a ByteBuffer returned by `ByteBuffer.slice`. We should not use the whole content of a `ByteBuffer` unless we know that's correct.
This patch fixed all places that use `ByteBuffer.array` incorrectly.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10083 from zsxwing/bytebuffer-array.
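
A small demonstration of why `array()` is unsafe on a sliced buffer (plain JDK code, independent of the patch):
```scala
import java.nio.ByteBuffer

val backing = ByteBuffer.wrap("0123456789".getBytes("UTF-8"))
backing.position(3)
val slice = backing.slice()                  // logically "3456789"

val wrong = new String(slice.array())        // "0123456789" -- the whole backing array
val bytes = new Array[Byte](slice.remaining())
slice.duplicate().get(bytes)                 // copy only the slice's visible contents
val right = new String(bytes, "UTF-8")       // "3456789"
```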
|
This replaces https://github.com/apache/spark/pull/9696
Invoke Checkstyle and print any errors to the console, failing the step.
Use Google's style rules modified according to
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
Some important checks are disabled (see TODOs in `checkstyle.xml`) due to
multiple violations being present in the codebase.
Suggest fixing those TODOs in a separate PR(s).
More on Checkstyle can be found on the [official website](http://checkstyle.sourceforge.net/).
Sample output (from [build 46345](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46345/consoleFull)) (duplicated because I ran the build twice with different profiles):
> Checkstyle checks failed at following occurrences:
> [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
> [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
> [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause.
> [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
> [error] running /home/jenkins/workspace/SparkPullRequestBuilder2/dev/lint-java ; received return code 1
Also fix some of the minor violations that didn't require sweeping changes.
Apologies for the previous botched PRs - I finally figured out the issue.
cr: JoshRosen, pwendell
> I state that the contribution is my original work, and I license the work to the project under the project's open source license.
Author: Dmitry Erastov <derastov@gmail.com>
Closes #9867 from dskrvk/master.
|
recovering StreamingContext from checkpoint
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #10127 from tdas/SPARK-12122.
|
after test
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #10124 from tdas/InputStreamSuite-flaky-test.
|
If `StreamingContext.stop()` is interrupted midway through the call, the context will be marked as stopped but certain state will have not been cleaned up. Because `state = STOPPED` will be set, subsequent `stop()` calls will be unable to finish stopping the context, preventing any new StreamingContexts from being created.
This patch addresses this issue by only marking the context as `STOPPED` once the `stop()` has successfully completed which allows `stop()` to be called a second time in order to finish stopping the context in case the original `stop()` call was interrupted.
I discovered this issue by examining logs from a failed Jenkins run in which this race condition occurred in `FailureSuite`, leaking an unstoppable context and causing all subsequent tests to fail.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #9982 from JoshRosen/SPARK-12001.
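
The pattern in miniature (a sketch of the idea, not the actual StreamingContext code):
```scala
object StopPattern {
  private var state = "ACTIVE"

  def stop(cleanup: () => Unit): Unit = synchronized {
    if (state != "STOPPED") {
      cleanup()           // may throw or be interrupted; state stays retriable
      state = "STOPPED"   // only set once cleanup has fully succeeded
    }
  }
}
```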
|
The JobConf object created in `DStream.saveAsHadoopFiles` is used concurrently in multiple places:
* The JobConf is updated by `RDD.saveAsHadoopFile()` before the job is launched
* The JobConf is serialized as part of the DStream checkpoints.
These concurrent accesses (updating in one thread while another thread is serializing it) can lead to ConcurrentModificationException in the underlying Java hashmap used in the internal Hadoop Configuration object.
The solution is to create a new JobConf in every batch that is updated by `RDD.saveAsHadoopFile()`, while the checkpointing serializes the original JobConf.
Tests to be added in #9988 will fail reliably without this patch. Keeping this patch really small to make sure that it can be added to previous branches.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #10088 from tdas/SPARK-12087.
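
The gist of the fix in generic terms (a sketch; names are illustrative, not the patched DStream code):
```scala
import org.apache.hadoop.mapred.JobConf

// Hand a fresh copy of the JobConf to each batch's save, so the object mutated by
// saveAsHadoopFile is never the one being serialized into the checkpoint.
def saveBatch(template: JobConf): Unit = {
  val perBatchConf = new JobConf(template)  // defensive copy for this batch only
  // rdd.saveAsHadoopFile(path, keyClass, valueClass, outputFormatClass, perBatchConf)
}
```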
|
This PR backports PR #10039 to master
Author: Cheng Lian <lian@databricks.com>
Closes #10063 from liancheng/spark-12046.doc-fix.master.
|
StreamingListenerSuite
In StreamingListenerSuite."don't call ssc.stop in listener", after the main thread calls `ssc.stop()`, `StreamingContextStoppingCollector` may call `ssc.stop()` in the listener bus thread, which is a deadlock. This PR updates `StreamingContextStoppingCollector` to only call `ssc.stop()` in the first batch to avoid the deadlock.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10011 from zsxwing/fix-test-deadlock.
|
TransformFunctionSerializer to Java
The Python exception traceback in TransformFunction and TransformFunctionSerializer is not sent back to Java. Py4j just throws a very general exception, which is hard to debug.
This PR adds a `getFailure` method to get the failure message on the Java side.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #9922 from zsxwing/SPARK-11935.
|
recovered from checkpoint file
This solves the following exception, caused when an empty state RDD is checkpointed and recovered. The root cause is that an empty OpenHashMapBasedStateMap cannot be deserialized because the initialCapacity is set to zero.
```
Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 20, localhost): java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.<init>(StateMap.scala:96)
at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.<init>(StateMap.scala:86)
at org.apache.spark.streaming.util.OpenHashMapBasedStateMap.readObject(StateMap.scala:291)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:921)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:921)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
```
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #9958 from tdas/SPARK-11979.
|
`FileBasedWriteAheadLog.close()`
There is a race condition in `FileBasedWriteAheadLog.close()`: if deletes of old log files are in progress, the write ahead log may close and result in a `RejectedExecutionException`. This is okay, and should be handled gracefully.
Example test failures:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/95/testReport/junit/org.apache.spark.streaming.util/BatchedWriteAheadLogWithCloseFileAfterWriteSuite/BatchedWriteAheadLog___clean_old_logs/
The reason the test fails is that `writeAheadLog.close` is called in `afterEach` while there may still be async deletes in flight.
tdas zsxwing
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #9953 from brkyvz/flaky-ss.
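
In generic terms, "handled gracefully" means treating the rejection as benign during shutdown (a sketch with a plain executor, not the write ahead log code):
```scala
import java.util.concurrent.{Executors, RejectedExecutionException}

val pool = Executors.newSingleThreadExecutor()
pool.shutdown()
try {
  // A task submitted while the executor is shutting down may be rejected.
  pool.submit(new Runnable { def run(): Unit = () })
} catch {
  case _: RejectedExecutionException => ()  // expected during shutdown; ignore
}
```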
|
correctly checkpointed
To make sure that all lineage is correctly truncated for TrackStateRDD when checkpointed.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #9831 from tdas/SPARK-11845.