| Commit message | Author | Age | Files | Lines |
This reverts commit 72a4fdbe82203b962fe776d0edaed7f56898cb02.
This reverts commit 685bdd2b7e584c84e7d39e40de2d5f30c5388cb5.
This reverts commit 3f9e073ff0bb18b6079fda419d4e9dbf594545b0.
This reverts commit 6de888129fcfe6e592458a4217fc66140747b54f.
This reverts commit 7029301778895427216f2e0710c6e72a523c0897.
This reverts commit db22a9e2cb51eae2f8a79648ce3c6bf4fecdd641.
This reverts commit 837deabebf0714e3f3aca135d77169cc825824f3.
This reverts commit f3e62ffa4ccea62911207b918ef1c23c1f50467f.
Conflicts:
pom.xml
This reverts commit 5c0032a471d858fb010b1737ea14375f1af3ed88.
When converting files to RDDs, the Spark source traverses the files sequence in three separate loops:
1. files.map(...)
2. files.zip(fileRDDs)
3. files-size.foreach
This is very time-consuming when there are many files, so this change merges the three loops over the files sequence into a single loop.
Author: surq <surq@asiainfo.com>
Closes #2811 from surq/SPARK-3954 and squashes the following commits:
321bbe8 [surq] updated the code style.The style from [for...yield]to [files.map(file=>{})]
88a2c20 [surq] Merge branch 'master' of https://github.com/apache/spark into SPARK-3954
178066f [surq] modify code's style. [Exceeds 100 columns]
626ef97 [surq] remove redundant import(ArrayBuffer)
739341f [surq] promote the speed of convert files to RDDS
(cherry picked from commit ce6ed2abd14de26b9ceaa415e9a42fbb1338f5fa)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
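As a rough illustration of the optimization described in this commit (a sketch only, not the actual Spark internals; `buildRDD` and the logging stand in for the real per-file work), the three traversals of the `files` sequence can be folded into a single `map`:

```scala
// Illustrative sketch: `buildRDD` stands in for the real per-file RDD
// construction (e.g. sc.textFile), and the println for per-file bookkeeping.
object SinglePassSketch {
  def buildRDD(path: String): Seq[String] = Seq(path) // stand-in for loop 1's work

  // Before: files.map(...), files.zip(fileRDDs), and a third loop over files
  // each walked the sequence separately. After: one pass does all three jobs.
  def convert(files: Seq[String]): Seq[Seq[String]] =
    files.map { file =>
      val rdd = buildRDD(file)      // work previously done in loop 1
      println(s"New file: $file")   // bookkeeping previously done in loops 2 and 3
      rdd
    }
}
```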
StreamingContext should not allow start() to be called after calling stop()
In Spark 1.0.0+, calling `stop()` on a StreamingContext that has not been started is a no-op which has no side-effects. This allows users to call `stop()` on a fresh StreamingContext followed by `start()`. I believe that this almost always indicates an error and is not behavior that we should support. Since we don't allow `start() stop() start()` then I don't think it makes sense to allow `stop() start()`.
The current behavior can lead to resource leaks when StreamingContext constructs its own SparkContext: if I call `stop(stopSparkContext=True)`, then I expect StreamingContext's underlying SparkContext to be stopped irrespective of whether the StreamingContext has been started. This is useful when writing unit test fixtures.
Prior discussions:
- https://github.com/apache/spark/pull/3053#discussion-diff-19710333R490
- https://github.com/apache/spark/pull/3121#issuecomment-61927353
Author: Josh Rosen <joshrosen@databricks.com>
Closes #3160 from JoshRosen/SPARK-4301 and squashes the following commits:
dbcc929 [Josh Rosen] Address more review comments
bdbe5da [Josh Rosen] Stop SparkContext after stopping scheduler, not before.
03e9c40 [Josh Rosen] Always stop SparkContext, even if stop(false) has already been called.
832a7f4 [Josh Rosen] Address review comment
5142517 [Josh Rosen] Add tests; improve Scaladoc.
813e471 [Josh Rosen] Revert workaround added in https://github.com/apache/spark/pull/3053/files#diff-e144dbee130ed84f9465853ddce65f8eR49
5558e70 [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet.
(cherry picked from commit 7b41b17f3296eea3282efbdceb6b28baf128287d)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
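The semantics argued for above can be sketched with toy classes (these are not the real Spark APIs, just stand-ins for illustration): `stop(stopSparkContext = true)` stops the underlying context even when `start()` was never called, and `start()` after `stop()` is rejected.

```scala
// Toy stand-ins for SparkContext/StreamingContext, illustration only.
class FakeSparkContext {
  var stopped = false
  def stop(): Unit = stopped = true
}

class FakeStreamingContext(val sc: FakeSparkContext) {
  private var started = false
  private var shutDown = false

  def start(): Unit = {
    require(!shutDown, "cannot call start() after stop()") // no stop() -> start()
    started = true
  }

  // stop() is safe on a never-started context, and still stops the underlying
  // SparkContext when asked -- avoiding the resource leak described above.
  def stop(stopSparkContext: Boolean = true): Unit = {
    shutDown = true
    if (stopSparkContext) sc.stop()
  }
}
```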
If classes implementing Serializable or Externalizable interfaces throw
exceptions other than IOException or ClassNotFoundException from their
(de)serialization methods, then this results in an unhelpful
"IOException: unexpected exception type" rather than the actual exception that
produced the (de)serialization error.
This patch fixes this by adding a utility method that re-wraps any uncaught
exceptions in IOException (unless they are already instances of IOException).
Author: Josh Rosen <joshrosen@databricks.com>
Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits:
cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write|read][Object|External].
(cherry picked from commit 6c98c29ae0033556fd4424f41d1de005c509e511)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Conflicts:
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
streaming/src/main/scala/org/apache/spark/streaming/api/python/PythonDStream.scala
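The utility method described above can be sketched as follows (a standalone, illustrative version; the real helper lives in Spark's `Utils`): any non-IOException thrown by a (de)serialization block is re-wrapped in an `IOException` that preserves the root cause, so callers see the actual failure instead of "unexpected exception type".

```scala
import java.io.IOException

object SerializationUtil {
  // Run a (de)serialization block; let IOExceptions through unchanged, and
  // wrap anything else so ObjectOutputStream/ObjectInputStream callers can
  // still inspect the real failure via getCause.
  def tryOrIOException[T](block: => T): T =
    try block
    catch {
      case e: IOException => throw e
      case e: Throwable   => throw new IOException(e)
    }
}
```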
Show Streaming application code context (file, line number) in Spark Stages UI
This is a refactored version of the original PR https://github.com/apache/spark/pull/1723 by mubarak
Please take a look andrewor14, mubarak
Author: Mubarak Seyed <mubarak.seyed@gmail.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #2464 from tdas/streaming-callsite and squashes the following commits:
dc54c71 [Tathagata Das] Made changes based on PR comments.
390b45d [Tathagata Das] Fixed minor bugs.
904cd92 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-callsite
7baa427 [Tathagata Das] Refactored getCallSite and setCallSite to make it simpler. Also added unit test for DStream creation site.
b9ed945 [Mubarak Seyed] Adding streaming utils
c461cf4 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
ceb43da [Mubarak Seyed] Changing default regex function name
8c5d443 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
196121b [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
491a1eb [Mubarak Seyed] Removing streaming visibility from getRDDCreationCallSite in DStream
33a7295 [Mubarak Seyed] Fixing review comments: Merging both setCallSite methods
c26d933 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
f51fd9f [Mubarak Seyed] Fixing scalastyle, Regex for Utils.getCallSite, and changing method names in DStream
5051c58 [Mubarak Seyed] Getting return value of compute() into variable and call setCallSite(prevCallSite) only once. Adding return for other code paths (for None)
a207eb7 [Mubarak Seyed] Fixing code review comments
ccde038 [Mubarak Seyed] Removing Utils import from MappedDStream
2a09ad6 [Mubarak Seyed] Changes in Utils.scala for SPARK-1853
1d90cc3 [Mubarak Seyed] Changes for SPARK-1853
5f3105a [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
70f494f [Mubarak Seyed] Changes for SPARK-1853
1500deb [Mubarak Seyed] Changes in Spark Streaming UI
9d38d3c [Mubarak Seyed] [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
d466d75 [Mubarak Seyed] Changes for spark streaming UI
(cherry picked from commit 729952a5efce755387c76cdf29280ee6f49fdb72)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
Original PR: #2363
Author: Andrew Or <andrewor14@gmail.com>
Closes #2415 from andrewor14/disable-ui-for-tests-1.1 and squashes the following commits:
8d9df5a [Andrew Or] Oops, missed one.
509507d [Andrew Or] Backport #2363 (SPARK-3490) into branch-1.1
This reverts commit 2ffc7980c6818eec05e32141c52e335bc71daed9.
We currently open many ephemeral ports during the tests, and as a result we occasionally can't bind to new ones. This has caused the `DriverSuite` and the `SparkSubmitSuite` to fail intermittently.
By disabling the `SparkUI` when it's not needed, we already cut down on the number of ports opened significantly, on the order of the number of `SparkContexts` ever created. We must keep it enabled for a few tests for the UI itself, however.
Author: Andrew Or <andrewor14@gmail.com>
Closes #2363 from andrewor14/disable-ui-for-tests and squashes the following commits:
332a7d5 [Andrew Or] No need to set spark.ui.port to 0 anymore
30c93a2 [Andrew Or] Simplify streaming UISuite
a431b84 [Andrew Or] Fix streaming test failures
8f5ae53 [Andrew Or] Fix no new line at the end
29c9b5b [Andrew Or] Disable SparkUI for tests
(cherry picked from commit 6324eb7b5b0ae005cb2e913e36b1508bd6f1b9b8)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
Conflicts:
pom.xml
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
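In application terms, the change above amounts to turning the web UI off via configuration in tests so that no ephemeral port is bound per context. A minimal sketch (`spark.ui.enabled` is the real Spark setting this commit introduced; the rest of the snippet is an illustrative test-setup fragment):

```scala
// Illustrative test setup: with the UI disabled, creating a SparkContext
// does not bind an ephemeral port for the web UI.
val conf = new org.apache.spark.SparkConf()
  .setMaster("local[2]")
  .setAppName("test")
  .set("spark.ui.enabled", "false") // no UI, no port bound per context
```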
This reverts commit b2d0493b223c5f98a593bb6d7372706cc02bebad.
This reverts commit 865e6f63f63f5e881a02d1a4e3b4c5d0e86fcd8e.
This reverts commit 2b2e02265f80e4c5172c1e498aa9ba2c6b91c6c9.
This reverts commit 8b5f0dbd8d32a25a4e7ba3ebe1a4c3c6310aeb85.
This reverts commit 711aebb329ca28046396af1e34395a0df92b5327.
This reverts commit a4a7a241441489a0d31365e18476ae2e1c34464d.
This reverts commit f07183249b74dd857069028bf7d570b35f265585.
This reverts commit f8f7a0c9dce764ece8acdc41d35bbf448dba7e92.
This reverts commit 58b0be6a29eab817d350729710345e9f39e4c506.
This reverts commit 78e3c036eee7113b2ed144eec5061e070b479e56.
This reverts commit 79e86ef3e1a3ee03a7e3b166a5c7dee11c6d60d7.
This reverts commit a118ea5c59d653f5a3feda21455ba60bc722b3b1.
This reverts commit 71ec0140f7e121bdba3d19e8219e91a5e9d1e320.