| Commit message | Author | Age | Files | Lines |
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes #2848 from ryan-williams/fetch-file and squashes the following commits:
c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently
8e39c16 [Ryan Williams] code review feedback
788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes #3736 from ryan-williams/hist and squashes the following commits:
421d8ff [Ryan Williams] add another random typo fix
76d6a4c [Ryan Williams] remove hdfs example
a2d0f82 [Ryan Williams] code review feedback
9ca7629 [Ryan Williams] [SPARK-4889] update history server example cmds
Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor.
This consolidates some code paths and makes constructor arguments simpler for a few classes.
Author: Reynold Xin <rxin@databricks.com>
Closes #3738 from rxin/sparkEnvDepRefactor and squashes the following commits:
82e02cc [Reynold Xin] Fixed couple bugs.
217062a [Reynold Xin] Code review feedback.
bd00af7 [Reynold Xin] Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits:
cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics
The current version of `getCallSite` visits the collection of `StackTraceElement`s twice. This is unnecessary, since we can do the work in a single visit, and we also do not need to keep the filtered `StackTraceElement`s around.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #3532 from viirya/refactor_getCallSite and squashes the following commits:
62aa124 [Liang-Chi Hsieh] Fix style.
e741017 [Liang-Chi Hsieh] Refactor getCallSite.
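The single-visit idea is easy to see outside Spark (the real code is Scala; this Python sketch uses hypothetical names): one walk over the frames records the last internal frame and the first user frame, with no intermediate filtered list.

```python
def call_site(frames, is_internal):
    """One pass over the stack trace: track the last internal frame and the
    first user frame, instead of filtering into a list and scanning it again."""
    last_internal = None
    first_user = None
    for frame in frames:
        if is_internal(frame):
            last_internal = frame
        else:
            first_user = frame
            break  # nothing after the first user frame matters
    return last_internal, first_user
```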
This is so that the `ExecutorAllocationManager` does not take the `SparkContext`, with all of its dependencies, as an argument. This prevents future developers from tying this class down further to the `SparkContext`, which has really become quite a monstrous object.
cc'ing pwendell who originally suggested this, and JoshRosen who may have thoughts about the trait mix-in style of `SparkContext`.
Author: Andrew Or <andrew@databricks.com>
Closes #3614 from andrewor14/dynamic-allocation-sc and squashes the following commits:
187070d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc
59baf6c [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc
347a348 [Andrew Or] Refactor SparkContext into ExecutorAllocationClient
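The shape of the refactoring can be sketched in a few lines (Python stand-ins with hypothetical names; the real interface lives in Scala): the manager depends on a small client interface, so holding a reference to it no longer exposes the rest of the context.

```python
from abc import ABC, abstractmethod

class ExecutorAllocationClient(ABC):
    """Narrow interface: only the calls the allocation manager actually needs."""
    @abstractmethod
    def request_executors(self, num: int) -> bool: ...
    @abstractmethod
    def kill_executors(self, executor_ids: list) -> bool: ...

class FakeContext(ExecutorAllocationClient):
    # Stands in for the big context object; it satisfies the interface, but a
    # caller holding an ExecutorAllocationClient cannot reach anything else.
    def __init__(self):
        self.requested = 0
    def request_executors(self, num: int) -> bool:
        self.requested += num
        return True
    def kill_executors(self, executor_ids: list) -> bool:
        return True

class AllocationManager:
    def __init__(self, client: ExecutorAllocationClient):
        self.client = client  # the small interface, not the whole context
    def scale_up(self, num: int) -> bool:
        return self.client.request_executors(num)
```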
This is used in NioBlockTransferService here:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala#L66
Author: Aaron Davidson <aaron@databricks.com>
Closes #3688 from aarondav/SPARK-4837 and squashes the following commits:
ebd2007 [Aaron Davidson] [SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config
aggregateByKey and foldByKey
Author: Ivan Vergiliev <ivan@leanplum.com>
Closes #3605 from IvanVergiliev/change-serializer and squashes the following commits:
a49b7cf [Ivan Vergiliev] Use serializer instead of closureSerializer in aggregate/foldByKey.
Rewording was based on this discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html
This is the associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-4884
Author: Madhu Siddalingaiah <madhu@madhu.com>
Closes #3722 from msiddalingaiah/master and squashes the following commits:
79e679f [Madhu Siddalingaiah] [DOC]: improve documentation
51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again)
332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code>
cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions
work
Hi all - cleaned up the code to get rid of the unused parameter and added some discussion of the ThreadPoolExecutor parameters to explain why we can use a single threadCount instead of providing a min/max.
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Closes #3664 from ilganeli/SPARK-3607C and squashes the following commits:
3c05690 [Ilya Ganelin] Updated documentation and refactored code to extract shared variables
The `MetricsServlet` handler should be added to the web UI after it has been initialized by `MetricsSystem`; otherwise the servlet handler cannot be attached.
Author: Saisai Shao <saisai.shao@intel.com>
Author: Josh Rosen <joshrosen@databricks.com>
Author: jerryshao <saisai.shao@intel.com>
Closes #3444 from jerryshao/SPARK-4595 and squashes the following commits:
434d17e [Saisai Shao] Merge pull request #10 from JoshRosen/metrics-system-cleanup
87a2292 [Josh Rosen] Guard against misuse of MetricsSystem methods.
f779fe0 [jerryshao] Fix MetricsServlet not work issue
update doc for WholeCombineFileRecordReader
Author: Davies Liu <davies@databricks.com>
Author: Josh Rosen <joshrosen@databricks.com>
Closes #3301 from davies/fix_doc and squashes the following commits:
1d7422f [Davies Liu] Merge pull request #2 from JoshRosen/whole-text-file-cleanup
dc3d21a [Josh Rosen] More genericization in ConfigurableCombineFileRecordReader.
95d13eb [Davies Liu] address comment
bf800b9 [Davies Liu] update doc for WholeCombineFileRecordReader
Author: meiyoula <1039320815@qq.com>
Closes #3635 from XuTingjun/master and squashes the following commits:
dd1c66d [meiyoula] when old is deleted, it will throw an exception where call it
2a55bc2 [meiyoula] Update DiskBlockManager.scala
1483a4a [meiyoula] Delete multiple retries to make dir
67f7902 [meiyoula] Try some times to make dir maybe more reasonable
1c51a0c [meiyoula] Update DiskBlockManager.scala
Using "driver" and "executor" in the comments of `MapOutputTracker` is clearer.
Author: wangfei <wangfei1@huawei.com>
Closes #3700 from scwf/commentFix and squashes the following commits:
aa68524 [wangfei] master and worker should be driver and executor
This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it.
Author: Sean Owen <sowen@cloudera.com>
Closes #3690 from srowen/SPARK-785 and squashes the following commits:
8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey
Author: Ryan Williams <ryan.blake.williams@gmail.com>
Closes #3523 from ryan-williams/tweaks and squashes the following commits:
d2eddaa [Ryan Williams] code review feedback
ce27fc1 [Ryan Williams] CoGroupedRDD comment nit
c6cfad9 [Ryan Williams] remove unnecessary if statement
b74ea35 [Ryan Williams] comment fix
b0221f0 [Ryan Williams] fix a gendered pronoun
c71ffed [Ryan Williams] use names on a few boolean parameters
89954aa [Ryan Williams] clarify some comments in {Security,Shuffle}Manager
e465dac [Ryan Williams] Saved building-spark.md with Dillinger.io
83e8358 [Ryan Williams] fix pom.xml typo
dc4662b [Ryan Williams] typo fixes in tuning.md, configuration.md
is confusing
Hi all - I've renamed the methods referenced in this JIRA to clarify that they modify the provided arrays (find vs. deque).
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Closes #3665 from ilganeli/SPARK-1037B and squashes the following commits:
64c177c [Ilya Ganelin] Renamed deque to dequeue
f27d85e [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters
683482a [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters
other places
Author: Zhang, Liye <liye.zhang@intel.com>
Closes #2793 from liyezhang556520/uniformHashMap and squashes the following commits:
5884735 [Zhang, Liye] [CORE]codeStyle: uniform ConcurrentHashMap define in StorageLevel.scala
The driver hangs sometimes when we coalesce RDD partitions. See JIRA for more details and reproduction.
This is because our use of the empty string as the default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `""` (empty string). The intended semantics here, however, are that the partition has no preferred location, and the TSM should schedule the corresponding task accordingly.
Author: Andrew Or <andrew@databricks.com>
Closes #3633 from andrewor14/coalesce-preferred-loc and squashes the following commits:
e520d6b [Andrew Or] Oops
3ebf8bd [Andrew Or] A few comments
f370a4e [Andrew Or] Fix tests
2f7dfb6 [Andrew Or] Avoid using empty string as default preferred location
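The fix boils down to modeling "no preference" explicitly rather than with a sentinel string. A minimal sketch (hypothetical names, not Spark's API):

```python
def preferred_locations(preferred_host):
    """None (or "") means 'no preference': return an empty list so the
    scheduler never sees a bogus empty-string host."""
    return [preferred_host] if preferred_host else []
```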
Hi all - I've renamed the unhelpfully named variable and added a comment clarifying what's actually happening.
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Closes #3666 from ilganeli/SPARK-4569B and squashes the following commits:
1810394 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
e2d2092 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
d7cefec [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator
5b3f39c [Ilya Ganelin] [SPARK-4569] Rename in Aggregator
Currently this doesn't do anything in other modes, so we might as well just disable it rather than having the user mistakenly rely on it.
Author: Andrew Or <andrew@databricks.com>
Closes #3615 from andrewor14/dynamic-allocation-yarn-only and squashes the following commits:
ce6487a [Andrew Or] Allow requesting / killing executors only in YARN mode
The current HistoryPage has links only to the previous and next pages.
I suggest adding an index so that history pages can be accessed easily.
I implemented it as shown in the following screenshots.
If there are many pages, current page +/- N pages, head page and last page are indexed.
![2014-11-10 16 13 25](https://cloud.githubusercontent.com/assets/4736016/4986246/9c7bbac4-6937-11e4-8695-8634d039d5b6.png)
![2014-11-10 16 03 21](https://cloud.githubusercontent.com/assets/4736016/4986210/3951bb74-6937-11e4-8b4e-9f90d266d736.png)
![2014-11-10 16 03 39](https://cloud.githubusercontent.com/assets/4736016/4986211/3b196ad8-6937-11e4-9f81-74bc0a6dad5b.png)
![2014-11-10 16 03 49](https://cloud.githubusercontent.com/assets/4736016/4986213/40686138-6937-11e4-86c0-41100f0404f6.png)
![2014-11-10 16 04 04](https://cloud.githubusercontent.com/assets/4736016/4986215/4326c9b4-6937-11e4-87ac-0f30c86ec6e3.png)
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #3194 from sarutak/history-page-indexing and squashes the following commits:
15d3d2d [Kousuke Saruta] Simplified code
c93932e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing
1c2f605 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing
76b05e3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing
b2240f8 [Kousuke Saruta] Fixed style
ec7922e [Kousuke Saruta] Simplified code
755a004 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing
cfa242b [Kousuke Saruta] Added index to HistoryPage
Accumulators keep thread-local copies of themselves. These copies were only cleared at the beginning of a task. This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker.
This PR clears the thread-local copies of accumulators at the end of each task, in the task's `finally` block, to make sure they are cleaned up between tasks. It also stores them in a `ThreadLocal` object, so that if, for some reason, the thread dies, any memory they were using at the time should be freed up.
Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>
Closes #3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits:
a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests
b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevent, as it will only surface if a master is talking to a worker running a different version of spark.
537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task.
39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them
This small commit makes the `(?)` web UI help link into a superscript, which should address feedback that the current design makes it look like an error occurred or like information is missing.
Before:
![image](https://cloud.githubusercontent.com/assets/50748/5370611/a3ed0034-7fd9-11e4-870f-05bd9faad5b9.png)
After:
![image](https://cloud.githubusercontent.com/assets/50748/5370602/6c5ca8d6-7fd9-11e4-8d1a-568d71290aa7.png)
Author: Josh Rosen <joshrosen@databricks.com>
Closes #3659 from JoshRosen/webui-help-sup and squashes the following commits:
bd72899 [Josh Rosen] Use <sup> tag for help icon in web UI page header.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3426 from sryza/sandy-spark-4567 and squashes the following commits:
cb4b8d2 [Sandy Ryza] SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable
been removed after synchronizing on BlockInfo instance.
After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represents may have already been removed.
The three methods share the same logic for taking the `info` lock:
```scala
info = blockInfo.get(id)
if (info != null) {
  info.synchronized {
    // do something
  }
}
```
So there is a chance that by the time a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread that entered `info.synchronized` first.
The `removeBlock` and `dropOldBlocks` methods are idempotent, so it is safe for them to run on blocks that have already been removed.
But in `dropFromMemory` it may be problematic, since it may drop into the disk store block data that has already been removed, and this calls data store operations that are not designed to handle missing blocks.
This patch fixes the issue by adding a check to `dropFromMemory` to test whether the block has been removed by a racing thread.
Author: hushan[胡珊] <hushan@xiaomi.com>
Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits:
edb989d [hushan[胡珊]] Refine code style and comments position
55fa4ba [hushan[胡珊]] refine code
e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn
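The check-then-lock race and the re-check that fixes it can be sketched outside Spark (Python with hypothetical names; the real code is Scala):

```python
import threading

block_info = {}                # block id -> info record
_map_lock = threading.Lock()   # guards the block_info map itself

def drop_from_memory(block_id):
    """After winning info's lock, re-check that the block is still registered;
    a racing thread may have removed it while we were waiting."""
    with _map_lock:
        info = block_info.get(block_id)
    if info is None:
        return False
    with info["lock"]:
        with _map_lock:
            if block_info.get(block_id) is not info:
                return False  # removed by another thread; nothing to drop
            # ... safe to drop the block's in-memory data here ...
            del block_info[block_id]
    return True
```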
This commit removes the GC time for each task from the set of
optional, additional metrics, and instead always shows it for
each task.
cc pwendell
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #3622 from kayousterhout/gc_time and squashes the following commits:
15ac242 [Kay Ousterhout] Make TaskDetailsClassNames private[spark]
e71d893 [Kay Ousterhout] [SPARK-4765] Make GC time always shown in UI.
In HashShuffleReader.scala and HashShuffleWriter.scala there is no need to test `dep.aggregator.isEmpty` again, since this is already covered by the `dep.aggregator.isDefined` check.
In SortShuffleWriter.scala, isn't `dep.aggregator.isEmpty` more elegant than `!dep.aggregator.isDefined`?
Author: maji2014 <maji3@asiainfo.com>
Closes #3553 from maji2014/spark-4691 and squashes the following commits:
bf7b14d [maji2014] change a elegant way for SortShuffleWriter.scala
10d0cf0 [maji2014] change a elegant way
d8f52dc [maji2014] code optimization for judgement
My original 'fix' didn't fix at all. Now, there's a unit test to check whether it works. Of the two options to really fix it -- copy the `Map` to a `java.util.HashMap`, or copy and modify Scala's implementation in `Wrappers.MapWrapper`, I went with the latter.
Author: Sean Owen <sowen@cloudera.com>
Closes #3587 from srowen/SPARK-3926 and squashes the following commits:
8586bb9 [Sean Owen] Remove unneeded no-arg constructor, and add additional note about copied code in LICENSE
7bb0e66 [Sean Owen] Make SerializableMapWrapper actually serialize, and add unit test
Simple omission on my part.
Author: Andrew Or <andrew@databricks.com>
Closes #3612 from andrewor14/dynamic-allocation-synchronization and squashes the following commits:
1f03b60 [Andrew Or] Synchronize kills
tempFile is created in the same directory as targetFile, so that the move from tempFile to targetFile is always atomic.
Author: Christophe Préaud <christophe.preaud@kelkoo.com>
Closes #2855 from preaudc/master and squashes the following commits:
9ba89ca [Christophe Préaud] Ensure that files are fetched atomically
54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
c6a5590 [Christophe Préaud] Revert commit 8ea871f8130b2490f1bad7374a819bf56f0ccbbd
7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
8ea871f [Christophe Préaud] Ensure that files are fetched atomically
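The same technique translates directly to any language; a minimal Python sketch (hypothetical function name, not Spark's fetch code): a same-directory rename is atomic within one filesystem, so readers never observe a half-written file.

```python
import os
import tempfile

def fetch_file(data: bytes, target_path: str) -> None:
    """Write to a temp file in the *same directory* as the target, then rename
    it into place; os.replace on one filesystem is atomic, never a partial copy."""
    target_dir = os.path.dirname(target_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, target_path)  # atomic rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # don't leave debris on failure
        raise
```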
individual private methods
In BlockManagerMasterActor, when handling the UpdateBlockInfo message type, the message replies are handled in individual private methods; they should be handled in the actor's Akka `receive` instead.
Author: Zhang, Liye <liye.zhang@intel.com>
Closes #2853 from liyezhang556520/akkaRecv and squashes the following commits:
9b06f0a [Zhang, Liye] remove the unreachable code
bf518cd [Zhang, Liye] change the indent
242166b [Zhang, Liye] modified accroding to the comments
d4b929b [Zhang, Liye] [SPARK-4005][CORE] handle message replies in receive instead of in the individual private methods
I ran into multiple cases that SBT/Scala compiler was confused by the implicits in continuous compilation mode. Adding explicit return types fixes the problem.
Author: Reynold Xin <rxin@databricks.com>
Closes #3580 from rxin/rdd-implicit and squashes the following commits:
ee32fcd [Reynold Xin] Move object RDD to the end of the file.
b8562c9 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into rdd-implicit
d4e9f85 [Reynold Xin] Code review.
a836a37 [Reynold Xin] Move object RDD to the front of RDD.scala.
Please see https://issues.apache.org/jira/browse/SPARK-4459
Author: Saldanha <saldaal1@phusca-l24858.wlan.na.novartis.net>
Closes #3327 from alokito/master and squashes the following commits:
54b1095 [Saldanha] [SPARK-4459] changed type parameter for keyBy from K to U
d5f73c3 [Saldanha] [SPARK-4459] added keyBy test
316ad77 [Saldanha] SPARK-4459 changed type parameter for groupBy from K to U.
62ddd4b [Saldanha] SPARK-4459 added failing unit test
modes
In yarn-cluster and standalone-cluster modes, we don't know where driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf then this will lead to errors when the driver launches.
This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode.
Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>
Closes #3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits:
ed1a25c [WangTaoTheTonic] revert unrelated formatting issue
02c4e49 [WangTao] add comment
32a3f3f [WangTaoTheTonic] ingore it in SparkSubmit instead of SparkContext
667cf24 [WangTaoTheTonic] document fix
ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode
2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode
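The filtering step is simple to sketch (Python with hypothetical names; SparkSubmit itself is Scala): in a cluster deploy mode the driver's host is unknown at submit time, so a propagated `spark.driver.host` would point at the wrong machine and is dropped.

```python
def submit_conf(conf: dict, deploy_mode: str) -> dict:
    """Drop spark.driver.host in cluster deploy modes, where the submitting
    machine cannot know where the driver will actually run."""
    if deploy_mode == "cluster":
        return {k: v for k, v in conf.items() if k != "spark.driver.host"}
    return dict(conf)
```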
MapPartitionsRDD
MappedRDD, MappedValuesRDD, FlatMappedValuesRDD, FilteredRDD, GlommedRDD, FlatMappedRDD are not necessary. They can be implemented trivially using MapPartitionsRDD.
Author: Reynold Xin <rxin@databricks.com>
Closes #3578 from rxin/SPARK-4719 and squashes the following commits:
eed9853 [Reynold Xin] Preserve partitioning for filter.
eb1a89b [Reynold Xin] [SPARK-4719][API] Consolidate various narrow dep RDD classes with MapPartitionsRDD.
shuffle file.
cc aarondav kayousterhout pwendell
This should go into 1.2?
Author: Reynold Xin <rxin@databricks.com>
Closes #3579 from rxin/SPARK-4085 and squashes the following commits:
255b4fd [Reynold Xin] Updated test.
f9814d9 [Reynold Xin] Code review feedback.
2afaf35 [Reynold Xin] [SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file.
adds Executor
The ExecutorInfo only reaches the RUNNING state if the Driver is alive to send the ExecutorStateChanged message to master. Else, appInfo.resetRetryCount() is never called and failing Executors will eventually exceed ApplicationState.MAX_NUM_RETRY, resulting in the application being removed from the master's accounting.
JoshRosen
Author: Mark Hamstra <markhamstra@gmail.com>
Closes #3550 from markhamstra/SPARK-4498 and squashes the following commits:
8f543b1 [Mark Hamstra] Don't transition ExecutorInfo to RUNNING until Executor is added by Driver
ShuffleMemoryManager.tryToAcquire may return a negative value. The unit test demonstrates this bug. It will output `0 did not equal -200 granted is negative`.
Author: zsxwing <zsxwing@gmail.com>
Closes #3575 from zsxwing/SPARK-4715 and squashes the following commits:
a193ae6 [zsxwing] Make sure tryToAcquire won't return a negative value
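The bug class is worth a sketch (hypothetical signature, not ShuffleMemoryManager's actual one): when a task already holds more than its fair share, the difference is negative, and the grant must be clamped at zero rather than returned as-is.

```python
def try_to_acquire(requested: int, task_usage: int, fair_share: int) -> int:
    """Grant at most the task's remaining fair share, never a negative amount."""
    return max(0, min(requested, fair_share - task_usage))
```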
As #3262 wasn't merged to branch 1.2, the `since` value of `deprecated` should be '1.3.0'.
Author: zsxwing <zsxwing@gmail.com>
Closes #3573 from zsxwing/SPARK-4397-version and squashes the following commits:
1daa03c [zsxwing] Change the 'since' value to '1.3.0'
The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672
The f closure of `PartitionsRDD(ZippedPartitionsRDD2)` contains a `$outer` that references EdgeRDD/VertexRDD, which causes the task's serialization chain to become very long in iterative GraphX applications. As a result, a StackOverflow error will occur. If we set `f = null` in `clearDependencies()`, `checkpoint()` can cut off the long serialization chain. More details and explanation can be found in the JIRA.
Author: JerryLead <JerryLead@163.com>
Author: Lijie Xu <csxulijie@gmail.com>
Closes #3545 from JerryLead/my_core and squashes the following commits:
f7faea5 [JerryLead] checkpoint() should clear the f to avoid StackOverflow error
c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark
52799e3 [Lijie Xu] Merge pull request #1 from apache/master
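The mechanism is language-independent; a simplified sketch (hypothetical Python class, not the Scala RDD): as long as the closure `f` is reachable, everything it captured stays reachable too, so checkpointing must clear it along with the parent references.

```python
class ZippedPartitionsRDD:
    """Simplified model: clear_dependencies drops both parent references and
    the closure f, cutting the chain f's captured refs would keep alive."""
    def __init__(self, f, parents):
        self.f = f
        self.parents = parents
    def clear_dependencies(self):
        self.parents = None
        self.f = None  # without this, f's $outer-style captures keep the lineage
```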
This PR cleans up `import SparkContext._` in core for SPARK-4397(#3262) to prove it really works well.
Author: zsxwing <zsxwing@gmail.com>
Closes #3530 from zsxwing/SPARK-4397-cleanup and squashes the following commits:
04e2273 [zsxwing] Cleanup 'import SparkContext._' in core
Author: zsxwing <zsxwing@gmail.com>
Closes #3521 from zsxwing/SPARK-4661 and squashes the following commits:
03cbe3f [zsxwing] Minor code and docs cleanup
If `spark.akka.frameSize` > 2047, it will overflow and become negative. There should be an assertion in `maxFrameSizeBytes` to warn people.
Author: zsxwing <zsxwing@gmail.com>
Closes #3527 from zsxwing/SPARK-4664 and squashes the following commits:
0089c7a [zsxwing] Throw an exception when spark.akka.frameSize > 2047
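The arithmetic behind the 2047 limit: 2048 MB is 2^31 bytes, one more than a signed 32-bit int can hold, so the byte count wraps negative on the JVM. A sketch of the guard (hypothetical names; the real check is Scala):

```python
MAX_FRAME_SIZE_MB = 2047  # 2048 * 1024 * 1024 = 2**31, which overflows a signed Int

def max_frame_size_bytes(frame_size_mb: int) -> int:
    """Reject configured values whose byte count would overflow 32 bits."""
    if frame_size_mb > MAX_FRAME_SIZE_MB:
        raise ValueError(
            f"spark.akka.frameSize should not be greater than {MAX_FRAME_SIZE_MB} MB")
    return frame_size_mb * 1024 * 1024
```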
mode
If spark-sql is used in yarn-cluster mode, print an error message, just as the Spark shell does in yarn-cluster mode.
Author: carlmartin <carlmartinmax@gmail.com>
Author: huangzhaowei <carlmartinmax@gmail.com>
Closes #3479 from SaintBacchus/sparkSqlShell and squashes the following commits:
35829a9 [carlmartin] improve the description of comment
e6c1eb7 [carlmartin] add a comment in bin/spark-sql to remind user who wants to change the class
f1c5c8d [carlmartin] Merge branch 'master' into sparkSqlShell
8e112c5 [huangzhaowei] singular form
ec957bc [carlmartin] Add the some error infomation if using spark-sql in yarn-cluster mode
7bcecc2 [carlmartin] Merge branch 'master' of https://github.com/apache/spark into codereview
4fad75a [carlmartin] Add the Error infomation using spark-sql in yarn-cluster mode
This PR adds the Spark version number to the UI footer; this is how it looks:
![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png)
Author: Sean Owen <sowen@cloudera.com>
Closes #3410 from srowen/SPARK-2143 and squashes the following commits:
e9b3a7a [Sean Owen] Add Spark version to footer
Added a ClassTag parameter to CompactBuffer, so CompactBuffer[T] can create primitive arrays for primitive types. This reduces memory usage for primitive types significantly, at only a minor performance cost.
Here is my test code:
```scala
// Call org.apache.spark.util.SizeEstimator.estimate
def estimateSize(obj: AnyRef): Long = {
  val c = Class.forName("org.apache.spark.util.SizeEstimator$")
  val f = c.getField("MODULE$")
  val o = f.get(c)
  val m = c.getMethod("estimate", classOf[Object])
  m.setAccessible(true)
  m.invoke(o, obj).asInstanceOf[Long]
}
sc.parallelize(1 to 10000).groupBy(_ => 1).foreach {
  case (k, v) =>
    println(v.getClass() + " size: " + estimateSize(v))
}
```
Using the previous CompactBuffer, this output:
```
class org.apache.spark.util.collection.CompactBuffer size: 313358
```
Using the new CompactBuffer, this output:
```
class org.apache.spark.util.collection.CompactBuffer size: 65712
```
In this case, the new `CompactBuffer` only used 20% memory of the previous one. It's really helpful for `groupByKey` when using a primitive value.
Author: zsxwing <zsxwing@gmail.com>
Closes #3378 from zsxwing/SPARK-4505 and squashes the following commits:
4abdbba [zsxwing] Add a ClassTag parameter to reduce the memory usage of CompactBuffer[T] when T is a primitive type
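The boxing overhead is visible outside the JVM as well; as a rough analogue of the ClassTag change (not the Scala code itself), Python's `array` module stores unboxed 8-byte ints contiguously, while a list holds a full object per element:

```python
import sys
from array import array

n = 10_000
boxed = list(range(n))            # each element is a separate Python int object
primitive = array("q", range(n))  # contiguous 8-byte signed ints, unboxed

# Count the per-element object overhead for the boxed version.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(i) for i in boxed)
primitive_bytes = sys.getsizeof(primitive)
```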
Admittedly a really small tweak.
Author: Stephen Haberman <stephen@exigencecorp.com>
Closes #3514 from stephenh/include-key-name-in-npe and squashes the following commits:
937740a [Stephen Haberman] Include the key name when failing on an invalid value.
`File.exists()` and `File.mkdirs()` only throw `SecurityException` instead of `IOException`. Then, when an exception is thrown, `dir` should be reset too.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #3449 from viirya/fix_createtempdir and squashes the following commits:
36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.
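Both halves of the fix, catching the exception the call actually raises and resetting the candidate before retrying, can be sketched generically (Python with hypothetical names; the Java analogue catches `SecurityException`):

```python
import os
import random

def create_temp_dir(root: str, max_attempts: int = 10) -> str:
    """Retry loop that catches the exception mkdir really raises and resets
    the candidate path, so a failed attempt can never be returned."""
    for _ in range(max_attempts):
        candidate = os.path.join(root, f"spark-{random.getrandbits(32):08x}")
        try:
            os.makedirs(candidate)
            return candidate
        except OSError:
            candidate = None  # reset before the next attempt
    raise IOError(f"Failed to create a temp directory under {root} "
                  f"after {max_attempts} attempts")
```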
A time suffix already exists in `Utils.getUsedTimeMs(startTime)`, so there is no need to append it again; delete the redundant suffix.
Author: maji2014 <maji3@asiainfo.com>
Closes #3475 from maji2014/SPARK-4619 and squashes the following commits:
df0da4e [maji2014] delete redundant time suffix