clear
Changes RDD.toDebugString() to show hierarchy and shuffle transformations more clearly
New output:
```
(3) FlatMappedValuesRDD[325] at apply at Transformer.scala:22
| MappedValuesRDD[324] at apply at Transformer.scala:22
| CoGroupedRDD[323] at apply at Transformer.scala:22
+-(5) MappedRDD[320] at apply at Transformer.scala:22
| | MappedRDD[319] at apply at Transformer.scala:22
| | MappedValuesRDD[318] at apply at Transformer.scala:22
| | MapPartitionsRDD[317] at apply at Transformer.scala:22
| | ShuffledRDD[316] at apply at Transformer.scala:22
| +-(10) MappedRDD[315] at apply at Transformer.scala:22
| | ParallelCollectionRDD[314] at apply at Transformer.scala:22
+-(100) MappedRDD[322] at apply at Transformer.scala:22
| ParallelCollectionRDD[321] at apply at Transformer.scala:22
```
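Here is a simplified sketch of how such a tree could be rendered. It assumes a hypothetical `Node` type, not Spark's `RDD` class. A node's `narrow` parents belong to the same stage, and its `shuffle` parents start a new stage. Narrow parents continue the current column with `| `. Each shuffle parent opens a `+-(n)` branch that shows its partition count, and that parent's ancestors are indented one level deeper. The exact spacing differs from Spark's real output.

```scala
// Hypothetical RDD-like node: `narrow` parents stay in the same stage,
// `shuffle` parents begin a new stage (printed as an indented branch).
final case class Node(name: String, partitions: Int,
                      narrow: List[Node] = Nil, shuffle: List[Node] = Nil)

object DebugTree {
  // The first line shows the partition count of the stage `n` belongs to.
  def render(n: Node): List[String] =
    s"(${n.partitions}) ${n.name}" :: upstream(n)

  // Lines for everything upstream of `n`, at n's indentation level.
  private def upstream(n: Node): List[String] = {
    // Same-stage parents: one "| " line each, followed by their own ancestors.
    val narrowLines = n.narrow.flatMap(p => s"| ${p.name}" :: upstream(p))
    // Shuffle parents: a "+-" branch whose sub-tree is indented by "| ".
    val shuffleLines = n.shuffle.flatMap { p =>
      val sub = render(p)
      s"+-${sub.head}" :: sub.tail.map("| " + _)
    }
    narrowLines ++ shuffleLines
  }
}
```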
Author: Gregory Owen <greowen@gmail.com>
Closes #1364 from GregOwen/to-debug-string and squashes the following commits:
08f5c78 [Gregory Owen] toDebugString: prettier debug printing to show shuffles and joins more clearly
1603f7b [Gregory Owen] toDebugString: prettier debug printing to show shuffles and joins more clearly
As a result of shivaram's experience debugging long scheduler delay, I think we should improve the tooltip to point people in the right direction if scheduler delay is large.
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #1488 from kayousterhout/better_tooltips and squashes the following commits:
22176fd [Kay Ousterhout] Improve scheduler delay tooltip.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #1474 from sryza/sandy-spark-2564 and squashes the following commits:
35b8388 [Sandy Ryza] Fix compile error on upmerge
7b985fb [Sandy Ryza] Fix test compile error
43f79e6 [Sandy Ryza] SPARK-2564. ShuffleReadMetrics.totalBlocksRead is redundant
We should fix this in branch-1.0 as well.
Author: Reynold Xin <rxin@apache.org>
Closes #1500 from rxin/rangePartitioner and squashes the following commits:
c0a94f5 [Reynold Xin] [SPARK-2598] RangePartitioner's binary search does not use the given Ordering.
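The fix requires every comparison in the search to go through the caller's `Ordering`, never the key type's natural order. Below is a minimal sketch. It assumes `bounds` is sorted under that same ordering. `OrderedSearch.partitionFor` is a hypothetical helper, not RangePartitioner's actual method.

```scala
object OrderedSearch {
  // Lower-bound binary search over `bounds` (sorted under `ord`): returns the
  // index of the first bound that is not less than `key`, or bounds.length if
  // `key` is greater than every bound. All comparisons use `ord`.
  def partitionFor[K](bounds: IndexedSeq[K], key: K)(implicit ord: Ordering[K]): Int = {
    var lo = 0
    var hi = bounds.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (ord.lt(bounds(mid), key)) lo = mid + 1 else hi = mid
    }
    lo
  }
}
```

With a reversed ordering, the bounds are sorted descending, and a search that silently used the natural `Int` order would pick the wrong partition.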
...s of CoGroupedRDD and PairRDDFunctions
This also removes an unnecessary tuple creation in cogroup.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #1447 from sryza/sandy-spark-2519-2 and squashes the following commits:
b6d9699 [Sandy Ryza] Remove missed Tuple2 match in CoGroupedRDD
a109828 [Sandy Ryza] Remove another pattern matching in MappedValuesRDD and revert some changes in PairRDDFunctions
be10f8a [Sandy Ryza] SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical sections of CoGroupedRDD and PairRDDFunctions
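The sketch below contrasts the two styles using hypothetical helpers that sum the values of an iterator of pairs. The first destructures each tuple with a pattern match. The second reads `_2` directly, which is the style these commits adopt in hot loops. Both return the same result. Any speed difference depends on the Scala version and the JIT, so read this as an illustration of the rewrite, not as a benchmark.

```scala
object TupleAccess {
  // Pattern-matching style: every element goes through a `case (k, v)` match.
  def sumByMatch(pairs: Iterator[(String, Int)]): Int = {
    var total = 0
    pairs.foreach { case (_, v) => total += v }
    total
  }

  // Accessor style: read the tuple field directly, with no match per element.
  def sumByAccessor(pairs: Iterator[(String, Int)]): Int = {
    var total = 0
    while (pairs.hasNext) {
      val kv = pairs.next()
      total += kv._2
    }
    total
  }
}
```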
every task)."
This reverts commit 7b8cd175254d42c8e82f0aa8eb4b7f3508d8fde2.
This is a minor change. We should first logDebug($curRequestSize) and then set it to 0.
Author: Lijie Xu <csxulijie@gmail.com>
Closes #1477 from JerryLead/patch-1 and squashes the following commits:
aed722d [Lijie Xu] put 'curRequestSize = 0' after 'logDebug' it
Currently (as of Spark 1.0.1), Spark uses Akka to send the RDD object (which contains closures) to the executors along with each task. This is inefficient because all tasks in the same stage use the same RDD object, yet we send that RDD object to the executors multiple times. This is especially bad when a closure references a very large variable. The current design has forced users to broadcast large variables explicitly.
The patch uses broadcast to send RDD objects and closures to executors, and uses Akka to send only a reference to the broadcast RDD/closure along with the partition-specific information for the task. For those of you who know more about the internals, Spark already relies on broadcast to send the Hadoop JobConf every time it uses Hadoop input, because the JobConf is large.
The user-facing impacts of the change include:
1. Users won't need to decide what to broadcast anymore, unless they want to use a large object multiple times in different operations.
2. Task size will get smaller, resulting in faster scheduling and higher task dispatch throughput.
In addition, the change will simplify some internals of Spark, eliminating the need to maintain task caches and the complex logic to broadcast JobConf (which also led to a deadlock recently).
A simple way to test this:
```scala
// ~1 MB of random bytes captured by both closures; run in spark-shell, where `sc` is predefined.
val a = new Array[Byte](1000 * 1000)
scala.util.Random.nextBytes(a)
sc.parallelize(1 to 1000, 1000).map { x => a; x }.groupBy { x => a; x }.count
```
Timings on three r3.8xlarge EC2 instances:
```
master branch: 5.648436068 s, 4.715361895 s, 5.360161877 s
with this change: 3.416348793 s, 1.477846558 s, 1.553432156 s
```
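A back-of-envelope model shows why this helps. It rests on simplifying assumptions: every task in a TaskSet captures the same payload, each executor fetches the broadcast copy once, and compression and broadcast chunking are ignored. The per-task overhead and executor count used below are hypothetical, not measurements.

```scala
object ShippingEstimate {
  // Bytes sent per TaskSet when every task carries its own copy of the payload.
  def perTask(numTasks: Long, payloadBytes: Long, taskOverheadBytes: Long): Long =
    numTasks * (payloadBytes + taskOverheadBytes)

  // Bytes sent when the payload is broadcast once per executor and each task
  // carries only a small reference plus its partition-specific data.
  def broadcastOnce(numTasks: Long, numExecutors: Long,
                    payloadBytes: Long, taskOverheadBytes: Long): Long =
    numExecutors * payloadBytes + numTasks * taskOverheadBytes
}
```

With 1,000 tasks capturing the ~1 MB array above, an assumed 2,000-byte overhead per task, and 3 executors, the first model ships about 1 GB per stage and the second about 5 MB.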
Author: Reynold Xin <rxin@apache.org>
Closes #1452 from rxin/broadcast-task and squashes the following commits:
762e0be [Reynold Xin] Warn large broadcasts.
ade6eac [Reynold Xin] Log broadcast size.
c3b6f11 [Reynold Xin] Added a unit test for clean up.
754085f [Reynold Xin] Explain why broadcasting serialized copy of the task.
04b17f0 [Reynold Xin] [SPARK-2521] Broadcast RDD object once per TaskSet (instead of sending it for every task).
Currently, shuffle read metrics are incorrectly reported when stages have multiple shuffle dependencies (they are set to be the metrics from just one of the shuffle dependencies, rather than the accumulated metrics from all of the shuffle dependencies). This fixes that problem, and should probably be back-ported to the 0.9 branch.
Thanks ryanra for discovering this problem!
cc rxin andrewor14
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #1476 from kayousterhout/join_bug and squashes the following commits:
0203a16 [Kay Ousterhout] Fix broken unit tests.
f463c2e [Kay Ousterhout] [SPARK-2571] Correctly report shuffle read metrics.
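The bug pattern and the fix are easiest to see side by side. Below is a minimal sketch using a hypothetical, simplified metrics record, not Spark's `ShuffleReadMetrics` class. Overwriting keeps only one dependency's numbers. Folding with `+` accumulates all of them.

```scala
// Hypothetical, simplified per-dependency shuffle read metrics.
final case class ReadMetrics(remoteBlocks: Int, localBlocks: Int,
                             remoteBytes: Long, fetchWaitMs: Long) {
  def +(o: ReadMetrics): ReadMetrics =
    ReadMetrics(remoteBlocks + o.remoteBlocks, localBlocks + o.localBlocks,
                remoteBytes + o.remoteBytes, fetchWaitMs + o.fetchWaitMs)
}

object StageReadMetrics {
  val empty = ReadMetrics(0, 0, 0L, 0L)

  // Buggy pattern: each dependency overwrites the previous value,
  // so only the last one survives.
  def lastOnly(perDep: Seq[ReadMetrics]): ReadMetrics =
    perDep.lastOption.getOrElse(empty)

  // Fixed pattern: sum the metrics across all shuffle dependencies.
  def accumulated(perDep: Seq[ReadMetrics]): ReadMetrics =
    perDep.foldLeft(empty)(_ + _)
}
```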
This is going to be used in https://issues.apache.org/jira/browse/SPARK-2568
Author: Reynold Xin <rxin@apache.org>
Closes #1478 from rxin/reservoirSample and squashes the following commits:
17bcbf3 [Reynold Xin] Added seed.
badf20d [Reynold Xin] Renamed the method.
6940010 [Reynold Xin] Reservoir sampling implementation.
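The log does not include the implementation. The sketch below shows the standard seeded Algorithm R, as one way such a method could look. `Reservoir.sample` is a hypothetical name. It also returns the number of items seen, which callers can use to tell whether the input was smaller than the reservoir.

```scala
import scala.reflect.ClassTag
import scala.util.Random

object Reservoir {
  // Algorithm R: keep the first k items; afterwards the (i+1)-th item
  // (i = items already seen) replaces a random slot with probability k/(i+1).
  def sample[T: ClassTag](input: Iterator[T], k: Int, seed: Long): (Array[T], Long) = {
    val reservoir = new Array[T](k)
    val rand = new Random(seed)
    var seen = 0L
    input.foreach { item =>
      if (seen < k) {
        reservoir(seen.toInt) = item
      } else {
        // Uniform index in [0, seen]; keep the item only if it lands in the reservoir.
        val j = (rand.nextDouble() * (seen + 1)).toLong
        if (j < k) reservoir(j.toInt) = item
      }
      seen += 1
    }
    // Fewer than k items: return only the filled prefix.
    if (seen < k) (reservoir.take(seen.toInt), seen) else (reservoir, seen)
  }
}
```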
Author: Sandy Ryza <sandy@cloudera.com>
Closes #1479 from sryza/sandy-spark-2553 and squashes the following commits:
2cb5ed8 [Sandy Ryza] SPARK-2553. Fix compile error
... per key
My humble opinion is that avoiding allocations in this performance-critical section is worth the extra code.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #1461 from sryza/sandy-spark-2553 and squashes the following commits:
7eaf7f2 [Sandy Ryza] SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency per key
**Problem.** Right now, if you click on an application after it has finished and there are no event logs for it, the page simply refreshes. This is not intuitive, especially because event logging is not enabled by default. If a user tries to view a SparkUI after the fact without event logs, we should direct them to enable event logging.
**Fix.** The new page conveys different messages in each of the following scenarios:
(1) Application did not enable event logging,
(2) Event logs are not found in the specified directory, and
(3) Exception is thrown while replaying the logs
Here are screenshots of what the page looks like in each of the above scenarios:
(1)
<img src="https://issues.apache.org/jira/secure/attachment/12656204/Event%20logging%20not%20enabled.png" width="75%">
(2)
<img src="https://issues.apache.org/jira/secure/attachment/12656203/Application%20history%20not%20found.png">
(3)
<img src="https://issues.apache.org/jira/secure/attachment/12656202/Application%20history%20load%20error.png" width="95%">
Author: Andrew Or <andrewor14@gmail.com>
Closes #1336 from andrewor14/master-link and squashes the following commits:
2f06206 [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link
97cddc0 [Andrew Or] Add different severity levels
832b687 [Andrew Or] Mention spark.eventLog.dir in error message
51980c3 [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link
ded208c [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link
89d6405 [Andrew Or] Reword message
e7df7ed [Andrew Or] Add a history not found page to standalone Master
This should reduce memory usage for the web ui as well as slightly increase its speed in draining the UI event queue.
@andrewor14
Author: Reynold Xin <rxin@apache.org>
Closes #1262 from rxin/ui-consolidate-hashtables and squashes the following commits:
1ac3f97 [Reynold Xin] Oops. Properly handle description.
f5736ad [Reynold Xin] Code review comments.
b8828dc [Reynold Xin] Merge branch 'master' into ui-consolidate-hashtables
7a7b6c4 [Reynold Xin] Revert css change.
f959bb8 [Reynold Xin] [SPARK-2299] Consolidate various stageIdTo* hash maps in JobProgressListener to speed it up.
63256f5 [Reynold Xin] [SPARK-2320] Reduce <pre> block font size.
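The consolidation can be sketched as follows. All class and field names here are hypothetical, and Spark's listener uses its own classes. Instead of one hash map per attribute keyed by stage id, a single map points to one mutable record per stage. Each event then needs one lookup, and dropping a stage is a single removal.

```scala
import scala.collection.mutable

// Before (sketch): one hash map per attribute, all keyed by stage id.
class ScatteredListener {
  val stageIdToActiveTasks = mutable.HashMap[Int, Int]()
  val stageIdToRunTime = mutable.HashMap[Int, Long]()
  def onTaskStart(stageId: Int): Unit =
    stageIdToActiveTasks(stageId) = stageIdToActiveTasks.getOrElse(stageId, 0) + 1
}

// After (sketch): one record per stage behind a single map.
class StageData {
  var numActiveTasks: Int = 0
  var executorRunTime: Long = 0L
}

class ConsolidatedListener {
  val stageIdToData = mutable.HashMap[Int, StageData]()

  def onTaskStart(stageId: Int): Unit = {
    val data = stageIdToData.getOrElseUpdate(stageId, new StageData)
    data.numActiveTasks += 1
  }

  def onTaskEnd(stageId: Int, runTimeMs: Long): Unit = {
    val data = stageIdToData.getOrElseUpdate(stageId, new StageData)
    data.numActiveTasks -= 1
    data.executorRunTime += runTimeMs
  }

  // Dropping a stage is one removal instead of one per map.
  def removeStage(stageId: Int): Unit = stageIdToData -= stageId
}
```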
This should go into both master and branch-1.0.
Author: Reynold Xin <rxin@apache.org>
Closes #1450 from rxin/agg-closure and squashes the following commits:
e40f363 [Reynold Xin] Mima check excludes.
9186364 [Reynold Xin] Define the return type more explicitly.
38e348b [Reynold Xin] Fixed the cases in RDD.scala.
ea6b34d [Reynold Xin] Blah
89b9c43 [Reynold Xin] Fix other instances of accidentally pulling in extra stuff in closures.
73b2783 [Reynold Xin] [SPARK-2534] Avoid pulling in the entire RDD in groupByKey.
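"Accidentally pulling in extra stuff in closures" usually means referencing a field inside a closure, which makes the closure capture the whole enclosing object. The sketch below uses hypothetical classes and assumes Scala 2.12 or later, where lambdas are serializable. The closure that reads the field directly captures `this` and fails Java serialization. The closure that first copies the field into a local `val` captures only an `Int`.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a large, non-serializable object such as an RDD.
class Outer(val factor: Int) {
  // Uses the field directly: the closure captures `this`, i.e. all of Outer.
  def capturesThis: Int => Int = x => x * factor

  // Copies the field into a local val first: the closure captures only an Int.
  def capturesLocal: Int => Int = {
    val f = factor
    x => x * f
  }
}

object ClosureCheck {
  // True if `obj` survives Java serialization, roughly what shipping a
  // closure to executors requires.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```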
It is currently non-trivial to trace through how different combinations of cluster managers (e.g. yarn) and deploy modes (e.g. cluster) are processed in SparkSubmit. Moving forward, it will be easier to extend SparkSubmit if we first re-organize the code by grouping related logic together.
This is a precursor to fixing standalone-cluster mode, which is currently broken (SPARK-2260).
Author: Andrew Or <andrewor14@gmail.com>
Closes #1349 from andrewor14/submit-cleanup and squashes the following commits:
8f99200 [Andrew Or] script -> program (minor)
30f2e65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-cleanup
fe484a1 [Andrew Or] Move deploy mode checks after yarn code
7167824 [Andrew Or] Re-order config options and update comments
0b01ff8 [Andrew Or] Clean up SparkSubmit for readability
If the first pass of CoalescedRDD does not find the target number of locations AND the second pass finds new locations, an exception is thrown, as "groupHash.get(nxt_replica).get" is not valid.
The fix is just to add an ArrayBuffer to groupHash for that replica if it didn't already exist.
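Below is a minimal sketch of the failure and the fix, using a hypothetical map from replica host to the groups that prefer it. Calling `Option.get` on a missing key throws `NoSuchElementException`. `getOrElseUpdate` creates the buffer on first use.

```scala
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

object GroupHashFix {
  // Buggy: throws if `replica` was not added during the first pass.
  def addBuggy(groupHash: mutable.Map[String, ArrayBuffer[String]],
               replica: String, group: String): Unit =
    groupHash.get(replica).get += group

  // Fixed: create the buffer for an unseen replica, then append.
  def addFixed(groupHash: mutable.Map[String, ArrayBuffer[String]],
               replica: String, group: String): Unit =
    groupHash.getOrElseUpdate(replica, ArrayBuffer[String]()) += group
}
```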
Author: Aaron Davidson <aaron@databricks.com>
Closes #1337 from aarondav/2412 and squashes the following commits:
f587b5d [Aaron Davidson] getOrElseUpdate
3ad8a3c [Aaron Davidson] [SPARK-2412] CoalescedRDD throws exception with certain pref locs
Author: Aaron Davidson <aaron@databricks.com>
Closes #1405 from aarondav/2154 and squashes the following commits:
24e9ef9 [Aaron Davidson] [SPARK-2154] Schedule next Driver when one completes (standalone mode)
We recently added this lock on 'conf' in order to prevent concurrent creation. However, it turns out that this can introduce a deadlock because Hadoop also synchronizes on the Configuration objects when creating new Configurations (and they do so via a static REGISTRY which contains all created Configurations).
This fix forces all Spark initialization of Configuration objects to occur serially by using a static lock that we control, and thus also prevents introducing the deadlock.
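The pattern can be sketched with a hypothetical `Conf` class standing in for Hadoop's `Configuration`. All names here are illustrative. Every construction happens under a lock object owned by this code alone. The library never acquires that monitor internally, so it cannot be the other half of the lock cycle described above.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a class whose construction must not run concurrently.
final class Conf(val settings: Map[String, String])

object ConfFactory {
  // Our own static lock, instead of synchronizing on the source Conf instance.
  private val instantiationLock = new Object
  // Counts constructions.
  val created = new AtomicInteger(0)

  def copyOf(base: Conf): Conf = instantiationLock.synchronized {
    created.incrementAndGet()
    new Conf(base.settings)
  }
}
```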
Author: Aaron Davidson <aaron@databricks.com>
Closes #1409 from aarondav/1054 and squashes the following commits:
7d1b769 [Aaron Davidson] SPARK-1097: Do not introduce deadlock while fixing concurrency bug
We use the TID to identify tasks in log messages. However, the TID by itself does not capture the stage or retries, which makes it hard to correlate with the application itself. This pull request changes all logging messages for tasks to include the TID as well as the stage id, stage attempt, task id, and task attempt. I've consulted various people but unfortunately this is a really hard task.
Driver log looks like:
```
14/06/28 18:53:29 INFO DAGScheduler: Submitting 10 missing tasks from Stage 0 (MappedRDD[1] at map at <console>:13)
14/06/28 18:53:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
14/06/28 18:53:29 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 19:44:40 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1855 bytes)
14/07/15 19:44:40 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1855 bytes)
14/07/15 19:44:40 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 1855 bytes)
14/07/15 19:44:40 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1855 bytes)
14/07/15 19:44:40 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 4, localhost, PROCESS_LOCAL, 1855 bytes)
14/07/15 19:44:40 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 5, localhost, PROCESS_LOCAL, 1855 bytes)
14/07/15 19:44:40 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 6, localhost, PROCESS_LOCAL, 1855 bytes)
...
14/07/15 19:44:40 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 64 ms on localhost (4/10)
14/07/15 19:44:40 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 4) in 63 ms on localhost (5/10)
14/07/15 19:44:40 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 2) in 63 ms on localhost (6/10)
14/07/15 19:44:40 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 7) in 62 ms on localhost (7/10)
14/07/15 19:44:40 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 6) in 63 ms on localhost (8/10)
14/07/15 19:44:40 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 9) in 8 ms on localhost (9/10)
14/07/15 19:44:40 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 8) in 9 ms on localhost (10/10)
```
Executor log looks like:
```
14/07/15 19:44:40 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
14/07/15 19:44:40 INFO Executor: Running task 3.0 in stage 1.0 (TID 3)
14/07/15 19:44:40 INFO Executor: Running task 1.0 in stage 1.0 (TID 1)
14/07/15 19:44:40 INFO Executor: Running task 4.0 in stage 1.0 (TID 4)
14/07/15 19:44:40 INFO Executor: Running task 2.0 in stage 1.0 (TID 2)
14/07/15 19:44:40 INFO Executor: Running task 5.0 in stage 1.0 (TID 5)
14/07/15 19:44:40 INFO Executor: Running task 6.0 in stage 1.0 (TID 6)
14/07/15 19:44:40 INFO Executor: Running task 7.0 in stage 1.0 (TID 7)
14/07/15 19:44:40 INFO Executor: Finished task 3.0 in stage 1.0 (TID 3). 847 bytes result sent to driver
14/07/15 19:44:40 INFO Executor: Finished task 2.0 in stage 1.0 (TID 2). 847 bytes result sent to driver
14/07/15 19:44:40 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 847 bytes result sent to driver
14/07/15 19:44:40 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 847 bytes result sent to driver
14/07/15 19:44:40 INFO Executor: Finished task 5.0 in stage 1.0 (TID 5). 847 bytes result sent to driver
14/07/15 19:44:40 INFO Executor: Finished task 4.0 in stage 1.0 (TID 4). 847 bytes result sent to driver
14/07/15 19:44:40 INFO Executor: Finished task 6.0 in stage 1.0 (TID 6). 847 bytes result sent to driver
14/07/15 19:44:40 INFO Executor: Finished task 7.0 in stage 1.0 (TID 7). 847 bytes result sent to driver
```
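Reading the logs above, a name like "task 3.0 in stage 1.0 (TID 3)" appears to encode the task index and task attempt, then the stage id and stage attempt, then the TID. The helper below is a hypothetical sketch of that format under this reading, not Spark's own code.

```scala
object TaskLogFormat {
  // "task <index>.<attempt> in stage <stageId>.<stageAttempt> (TID <tid>)"
  def taskName(index: Int, attempt: Int, stageId: Int, stageAttempt: Int, tid: Long): String =
    s"task $index.$attempt in stage $stageId.$stageAttempt (TID $tid)"
}
```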
Author: Reynold Xin <rxin@apache.org>
Closes #1259 from rxin/betterTaskLogging and squashes the following commits:
c28ada1 [Reynold Xin] Fix unit test failure.
987d043 [Reynold Xin] Updated log messages.
c6cfd46 [Reynold Xin] Merge branch 'master' into betterTaskLogging
b7b1bcc [Reynold Xin] Fixed a typo.
f9aba3c [Reynold Xin] Made it compile.
f8a5c06 [Reynold Xin] Merge branch 'master' into betterTaskLogging
07264e6 [Reynold Xin] Defensive check against unknown TaskEndReason.
76bbd18 [Reynold Xin] FailureSuite not serializable reporting.
4659b20 [Reynold Xin] Remove unused variable.
53888e3 [Reynold Xin] [SPARK-2317] Improve task logging.
HttpBroadcastFactory is the current default broadcast factory. It sends the broadcast data to each worker one by one, which is slow when the cluster is big. TorrentBroadcastFactory scales much better than HTTP. Maybe we should make torrent the default broadcast method.
Author: Xiangrui Meng <meng@databricks.com>
Closes #1437 from mengxr/bt-broadcast and squashes the following commits:
ed492fe [Xiangrui Meng] set default broadcast factory to torrent
Author: Reynold Xin <rxin@apache.org>
Closes #1433 from rxin/compile-warning and squashes the following commits:
8d0b890 [Reynold Xin] Remove some compiler warnings.
... aggregation code
Author: Sandy Ryza <sandy@cloudera.com>
Closes #1435 from sryza/sandy-spark-2519 and squashes the following commits:
640706a [Sandy Ryza] SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical aggregation code
In preparation for SPARK-2521.
Author: Reynold Xin <rxin@apache.org>
Closes #1438 from rxin/broadcast and squashes the following commits:
432f1cc [Reynold Xin] Tightening visibility for various Broadcast related classes.
Hi mateiz, I've created [SPARK-2277](https://issues.apache.org/jira/browse/SPARK-2277) to make TaskScheduler track hosts on each rack. Please help to review, thanks.
Author: Rui Li <rui.li@intel.com>
Closes #1212 from lirui-intel/trackHostOnRack and squashes the following commits:
2b4bd0f [Rui Li] SPARK-2277: refine UT
fbde838 [Rui Li] SPARK-2277: add UT
7bbe658 [Rui Li] SPARK-2277: rename the method
5e4ef62 [Rui Li] SPARK-2277: remove unnecessary import
79ac750 [Rui Li] SPARK-2277: make TaskScheduler track hosts on rack
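Tracking hosts per rack amounts to a map from rack to the set of live hosts on it. Below is a hypothetical sketch, not TaskSchedulerImpl's code. The `rackOf` function is an assumed resolver from host to rack. Keeping the map in sync as hosts come and go lets the scheduler answer "is any host alive on this rack?" without scanning every executor.

```scala
import scala.collection.mutable

// `rackOf` is an assumed host-to-rack resolver (None if the rack is unknown).
class RackTracker(rackOf: String => Option[String]) {
  private val hostsByRack = mutable.HashMap[String, mutable.HashSet[String]]()

  def addHost(host: String): Unit =
    rackOf(host).foreach(rack =>
      hostsByRack.getOrElseUpdate(rack, mutable.HashSet[String]()) += host)

  def removeHost(host: String): Unit =
    rackOf(host).foreach { rack =>
      hostsByRack.get(rack).foreach { hosts =>
        hosts -= host
        // Drop empty racks so the map only holds racks with live hosts.
        if (hosts.isEmpty) hostsByRack -= rack
      }
    }

  def hasHostAliveOnRack(rack: String): Boolean =
    hostsByRack.get(rack).exists(_.nonEmpty)
}
```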
BlockManagerMasterActor.register method
PR for SPARK-2500
Move the logInfo call for BlockManager registration to BlockManagerMasterActor.register instead of the BlockManagerInfo constructor.
Previously, the logInfo call for registering a BlockManager happened in the BlockManagerInfo constructor. This is confusing because code can call "new BlockManagerInfo" without actually registering a BlockManager, which makes the log files misleading.
Author: Henry Saputra <henry.saputra@gmail.com>
Closes #1424 from hsaputra/move_registerblockmanager_log_to_registration_method and squashes the following commits:
3370b4a [Henry Saputra] Move the loginfo for BlockManager to BlockManagerMasterActor.register instead of BlockManagerInfo constructor.
This reduces shuffle compression memory usage by 3x.
Author: Reynold Xin <rxin@apache.org>
Closes #1415 from rxin/snappy and squashes the following commits:
06c1a01 [Reynold Xin] SPARK-2469: Use Snappy (instead of LZF) for default shuffle compression codec.
Author: witgo <witgo@qq.com>
Closes #1112 from witgo/SPARK-1291 and squashes the following commits:
6022bcd [witgo] review commit
1fbb925 [witgo] add addAmIpFilter to yarn alpha
210299c [witgo] review commit
1b92a07 [witgo] review commit
6896586 [witgo] Add comments to addWebUIFilter
3e9630b [witgo] review commit
142ee29 [witgo] review commit
1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
Author: William Benton <willb@redhat.com>
Closes #1419 from willb/reformat-2486 and squashes the following commits:
2676231 [William Benton] Reformat multi-line closure argument.
Based on Greg Bowyer's patch from JIRA https://issues.apache.org/jira/browse/SPARK-2399
Author: Reynold Xin <rxin@apache.org>
Closes #1416 from rxin/lz4 and squashes the following commits:
6c8fefe [Reynold Xin] Fixed typo.
8a14d38 [Reynold Xin] [SPARK-2399] Add support for LZ4 compression.
When the number of completedDrivers exceeds the threshold, the first Max(spark.deploy.retainedDrivers, 1) drivers will be discarded.
Author: lianhuiwang <lianhuiwang09@gmail.com>
Closes #1114 from lianhuiwang/retained-drivers and squashes the following commits:
8789418 [lianhuiwang] discarded exceeded completedDrivers
space of HDFS
When running jobs in YARN cluster mode with the HistoryServer, the files in the staging directory (~/.sparkStaging on HDFS) cannot be deleted.
The HistoryServer uses the directory where the event log is written, and that directory is accessed through an instance of o.a.h.f.FileSystem obtained with FileSystem.get.
ApplicationMaster, for its part, has an instance named fs, which is also obtained with FileSystem.get.
FileSystem.get returns the same cached instance when the URI passed to it refers to the same file system and the method is called by the same user.
Because of this behavior, when the event log directory is on HDFS, ApplicationMaster's fs and FileLogger's fileSystem are the same instance.
When ApplicationMaster shuts down, fileSystem.close is called in FileLogger#stop, which SparkContext#stop invokes indirectly.
ApplicationMaster#cleanupStagingDir is also called by a JVM shutdown hook, and it invokes fs.delete(stagingDirPath).
Because fs.delete in ApplicationMaster runs after fileSystem.close in FileLogger, fs.delete fails and the files in the staging directory are not deleted.
I think calling fileSystem.close is not needed.
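The failure can be reproduced in miniature with a hypothetical `FakeFs` type that mimics the caching described above. It is not Hadoop's `FileSystem` API. Two callers asking for the same (URI, user) pair receive the same instance, so when one caller closes it, the other's later `delete` fails.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a file system handle; not Hadoop's FileSystem API.
final class FakeFs {
  private var open = true
  private val files = mutable.Set[String]()
  private def checkOpen(): Unit =
    if (!open) throw new IllegalStateException("Filesystem closed")

  def create(path: String): Unit = { checkOpen(); files += path }
  def delete(path: String): Boolean = { checkOpen(); files.remove(path) }
  def close(): Unit = { open = false }
}

object FakeFs {
  // Same (URI, user) => same cached instance, shared by every caller.
  private val cache = mutable.HashMap[(String, String), FakeFs]()
  def get(uri: String, user: String): FakeFs =
    cache.getOrElseUpdate((uri, user), new FakeFs)
}
```

Removing the `close()` call on the shared instance, as this commit does, lets the later `delete` succeed.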
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #1326 from sarutak/SPARK-2390 and squashes the following commits:
10e1a88 [Kousuke Saruta] Removed fileSystem.close from FileLogger.scala not to prevent any other FileSystem operation
groupBy()/groupByKey() is notorious for being a very convenient API that can lead to poor performance when used incorrectly.
This PR just makes it clear that users should be cautious not to rely on this API when they really want a different (more performant) one, such as reduceByKey().
(Note that one source of confusion is the name; this groupBy() is not the same as a SQL GROUP-BY, which is used for aggregation and is more similar in nature to Spark's reduceByKey().)
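The performance gap comes mostly from how much data crosses the shuffle. The sketch below models partitions as plain Scala collections; all names are hypothetical and this is not Spark code. A groupByKey-style sum ships every record. A reduceByKey-style sum first combines within each partition, so at most one record per key per partition is shipped.

```scala
object CombineSketch {
  type Partition = Seq[(String, Int)]

  private def sumByKey(records: Seq[(String, Int)]): Map[String, Int] =
    records.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

  // groupByKey-style: every record crosses the "shuffle", then values are summed.
  // Returns (result, number of records shuffled).
  def groupThenSum(parts: Seq[Partition]): (Map[String, Int], Int) = {
    val shuffled = parts.flatten
    (sumByKey(shuffled), shuffled.size)
  }

  // reduceByKey-style: sum within each partition first (map-side combine).
  def combineThenReduce(parts: Seq[Partition]): (Map[String, Int], Int) = {
    val shuffled = parts.flatMap(p => sumByKey(p).toSeq)
    (sumByKey(shuffled), shuffled.size)
  }
}
```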
Author: Aaron Davidson <aaron@databricks.com>
Closes #1380 from aarondav/warning and squashes the following commits:
f60da39 [Aaron Davidson] Give better advice
d0afb68 [Aaron Davidson] Add/increase severity of warning in documentation of groupBy()
When running Spark under certain instrumenting profilers,
Utils.getCallSite could crash with an NPE. This commit
makes it more resilient to failures occurring while inspecting
stack frames.
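Below is a minimal sketch of the defensive style named in the commits ("explicit null checks instead of Try()"). `CallSite.firstUserFrame` is a hypothetical helper, not `Utils.getCallSite`. In practice the frames would come from `Thread.currentThread.getStackTrace`. A null frame is skipped instead of causing an NPE.

```scala
object CallSite {
  // Return the first frame whose class lies outside `internalPrefix`,
  // skipping null frames with explicit checks.
  def firstUserFrame(frames: Array[StackTraceElement],
                     internalPrefix: String): Option[StackTraceElement] =
    frames.find(f => f != null && f.getClassName != null &&
      !f.getClassName.startsWith(internalPrefix))
}
```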
Author: William Benton <willb@redhat.com>
Closes #1413 from willb/spark-2486 and squashes the following commits:
b7c0274 [William Benton] Use explicit null checks instead of Try()
0f0c1ae [William Benton] Utils.getCallSite is now resilient to bogus frames
registered
Because submitting tasks and registering executors happen asynchronously, in most situations the tasks of early stages run without their preferred locality.
A simple workaround is to sleep for a few seconds in the application so that executors have enough time to register.
This PR adds two configuration properties that make the TaskScheduler submit tasks only after some of the executors have registered.
\# Submit tasks only after (registered executors / total executors) reaches this ratio; default value is 0
spark.scheduler.minRegisteredExecutorsRatio = 0.8
\# Whether or not minRegisteredExecutorsRatio is reached, submit tasks after maxRegisteredExecutorsWaitingTime (milliseconds); default value is 30000
spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000
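The rule that the two properties describe can be written as a single predicate. This is a hypothetical sketch of the rule as stated, not the scheduler's code. Tasks are submitted once the registered fraction reaches the ratio, or once the maximum waiting time has elapsed, whichever comes first. A ratio of 0 (the default) means there is no wait at all.

```scala
object BackendReadiness {
  def ready(registered: Int, expected: Int, minRatio: Double,
            elapsedMs: Long, maxWaitMs: Long): Boolean =
    minRatio <= 0.0 ||                                    // default: do not wait
      elapsedMs >= maxWaitMs ||                           // waited long enough
      (expected > 0 && registered.toDouble / expected >= minRatio)
}
```

For example, with the settings above (ratio 0.8, 5000 ms), 8 of 10 executors is enough, while 7 of 10 must wait until 5 seconds have passed.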
Author: li-zhihui <zhihui.li@intel.com>
Closes #900 from li-zhihui/master and squashes the following commits:
b9f8326 [li-zhihui] Add logs & edit docs
1ac08b1 [li-zhihui] Add new configs to user docs
22ead12 [li-zhihui] Move waitBackendReady to postStartHook
c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS
4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor
0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks
4261454 [li-zhihui] Add docs for new configs & code style
ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime
6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha
812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode
e7b6272 [li-zhihui] support yarn-cluster
37f7dc2 [li-zhihui] support yarn mode(percentage style)
3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
Just moves some test suites to their corresponding packages.
Author: Daoyuan <daoyuan.wang@intel.com>
Closes #1401 from adrian-wang/movetestfiles and squashes the following commits:
d1a6803 [Daoyuan] move some test file to match src code
This makes it possible to read classes from the object file which were specified in the user-provided jars. (By default ObjectInputStream uses latestUserDefinedLoader, which may or may not be the right one.)
I created this because I ran into the following problem. I have x:RDD[X] with X being defined in the jar that I provide to SparkContext. I save it with x.saveAsObjectFile("x"). I try to load it with sc.objectFile\[X\]("x"). It fails with ClassNotFoundException.
After a good while of debugging I figured out that Utils.deserialize() most likely uses the ClassLoader of Utils. This is the bootstrap ClassLoader, so it is not aware of the dynamically added jars. This patch fixes the issue.
A more robust fix would be to always default to Thread.currentThread.getContextClassLoader. This would prevent this problem from biting anyone in the future. It would be a bit harder to test though. On the topic of testing, if you'd like to see tests for this, I will need some hand-holding. Thanks!
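The technique the patch describes can be sketched as follows: subclass `ObjectInputStream` and override `resolveClass` so classes are looked up through an explicit `ClassLoader`, such as the executor's loader or the thread context loader. `LoaderAwareDeserialize` is a hypothetical name. This is not the patch's actual code.

```scala
import java.io._

object LoaderAwareDeserialize {
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.toByteArray
  }

  // Resolve classes through `loader` instead of ObjectInputStream's default
  // "latest user-defined loader", so classes from user-added jars are found.
  def deserialize[T](bytes: Array[Byte], loader: ClassLoader): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, loader)
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```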
Author: Daniel Darabos <darabos.daniel@gmail.com>
Closes #181 from darabos/master and squashes the following commits:
45a011a [Daniel Darabos] Add test for SPARK-1877. (Fixed in 52eb54d.)
e13e090 [Daniel Darabos] Merge branch 'master' of https://github.com/apache/spark
61fe0d0 [Daniel Darabos] Fix style (line too long).
1b5df2c [Daniel Darabos] Use the Executor's ClassLoader in sc.objectFile(). This makes it possible to read classes from the object file which were specified in the user-provided jars. (By default ObjectInputStream uses latestUserDefinedLoader, which may or may not be the right one.)
Author: Andrew Or <andrewor14@gmail.com>
Closes #1365 from andrewor14/master-fs and squashes the following commits:
497f100 [Andrew Or] Sneak in a space and hope no one will notice
05ba6da [Andrew Or] Remove unused val
This patch introduces the new way of working (sbt reading dependencies from the Maven POMs) while also retaining the existing ways of doing things.
For example, the Maven build instruction for YARN is
`mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
In sbt, it can become
`MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
sbt also supports
`sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Patrick Wendell <pwendell@gmail.com>
Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
a8ac951 [Prashant Sharma] Updated sbt version.
62b09bb [Prashant Sharma] Improvements.
fa6221d [Prashant Sharma] Excluding sql from mima
4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
72651ca [Prashant Sharma] Addresses code reivew comments.
acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
ac4312c [Prashant Sharma] Revert "minor fix"
6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path.
446768e [Prashant Sharma] minor fix
89b9777 [Prashant Sharma] Merge conflicts
d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
dccc8ac [Prashant Sharma] updated mima to check against 1.0
a49c61b [Prashant Sharma] Fix for tools jar
a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
cf88758 [Prashant Sharma] cleanup
9439ea3 [Prashant Sharma] Small fix to run-examples script.
96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
4973dbd [Patrick Wendell] Example build using pom reader.
Moved (kill) link to the right side. Add confirmation dialog when (kill) link is clicked.
Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
Closes #1350 from tsudukim/feature/SPARK-2115 and squashes the following commits:
e2263b0 [Masayoshi TSUZUKI] Moved (kill) link to the right side. Add confirmation dialog when (kill) link is clicked.
This patch adds tooltips to clarify some points of confusion in the UI. When users mouse over some of the table headers (shuffle read, write, and input size) as well as over the "scheduler delay" metric shown for each stage, a black tool tip (see image below) pops up describing the metric in more detail. After the tooltip mechanism is added by this commit, I imagine others may want to add more tooltips for other things in the UI, but I think this is a good starting point.
![tooltip](https://cloud.githubusercontent.com/assets/1108612/3491905/994e179e-059f-11e4-92f2-c6c12d248d81.jpg)
This looks scary-big but much of it is adding the bootstrap tool tip JavaScript.
Also I have no idea what to put for the license in tooltip (I left it the same -- the Twitter apache header) or for JQuery (left it as nothing) -- @mateiz what's the right thing here?
cc @pwendell @andrewor14 @rxin
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #1314 from kayousterhout/tooltips and squashes the following commits:
19981b5 [Kay Ousterhout] Exclude non-licensed javascript files from style check
d9ab5a9 [Kay Ousterhout] Response to Andrew's review
7752449 [Kay Ousterhout] [SPARK-2384] Add tooltips to UI.
Executors currently start their own unused HTTP file servers. This is because we use the same SparkEnv class for both executors and drivers, and we do not distinguish this case.
In the longer term, we should separate out SparkEnv for the driver and SparkEnv for the executors.
Author: Andrew Or <andrewor14@gmail.com>
Closes #1335 from andrewor14/executor-http-server and squashes the following commits:
46ef263 [Andrew Or] Start HTTP server only on the driver
https://issues.apache.org/jira/browse/SPARK-2403
Spark hangs for us whenever we forget to register a class with Kryo. This should be a simple fix for that. But let me know if you have a better suggestion.
I did not write a new test for this. It would be pretty complicated and I'm not sure it's worthwhile for such a simple change. Let me know if you disagree.
Author: Daniel Darabos <darabos.daniel@gmail.com>
Closes #1329 from darabos/spark-2403 and squashes the following commits:
3aceaad [Daniel Darabos] Print full stack trace for miscellaneous exceptions during serialization.
52c22ba [Daniel Darabos] Only catch NonFatal exceptions.
361e962 [Daniel Darabos] Catch all errors during serialization in DAGScheduler.
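The pattern behind this fix can be sketched in Python, assuming task serialization is a single call that may raise: catch ordinary errors, abort the stage with the full stack trace, and let more severe conditions propagate. Catching `Exception` rather than `BaseException` is only a rough analogue of Scala's `NonFatal`; `pickle` stands in for Kryo, and `submit_task` and `abort_stage` are hypothetical names.

```python
import pickle
import traceback
from typing import Any, Callable, Optional


def submit_task(task: Any, abort_stage: Callable[[str], None]) -> Optional[bytes]:
    """Serialize `task`; on failure, abort the stage instead of hanging.

    Hypothetical helper; pickle stands in for the configured serializer.
    """
    try:
        return pickle.dumps(task)
    except Exception:
        # `except Exception` lets KeyboardInterrupt and SystemExit propagate,
        # roughly like Scala's NonFatal. Report the full trace so the user
        # can see which object could not be serialized.
        abort_stage("Task not serializable:\n" + traceback.format_exc())
        return None
```

Reporting the full trace, rather than only the exception message, is what makes an unregistered or unserializable class easy to locate.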
|
Author: witgo <witgo@qq.com>
Closes #1153 from witgo/expectResult and squashes the following commits:
97541d8 [witgo] merge master
ead26e7 [witgo] Resolve sbt warnings during build
|
Because BoundedPriorityQueue is not registered with KryoSerializer, operations that depend on BoundedPriorityQueue throw exceptions. One such instance is calling top with Kryo serialization enabled.
Fixed the issue by registering BoundedPriorityQueue with KryoSerializer.
Author: ankit.bhardwaj <ankit.bhardwaj@guavus.com>
Closes #1299 from AnkitBhardwaj12/BoundedPriorityQueueWithKryoIssue and squashes the following commits:
a4ae8ed [ankit.bhardwaj] [SPARK-2306]:BoundedPriorityQueue is private and not registered with Kryo
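For readers unfamiliar with the class: a bounded priority queue keeps only the k largest elements offered to it, which lets `top(k)` build a small result per partition and then merge those results. Since the per-partition queues are presumably serialized when results are sent back, the class must be serializable by the configured serializer, which is where the missing Kryo registration bit. The Python sketch below only illustrates the data structure with `heapq`; it is not Spark's Scala implementation, and `top_k` is a hypothetical helper.

```python
import heapq
from typing import Iterable, List


class BoundedPriorityQueue:
    """Keeps at most `max_size` of the largest items offered to it (sketch)."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._heap: List[int] = []  # min-heap: smallest retained item at index 0

    def add(self, item: int) -> None:
        if self.max_size <= 0:
            return
        if len(self._heap) < self.max_size:
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:
            # Evict the current minimum so only the largest items remain.
            heapq.heapreplace(self._heap, item)

    def add_all(self, items: Iterable[int]) -> "BoundedPriorityQueue":
        for item in items:
            self.add(item)
        return self

    def items(self) -> List[int]:
        return sorted(self._heap, reverse=True)


def top_k(partitions: List[List[int]], k: int) -> List[int]:
    # Hypothetical helper: one bounded queue per partition, then merge them.
    merged = BoundedPriorityQueue(k)
    for part in partitions:
        merged.add_all(BoundedPriorityQueue(k).add_all(part).items())
    return merged.items()
```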
|
This was omitted in #1260. @aarondav
Author: Reynold Xin <rxin@apache.org>
Closes #1300 from rxin/historyServer and squashes the following commits:
af720a3 [Reynold Xin] Added SignalLogger to HistoryServer.
|
JIRA: https://issues.apache.org/jira/browse/SPARK-2282
This issue is caused by a buildup of sockets in TCP's TIME_WAIT state, which a socket remains in for some period of time after the connection closes.
This solution simply allows us to reuse sockets that are in TIME_WAIT, avoiding the problems that arise when many of these sockets are created in rapid succession.
Author: Aaron Davidson <aaron@databricks.com>
Closes #1220 from aarondav/SPARK-2282 and squashes the following commits:
2e5cab3 [Aaron Davidson] SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark
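Assuming the fix amounts to enabling the `SO_REUSEADDR` socket option (exposed on the JVM as `java.net.Socket#setReuseAddress`), the Python sketch below shows the same option set on a client socket before it connects. `open_reusable_connection` is a hypothetical helper, and the throwaway local server exists only to make the example self-contained.

```python
import socket


def open_reusable_connection(host: str, port: int) -> socket.socket:
    """Connect to (host, port) with SO_REUSEADDR enabled (hypothetical helper).

    The option must be set before the socket is bound or connected; it permits
    binding a local address even while a previous connection on it is still
    in TIME_WAIT.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.connect((host, port))
    return sock


if __name__ == "__main__":
    # Throwaway local server so the sketch runs on its own.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        client = open_reusable_connection("127.0.0.1", server.getsockname()[1])
        print(client.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0)
        client.close()
```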
|
**Problem.** The existing code in `ExecutorPage.scala` requires a linear scan through all the blocks to filter out the uncached ones. Every refresh could be expensive if there are many blocks and many executors.
**Solution.** The proper semantics should be the following: `StorageStatusListener` should contain only block statuses that are cached. This means that as soon as a block is unpersisted by any means, its status should be removed. This is reflected in the changes made in `StorageStatusListener.scala`.
Further, the `StorageTab` must stop relying on the `StorageStatusListener` changing a dropped block's status to `StorageLevel.NONE` (which no longer happens). This is reflected in the changes made in `StorageTab.scala` and `StorageUtils.scala`.
----------
If you have been following this chain of PRs (as pwendell has), you will quickly notice that this reverts the changes in #1249, which in turn reverted the changes in #1080. In other words, we are adding back the changes from #1080 and fixing SPARK-2307 on top of them. Please ask questions if you are confused.
Author: Andrew Or <andrewor14@gmail.com>
Closes #1255 from andrewor14/storage-ui-fix-reprise and squashes the following commits:
45416fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into storage-ui-fix-reprise
a82ea25 [Andrew Or] Add tests for StorageStatusListener
8773b01 [Andrew Or] Update comment / minor changes
3afde3f [Andrew Or] Correctly report the number of blocks on SparkUI
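A minimal sketch of the new semantics, assuming block updates arrive as (executor, block, storage level) events: the listener stores a status only while the block is cached and deletes it as soon as the level becomes none, so counting cached blocks per executor needs no scan-and-filter pass. `StorageStatusTracker`, `on_block_updated`, and the string storage levels are hypothetical simplifications, not Spark's listener API.

```python
from collections import defaultdict
from typing import Dict

NONE = "NONE"  # hypothetical stand-in for StorageLevel.NONE


class StorageStatusTracker:
    """Hypothetical tracker holding per-executor statuses of cached blocks only."""

    def __init__(self) -> None:
        self._blocks: Dict[str, Dict[str, str]] = defaultdict(dict)

    def on_block_updated(self, executor_id: str, block_id: str, level: str) -> None:
        blocks = self._blocks[executor_id]
        if level == NONE:
            # Unpersisted or dropped: remove the entry instead of storing NONE.
            blocks.pop(block_id, None)
        else:
            blocks[block_id] = level

    def num_cached_blocks(self, executor_id: str) -> int:
        # No filtering needed: every stored entry is a cached block.
        return len(self._blocks.get(executor_id, {}))
```

Under this invariant, consumers such as a storage tab can no longer detect a dropped block by looking for a `NONE` status; they have to treat absence as "not cached", which matches the change described for `StorageTab`.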
|
Prior to this change, we could throw an NPE if we launched a driver while another one was waiting, because removing elements from a collection while iterating over it is not safe.
Author: Aaron Davidson <aaron@databricks.com>
Closes #1289 from aarondav/master-fail and squashes the following commits:
1cf1cf4 [Aaron Davidson] SPARK-2350: Don't NPE while launching drivers
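The bug class is easy to reproduce outside Spark. The Python sketch below, using hypothetical names, launches waiting drivers onto whichever worker has a free slot and iterates over a snapshot of the waiting list, so removing a launched driver never disturbs the loop. Iterating over a copy is one common fix for this hazard; the sketch does not claim to mirror the patch line for line.

```python
from typing import Dict, List


def launch_waiting_drivers(waiting: List[str], free_slots: Dict[str, int]) -> List[str]:
    """Launch as many waiting drivers as free worker slots allow (hypothetical).

    Launched drivers are removed from `waiting` in place and returned in
    launch order.
    """
    launched = []
    for driver in list(waiting):  # iterate over a snapshot, not `waiting` itself
        worker = next((w for w, slots in free_slots.items() if slots > 0), None)
        if worker is None:
            break
        free_slots[worker] -= 1
        waiting.remove(driver)  # safe: the loop is not iterating over `waiting`
        launched.append(driver)
    return launched
```

Had the loop iterated over `waiting` directly, each removal would shift the remaining elements and Python would silently skip the next driver; other collections fail more loudly.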
|
Workaround Hadoop conf ConcurrentModification issue
Author: Raymond Liu <raymond.liu@intel.com>
Closes #1273 from colorant/hadoopRDD and squashes the following commits:
994e98b [Raymond Liu] Address comments
e2cda3d [Raymond Liu] Workaround Hadoop conf ConcurrentModification issue
|