aboutsummaryrefslogtreecommitdiff
path: root/core
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-2086] Improve output of toDebugString to make shuffle boundaries more ↵Gregory Owen2014-07-211-4/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clear Changes RDD.toDebugString() to show hierarchy and shuffle transformations more clearly New output: ``` (3) FlatMappedValuesRDD[325] at apply at Transformer.scala:22 | MappedValuesRDD[324] at apply at Transformer.scala:22 | CoGroupedRDD[323] at apply at Transformer.scala:22 +-(5) MappedRDD[320] at apply at Transformer.scala:22 | | MappedRDD[319] at apply at Transformer.scala:22 | | MappedValuesRDD[318] at apply at Transformer.scala:22 | | MapPartitionsRDD[317] at apply at Transformer.scala:22 | | ShuffledRDD[316] at apply at Transformer.scala:22 | +-(10) MappedRDD[315] at apply at Transformer.scala:22 | | ParallelCollectionRDD[314] at apply at Transformer.scala:22 +-(100) MappedRDD[322] at apply at Transformer.scala:22 | ParallelCollectionRDD[321] at apply at Transformer.scala:22 ``` Author: Gregory Owen <greowen@gmail.com> Closes #1364 from GregOwen/to-debug-string and squashes the following commits: 08f5c78 [Gregory Owen] toDebugString: prettier debug printing to show shuffles and joins more clearly 1603f7b [Gregory Owen] toDebugString: prettier debug printing to show shuffles and joins more clearly
* Improve scheduler delay tooltip.Kay Ousterhout2014-07-201-3/+3
| | | | | | | | | | As a result of shivaram's experience debugging long scheduler delay, I think we should improve the tooltip to point people in the right direction if scheduler delay is large. Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #1488 from kayousterhout/better_tooltips and squashes the following commits: 22176fd [Kay Ousterhout] Improve scheduler delay tooltip.
* SPARK-2564. ShuffleReadMetrics.totalBlocksRead is redundantSandy Ryza2014-07-205-11/+2
| | | | | | | | | | Author: Sandy Ryza <sandy@cloudera.com> Closes #1474 from sryza/sandy-spark-2564 and squashes the following commits: 35b8388 [Sandy Ryza] Fix compile error on upmerge 7b985fb [Sandy Ryza] Fix test compile error 43f79e6 [Sandy Ryza] SPARK-2564. ShuffleReadMetrics.totalBlocksRead is redundant
* [SPARK-2598] RangePartitioner's binary search does not use the given OrderingReynold Xin2014-07-203-5/+20
| | | | | | | | | | We should fix this in branch-1.0 as well. Author: Reynold Xin <rxin@apache.org> Closes #1500 from rxin/rangePartitioner and squashes the following commits: c0a94f5 [Reynold Xin] [SPARK-2598] RangePartitioner's binary search does not use the given Ordering.
* SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical section...Sandy Ryza2014-07-203-33/+33
| | | | | | | | | | | | | | ...s of CoGroupedRDD and PairRDDFunctions This also removes an unnecessary tuple creation in cogroup. Author: Sandy Ryza <sandy@cloudera.com> Closes #1447 from sryza/sandy-spark-2519-2 and squashes the following commits: b6d9699 [Sandy Ryza] Remove missed Tuple2 match in CoGroupedRDD a109828 [Sandy Ryza] Remove another pattern matching in MappedValuesRDD and revert some changes in PairRDDFunctions be10f8a [Sandy Ryza] SPARK-2519 part 2. Remove pattern matching on Tuple2 in critical sections of CoGroupedRDD and PairRDDFunctions
* Revert "[SPARK-2521] Broadcast RDD object (instead of sending it along with ↵Reynold Xin2014-07-198-137/+251
| | | | | | every task)." This reverts commit 7b8cd175254d42c8e82f0aa8eb4b7f3508d8fde2.
* put 'curRequestSize = 0' after 'logDebug' itLijie Xu2014-07-191-1/+1
| | | | | | | | | | This is a minor change. We should first logDebug($curRequestSize) and then set it to 0. Author: Lijie Xu <csxulijie@gmail.com> Closes #1477 from JerryLead/patch-1 and squashes the following commits: aed722d [Lijie Xu] put 'curRequestSize = 0' after 'logDebug' it
* [SPARK-2521] Broadcast RDD object (instead of sending it along with every task).Reynold Xin2014-07-188-251/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently (as of Spark 1.0.1), Spark sends RDD object (which contains closures) using Akka along with the task itself to the executors. This is inefficient because all tasks in the same stage use the same RDD object, but we have to send RDD object multiple times to the executors. This is especially bad when a closure references some variable that is very large. The current design led to users having to explicitly broadcast large variables. The patch uses broadcast to send RDD objects and the closures to executors, and use Akka to only send a reference to the broadcast RDD/closure along with the partition specific information for the task. For those of you who know more about the internals, Spark already relies on broadcast to send the Hadoop JobConf every time it uses the Hadoop input, because the JobConf is large. The user-facing impact of the change include: 1. Users won't need to decide what to broadcast anymore, unless they would want to use a large object multiple times in different operations 2. Task size will get smaller, resulting in faster scheduling and higher task dispatch throughput. In addition, the change will simplify some internals of Spark, eliminating the need to maintain task caches and the complex logic to broadcast JobConf (which also led to a deadlock recently). A simple way to test this: ```scala val a = new Array[Byte](1000*1000); scala.util.Random.nextBytes(a); sc.parallelize(1 to 1000, 1000).map { x => a; x }.groupBy { x => a; x }.count ``` Numbers on 3 r3.8xlarge instances on EC2 ``` master branch: 5.648436068 s, 4.715361895 s, 5.360161877 s with this change: 3.416348793 s, 1.477846558 s, 1.553432156 s ``` Author: Reynold Xin <rxin@apache.org> Closes #1452 from rxin/broadcast-task and squashes the following commits: 762e0be [Reynold Xin] Warn large broadcasts. ade6eac [Reynold Xin] Log broadcast size. c3b6f11 [Reynold Xin] Added a unit test for clean up. 754085f [Reynold Xin] Explain why broadcasting serialized copy of the task. 04b17f0 [Reynold Xin] [SPARK-2521] Broadcast RDD object once per TaskSet (instead of sending it for every task).
* [SPARK-2571] Correctly report shuffle read metrics.Kay Ousterhout2014-07-186-12/+27
| | | | | | | | | | | | | | | Currently, shuffle read metrics are incorrectly reported when stages have multiple shuffle dependencies (they are set to be the metrics from just one of the shuffle dependencies, rather than the accumulated metrics from all of the shuffle dependencies). This fixes that problem, and should probably be back-ported to the 0.9 branch. Thanks ryanra for discovering this problem! cc rxin andrewor14 Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #1476 from kayousterhout/join_bug and squashes the following commits: 0203a16 [Kay Ousterhout] Fix broken unit tests. f463c2e [Kay Ousterhout] [SPARK-2571] Correctly report shuffle read metrics.
* Reservoir sampling implementation.Reynold Xin2014-07-182-0/+67
| | | | | | | | | | | | This is going to be used in https://issues.apache.org/jira/browse/SPARK-2568 Author: Reynold Xin <rxin@apache.org> Closes #1478 from rxin/reservoirSample and squashes the following commits: 17bcbf3 [Reynold Xin] Added seed. badf20d [Reynold Xin] Renamed the method. 6940010 [Reynold Xin] Reservoir sampling implementation.
* SPARK-2553. Fix compile errorSandy Ryza2014-07-181-0/+1
| | | | | | | | Author: Sandy Ryza <sandy@cloudera.com> Closes #1479 from sryza/sandy-spark-2553 and squashes the following commits: 2cb5ed8 [Sandy Ryza] SPARK-2553. Fix compile error
* SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency...Sandy Ryza2014-07-171-1/+5
| | | | | | | | | | | | ... per key My humble opinion is that avoiding allocations in this performance-critical section is worth the extra code. Author: Sandy Ryza <sandy@cloudera.com> Closes #1461 from sryza/sandy-spark-2553 and squashes the following commits: 7eaf7f2 [Sandy Ryza] SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency per key
* [SPARK-2411] Add a history-not-found page to standalone MasterAndrew Or2014-07-176-31/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | **Problem.** Right now, if you click on an application after it has finished, it simply refreshes the page if there are no event logs for the application. This is not super intuitive especially because event logging is not enabled by default. We should direct the user to enable this if they attempt to view a SparkUI after the fact without event logs. **Fix.** The new page conveys different messages in each of the following scenarios: (1) Application did not enable event logging, (2) Event logs are not found in the specified directory, and (3) Exception is thrown while replaying the logs Here are screenshots of what the page looks like in each of the above scenarios: (1) <img src="https://issues.apache.org/jira/secure/attachment/12656204/Event%20logging%20not%20enabled.png" width="75%"> (2) <img src="https://issues.apache.org/jira/secure/attachment/12656203/Application%20history%20not%20found.png"> (3) <img src="https://issues.apache.org/jira/secure/attachment/12656202/Application%20history%20load%20error.png" width="95%"> Author: Andrew Or <andrewor14@gmail.com> Closes #1336 from andrewor14/master-link and squashes the following commits: 2f06206 [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link 97cddc0 [Andrew Or] Add different severity levels 832b687 [Andrew Or] Mention spark.eventLog.dir in error message 51980c3 [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link ded208c [Andrew Or] Merge branch 'master' of github.com:apache/spark into master-link 89d6405 [Andrew Or] Reword message e7df7ed [Andrew Or] Add a history not found page to standalone Master
* [SPARK-2299] Consolidate various stageIdTo* hash maps in JobProgressListenerReynold Xin2014-07-177-224/+205
| | | | | | | | | | | | | | | | | This should reduce memory usage for the web ui as well as slightly increase its speed in draining the UI event queue. @andrewor14 Author: Reynold Xin <rxin@apache.org> Closes #1262 from rxin/ui-consolidate-hashtables and squashes the following commits: 1ac3f97 [Reynold Xin] Oops. Properly handle description. f5736ad [Reynold Xin] Code review comments. b8828dc [Reynold Xin] Merge branch 'master' into ui-consolidate-hashtables 7a7b6c4 [Reynold Xin] Revert css change. f959bb8 [Reynold Xin] [SPARK-2299] Consolidate various stageIdTo* hash maps in JobProgressListener to speed it up. 63256f5 [Reynold Xin] [SPARK-2320] Reduce <pre> block font size.
* [SPARK-2534] Avoid pulling in the entire RDD in various operatorsReynold Xin2014-07-172-30/+28
| | | | | | | | | | | | | | | This should go into both master and branch-1.0. Author: Reynold Xin <rxin@apache.org> Closes #1450 from rxin/agg-closure and squashes the following commits: e40f363 [Reynold Xin] Mima check excludes. 9186364 [Reynold Xin] Define the return type more explicitly. 38e348b [Reynold Xin] Fixed the cases in RDD.scala. ea6b34d [Reynold Xin] Blah 89b9c43 [Reynold Xin] Fix other instances of accidentally pulling in extra stuff in closures. 73b2783 [Reynold Xin] [SPARK-2534] Avoid pulling in the entire RDD in groupByKey.
* [SPARK-2423] Clean up SparkSubmit for readabilityAndrew Or2014-07-171-144/+145
| | | | | | | | | | | | | | | | It is currently non-trivial to trace through how different combinations of cluster managers (e.g. yarn) and deploy modes (e.g. cluster) are processed in SparkSubmit. Moving forward, it will be easier to extend SparkSubmit if we first re-organize the code by grouping related logic together. This is a precursor to fixing standalone-cluster mode, which is currently broken (SPARK-2260). Author: Andrew Or <andrewor14@gmail.com> Closes #1349 from andrewor14/submit-cleanup and squashes the following commits: 8f99200 [Andrew Or] script -> program (minor) 30f2e65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-cleanup fe484a1 [Andrew Or] Move deploy mode checks after yarn code 7167824 [Andrew Or] Re-order config options and update comments 0b01ff8 [Andrew Or] Clean up SparkSubmit for readability
* [SPARK-2412] CoalescedRDD throws exception with certain pref locsAaron Davidson2014-07-172-2/+16
| | | | | | | | | | | | | If the first pass of CoalescedRDD does not find the target number of locations AND the second pass finds new locations, an exception is thrown, as "groupHash.get(nxt_replica).get" is not valid. The fix is just to add an ArrayBuffer to groupHash for that replica if it didn't already exist. Author: Aaron Davidson <aaron@databricks.com> Closes #1337 from aarondav/2412 and squashes the following commits: f587b5d [Aaron Davidson] getOrElseUpdate 3ad8a3c [Aaron Davidson] [SPARK-2412] CoalescedRDD throws exception with certain pref locs
* [SPARK-2154] Schedule next Driver when one completes (standalone mode)Aaron Davidson2014-07-161-0/+1
| | | | | | | | Author: Aaron Davidson <aaron@databricks.com> Closes #1405 from aarondav/2154 and squashes the following commits: 24e9ef9 [Aaron Davidson] [SPARK-2154] Schedule next Driver when one completes (standalone mode)
* SPARK-1097: Do not introduce deadlock while fixing concurrency bugAaron Davidson2014-07-161-2/+5
| | | | | | | | | | | | We recently added this lock on 'conf' in order to prevent concurrent creation. However, it turns out that this can introduce a deadlock because Hadoop also synchronizes on the Configuration objects when creating new Configurations (and they do so via a static REGISTRY which contains all created Configurations). This fix forces all Spark initialization of Configuration objects to occur serially by using a static lock that we control, and thus also prevents introducing the deadlock. Author: Aaron Davidson <aaron@databricks.com> Closes #1409 from aarondav/1054 and squashes the following commits: 7d1b769 [Aaron Davidson] SPARK-1097: Do not introduce deadlock while fixing concurrency bug
* [SPARK-2317] Improve task logging.Reynold Xin2014-07-1610-76/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use TID to indicate task logging. However, TID itself does not capture stage or retries, making it harder to correlate with the application itself. This pull request changes all logging messages for tasks to include both the TID and the stage id, stage attempt, task id, and task attempt. I've consulted various people but unfortunately this is a really hard task. Driver log looks like: ``` 14/06/28 18:53:29 INFO DAGScheduler: Submitting 10 missing tasks from Stage 0 (MappedRDD[1] at map at <console>:13) 14/06/28 18:53:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks 14/06/28 18:53:29 INFO TaskSetManager: Re-computing pending task lists. 14/07/15 19:44:40 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1855 bytes) 14/07/15 19:44:40 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1855 bytes) 14/07/15 19:44:40 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 2, localhost, PROCESS_LOCAL, 1855 bytes) 14/07/15 19:44:40 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1855 bytes) 14/07/15 19:44:40 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 4, localhost, PROCESS_LOCAL, 1855 bytes) 14/07/15 19:44:40 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 5, localhost, PROCESS_LOCAL, 1855 bytes) 14/07/15 19:44:40 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 6, localhost, PROCESS_LOCAL, 1855 bytes) ... 14/07/15 19:44:40 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 64 ms on localhost (4/10) 14/07/15 19:44:40 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 4) in 63 ms on localhost (5/10) 14/07/15 19:44:40 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 2) in 63 ms on localhost (6/10) 14/07/15 19:44:40 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 7) in 62 ms on localhost (7/10) 14/07/15 19:44:40 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 6) in 63 ms on localhost (8/10) 14/07/15 19:44:40 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 9) in 8 ms on localhost (9/10) 14/07/15 19:44:40 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 8) in 9 ms on localhost (10/10) ``` Executor log looks like ``` 14/07/15 19:44:40 INFO Executor: Running task 0.0 in stage 1.0 (TID 0) 14/07/15 19:44:40 INFO Executor: Running task 3.0 in stage 1.0 (TID 3) 14/07/15 19:44:40 INFO Executor: Running task 1.0 in stage 1.0 (TID 1) 14/07/15 19:44:40 INFO Executor: Running task 4.0 in stage 1.0 (TID 4) 14/07/15 19:44:40 INFO Executor: Running task 2.0 in stage 1.0 (TID 2) 14/07/15 19:44:40 INFO Executor: Running task 5.0 in stage 1.0 (TID 5) 14/07/15 19:44:40 INFO Executor: Running task 6.0 in stage 1.0 (TID 6) 14/07/15 19:44:40 INFO Executor: Running task 7.0 in stage 1.0 (TID 7) 14/07/15 19:44:40 INFO Executor: Finished task 3.0 in stage 1.0 (TID 3). 847 bytes result sent to driver 14/07/15 19:44:40 INFO Executor: Finished task 2.0 in stage 1.0 (TID 2). 847 bytes result sent to driver 14/07/15 19:44:40 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 847 bytes result sent to driver 14/07/15 19:44:40 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 847 bytes result sent to driver 14/07/15 19:44:40 INFO Executor: Finished task 5.0 in stage 1.0 (TID 5). 847 bytes result sent to driver 14/07/15 19:44:40 INFO Executor: Finished task 4.0 in stage 1.0 (TID 4). 847 bytes result sent to driver 14/07/15 19:44:40 INFO Executor: Finished task 6.0 in stage 1.0 (TID 6). 847 bytes result sent to driver 14/07/15 19:44:40 INFO Executor: Finished task 7.0 in stage 1.0 (TID 7). 847 bytes result sent to driver ``` Author: Reynold Xin <rxin@apache.org> Closes #1259 from rxin/betterTaskLogging and squashes the following commits: c28ada1 [Reynold Xin] Fix unit test failure. 987d043 [Reynold Xin] Updated log messages. c6cfd46 [Reynold Xin] Merge branch 'master' into betterTaskLogging b7b1bcc [Reynold Xin] Fixed a typo. f9aba3c [Reynold Xin] Made it compile. f8a5c06 [Reynold Xin] Merge branch 'master' into betterTaskLogging 07264e6 [Reynold Xin] Defensive check against unknown TaskEndReason. 76bbd18 [Reynold Xin] FailureSuite not serializable reporting. 4659b20 [Reynold Xin] Remove unused variable. 53888e3 [Reynold Xin] [SPARK-2317] Improve task logging.
* [SPARK-2522] set default broadcast factory to torrentXiangrui Meng2014-07-161-1/+1
| | | | | | | | | | HttpBroadcastFactory is the current default broadcast factory. It sends the broadcast data to each worker one by one, which is slow when the cluster is big. TorrentBroadcastFactory scales much better than http. Maybe we should make torrent the default broadcast method. Author: Xiangrui Meng <meng@databricks.com> Closes #1437 from mengxr/bt-broadcast and squashes the following commits: ed492fe [Xiangrui Meng] set default broadcast factory to torrent
* [SPARK-2517] Remove some compiler warnings.Reynold Xin2014-07-164-23/+21
| | | | | | | | Author: Reynold Xin <rxin@apache.org> Closes #1433 from rxin/compile-warning and squashes the following commits: 8d0b890 [Reynold Xin] Remove some compiler warnings.
* SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical...Sandy Ryza2014-07-162-9/+11
| | | | | | | | | | ... aggregation code Author: Sandy Ryza <sandy@cloudera.com> Closes #1435 from sryza/sandy-spark-2519 and squashes the following commits: 640706a [Sandy Ryza] SPARK-2519. Eliminate pattern-matching on Tuple2 in performance-critical aggregation code
* Tightening visibility for various Broadcast related classes.Reynold Xin2014-07-165-35/+36
| | | | | | | | | | In preparation for SPARK-2521. Author: Reynold Xin <rxin@apache.org> Closes #1438 from rxin/broadcast and squashes the following commits: 432f1cc [Reynold Xin] Tightening visibility for various Broadcast related classes.
* SPARK-2277: make TaskScheduler track hosts on rackRui Li2014-07-163-5/+83
| | | | | | | | | | | | | | Hi mateiz, I've created [SPARK-2277](https://issues.apache.org/jira/browse/SPARK-2277) to make TaskScheduler track hosts on each rack. Please help to review, thanks. Author: Rui Li <rui.li@intel.com> Closes #1212 from lirui-intel/trackHostOnRack and squashes the following commits: 2b4bd0f [Rui Li] SPARK-2277: refine UT fbde838 [Rui Li] SPARK-2277: add UT 7bbe658 [Rui Li] SPARK-2277: rename the method 5e4ef62 [Rui Li] SPARK-2277: remove unnecessary import 79ac750 [Rui Li] SPARK-2277: make TaskScheduler track hosts on rack
* [SPARK-2500] Move the logInfo for registering BlockManager to ↵Henry Saputra2014-07-151-3/+4
| | | | | | | | | | | | | | | | BlockManagerMasterActor.register method PR for SPARK-2500 Move the logInfo call for BlockManager to BlockManagerMasterActor.register instead of BlockManagerInfo constructor. Previously the loginfo call for registering the registering a BlockManager is happening in the BlockManagerInfo constructor. This kind of confusing because the code could call "new BlockManagerInfo" without actually registering a BlockManager and could confuse when reading the log files. Author: Henry Saputra <henry.saputra@gmail.com> Closes #1424 from hsaputra/move_registerblockmanager_log_to_registration_method and squashes the following commits: 3370b4a [Henry Saputra] Move the loginfo for BlockManager to BlockManagerMasterActor.register instead of BlockManagerInfo constructor.
* [SPARK-2469] Use Snappy (instead of LZF) for default shuffle compression codecReynold Xin2014-07-152-3/+3
| | | | | | | | | | This reduces shuffle compression memory usage by 3x. Author: Reynold Xin <rxin@apache.org> Closes #1415 from rxin/snappy and squashes the following commits: 06c1a01 [Reynold Xin] SPARK-2469: Use Snappy (instead of LZF) for default shuffle compression codec.
* SPARK-1291: Link the spark UI to RM ui in yarn-client modewitgo2014-07-153-1/+31
| | | | | | | | | | | | | | | Author: witgo <witgo@qq.com> Closes #1112 from witgo/SPARK-1291 and squashes the following commits: 6022bcd [witgo] review commit 1fbb925 [witgo] add addAmIpFilter to yarn alpha 210299c [witgo] review commit 1b92a07 [witgo] review commit 6896586 [witgo] Add comments to addWebUIFilter 3e9630b [witgo] review commit 142ee29 [witgo] review commit 1fe7710 [witgo] Link the spark UI to RM ui in yarn-client mode
* Reformat multi-line closure argument.William Benton2014-07-151-2/+3
| | | | | | | | Author: William Benton <willb@redhat.com> Closes #1419 from willb/reformat-2486 and squashes the following commits: 2676231 [William Benton] Reformat multi-line closure argument.
* [SPARK-2399] Add support for LZ4 compression.Reynold Xin2014-07-153-0/+32
| | | | | | | | | | | Based on Greg Bowyer's patch from JIRA https://issues.apache.org/jira/browse/SPARK-2399 Author: Reynold Xin <rxin@apache.org> Closes #1416 from rxin/lz4 and squashes the following commits: 6c8fefe [Reynold Xin] Fixed typo. 8a14d38 [Reynold Xin] [SPARK-2399] Add support for LZ4 compression.
* discarded exceeded completedDriverslianhuiwang2014-07-151-0/+5
| | | | | | | | | | When completedDrivers number exceeds the threshold, the first Max(spark.deploy.retainedDrivers, 1) will be discarded. Author: lianhuiwang <lianhuiwang09@gmail.com> Closes #1114 from lianhuiwang/retained-drivers and squashes the following commits: 8789418 [lianhuiwang] discarded exceeded completedDrivers
* [SPARK-2390] Files in staging directory cannot be deleted and wastes the ↵Kousuke Saruta2014-07-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | space of HDFS When running jobs with YARN Cluster mode and using HistoryServer, the files in the Staging Directory (~/.sparkStaging on HDFS) cannot be deleted. HistoryServer uses directory where event log is written, and the directory is represented as a instance of o.a.h.f.FileSystem created by using FileSystem.get. On the other hand, ApplicationMaster has a instance named fs, which also created by using FileSystem.get. FileSystem.get returns cached same instance when URI passed to the method represents same file system and the method is called by same user. Because of the behavior, when the directory for event log is on HDFS, fs of ApplicationMaster and fileSystem of FileLogger is same instance. When shutting down ApplicationMaster, fileSystem.close is called in FileLogger#stop, which is invoked by SparkContext#stop indirectly. And ApplicationMaster#cleanupStagingDir also called by JVM shutdown hook. In this method, fs.delete(stagingDirPath) is invoked. Because fs.delete in ApplicationMaster is called after fileSystem.close in FileLogger, fs.delete fails and results not deleting files in the staging directory. I think, calling fileSystem.delete is not needed. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #1326 from sarutak/SPARK-2390 and squashes the following commits: 10e1a88 [Kousuke Saruta] Removed fileSystem.close from FileLogger.scala not to prevent any other FileSystem operation
* Add/increase severity of warning in documentation of groupBy()Aaron Davidson2014-07-142-9/+21
| | | | | | | | | | | | | | | groupBy()/groupByKey() is notorious for being a very convenient API that can lead to poor performance when used incorrectly. This PR just makes it clear that users should be cautious not to rely on this API when they really want a different (more performant) one, such as reduceByKey(). (Note that one source of confusion is the name; this groupBy() is not the same as a SQL GROUP-BY, which is used for aggregation and is more similar in nature to Spark's reduceByKey().) Author: Aaron Davidson <aaron@databricks.com> Closes #1380 from aarondav/warning and squashes the following commits: f60da39 [Aaron Davidson] Give better advice d0afb68 [Aaron Davidson] Add/increase severity of warning in documentation of groupBy()
* SPARK-2486: Utils.getCallSite is now resilient to bogus framesWilliam Benton2014-07-141-1/+5
| | | | | | | | | | | | | | When running Spark under certain instrumenting profilers, Utils.getCallSite could crash with an NPE. This commit makes it more resilient to failures occurring while inspecting stack frames. Author: William Benton <willb@redhat.com> Closes #1413 from willb/spark-2486 and squashes the following commits: b7c0274 [William Benton] Use explicit null checks instead of Try() 0f0c1ae [William Benton] Utils.getCallSite is now resilient to bogus frames
* [SPARK-1946] Submit tasks after (configured ratio) executors have been ↵li-zhihui2014-07-145-1/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | registered Because submitting tasks and registering executors are asynchronous, in most situation, early stages' tasks run without preferred locality. A simple solution is sleeping few seconds in application, so that executors have enough time to register. The PR add 2 configuration properties to make TaskScheduler submit tasks after a few of executors have been registered. \# Submit tasks only after (registered executors / total executors) arrived the ratio, default value is 0 spark.scheduler.minRegisteredExecutorsRatio = 0.8 \# Whatever minRegisteredExecutorsRatio is arrived, submit tasks after the maxRegisteredWaitingTime(millisecond), default value is 30000 spark.scheduler.maxRegisteredExecutorsWaitingTime = 5000 Author: li-zhihui <zhihui.li@intel.com> Closes #900 from li-zhihui/master and squashes the following commits: b9f8326 [li-zhihui] Add logs & edit docs 1ac08b1 [li-zhihui] Add new configs to user docs 22ead12 [li-zhihui] Move waitBackendReady to postStartHook c6f0522 [li-zhihui] Bug fix: numExecutors wasn't set & use constant DEFAULT_NUMBER_EXECUTORS 4d6d847 [li-zhihui] Move waitBackendReady to TaskSchedulerImpl.start & some code refactor 0ecee9a [li-zhihui] Move waitBackendReady from DAGScheduler.submitStage to TaskSchedulerImpl.submitTasks 4261454 [li-zhihui] Add docs for new configs & code style ce0868a [li-zhihui] Code style, rename configuration property name of minRegisteredRatio & maxRegisteredWaitingTime 6cfb9ec [li-zhihui] Code style, revert default minRegisteredRatio of yarn to 0, driver get --num-executors in yarn/alpha 812c33c [li-zhihui] Fix driver lost --num-executors option in yarn-cluster mode e7b6272 [li-zhihui] support yarn-cluster 37f7dc2 [li-zhihui] support yarn mode(percentage style) 3f8c941 [li-zhihui] submit stage after (configured ratio of) executors have been registered
* move some test file to match src codeDaoyuan2014-07-145-25/+19
| | | | | | | | | | Just move some test suite to corresponding package Author: Daoyuan <daoyuan.wang@intel.com> Closes #1401 from adrian-wang/movetestfiles and squashes the following commits: d1a6803 [Daoyuan] move some test file to match src code
* Use the Executor's ClassLoader in sc.objectFile().Daniel Darabos2014-07-122-2/+27
| | | | | | | | | | | | | | | | | | | This makes it possible to read classes from the object file which were specified in the user-provided jars. (By default ObjectInputStream uses latestUserDefinedLoader, which may or may not be the right one.) I created this because I ran into the following problem. I have x:RDD[X] with X being defined in the jar that I provide to SparkContext. I save it with x.saveAsObjectFile("x"). I try to load it with sc.objectFile\[X\]("x"). It fails with ClassNotFoundException. After a good while of debugging I figured out that Utils.deserialize() most likely uses the ClassLoader of Utils. This is the bootstrap ClassLoader, so it is not aware of the dynamically added jars. This patch fixes the issue. A more robust fix would be to always default to Thread.currentThread.getContextClassLoader. This would prevent this problem from biting anyone in the future. It would be a bit harder to test though. On the topic of testing, if you'd like to see tests for this, I will need some hand-holding. Thanks! Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #181 from darabos/master and squashes the following commits: 45a011a [Daniel Darabos] Add test for SPARK-1877. (Fixed in 52eb54d.) e13e090 [Daniel Darabos] Merge branch 'master' of https://github.com/apache/spark 61fe0d0 [Daniel Darabos] Fix style (line too long). 1b5df2c [Daniel Darabos] Use the Executor's ClassLoader in sc.objectFile(). This makes it possible to read classes from the object file which were specified in the user-provided jars. (By default ObjectInputStream uses latestUserDefinedLoader, which may or may not be the right one.)
* [Minor] Remove unused val in MasterAndrew Or2014-07-111-3/+0
| | | | | | | | | Author: Andrew Or <andrewor14@gmail.com> Closes #1365 from andrewor14/master-fs and squashes the following commits: 497f100 [Andrew Or] Sneak in a space and hope no one will notice 05ba6da [Andrew Or] Remove unused val
* [SPARK-1776] Have Spark's SBT build read dependencies from Maven.Prashant Sharma2014-07-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch introduces the new way of working also retaining the existing ways of doing things. For example build instruction for yarn in maven is `mvn -Pyarn -PHadoop2.2 clean package -DskipTests` in sbt it can become `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly` Also supports `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly` Author: Prashant Sharma <prashant.s@imaginea.com> Author: Patrick Wendell <pwendell@gmail.com> Closes #772 from ScrapCodes/sbt-maven and squashes the following commits: a8ac951 [Prashant Sharma] Updated sbt version. 62b09bb [Prashant Sharma] Improvements. fa6221d [Prashant Sharma] Excluding sql from mima 4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default. 72651ca [Prashant Sharma] Addresses code reivew comments. acab73d [Prashant Sharma] Revert "Small fix to run-examples script." ac4312c [Prashant Sharma] Revert "minor fix" 6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit. 65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path. 446768e [Prashant Sharma] minor fix 89b9777 [Prashant Sharma] Merge conflicts d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups. dccc8ac [Prashant Sharma] updated mima to check against 1.0 a49c61b [Prashant Sharma] Fix for tools jar a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies. cf88758 [Prashant Sharma] cleanup 9439ea3 [Prashant Sharma] Small fix to run-examples script. 96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven. 36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins. 4973dbd [Patrick Wendell] Example build using pom reader.
* SPARK-2115: Stage kill link is too close to stage details linkMasayoshi TSUZUKI2014-07-102-2/+5
| | | | | | | | | | Moved (kill) link to the right side. Add confirmation dialog when (kill) link is clicked. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #1350 from tsudukim/feature/SPARK-2115 and squashes the following commits: e2263b0 [Masayoshi TSUZUKI] Moved (kill) link to the right side. Add confirmation dialog when (kill) link is clicked.
* [SPARK-2384] Add tooltips to UI.Kay Ousterhout2014-07-0810-93/+489
| | | | | | | | | | | | | | | | | | | | This patch adds tooltips to clarify some points of confusion in the UI. When users mouse over some of the table headers (shuffle read, write, and input size) as well as over the "scheduler delay" metric shown for each stage, a black tool tip (see image below) pops up describing the metric in more detail. After the tooltip mechanism is added by this commit, I imagine others may want to add more tooltips for other things in the UI, but I think this is a good starting point. ![tooltip](https://cloud.githubusercontent.com/assets/1108612/3491905/994e179e-059f-11e4-92f2-c6c12d248d81.jpg) This looks scary-big but much of it is adding the bootstrap tool tip JavaScript. Also I have no idea what to put for the license in tooltip (I left it the same -- the Twitter apache header) or for JQuery (left it as nothing) -- @mateiz what's the right thing here? cc @pwendell @andrewor14 @rxin Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #1314 from kayousterhout/tooltips and squashes the following commits: 19981b5 [Kay Ousterhout] Exclude non-licensed javascript files from style check d9ab5a9 [Kay Ousterhout] Response to Andrew's review 7752449 [Kay Ousterhout] [SPARK-2384] Add tooltips to UI.
* [SPARK-2392] Executors should not start their own HTTP serversAndrew Or2014-07-081-4/+10
| | | | | | | | | | | | Executors currently start their own unused HTTP file servers. This is because we use the same SparkEnv class for both executors and drivers, and we do not distinguish this case. In the longer term, we should separate out SparkEnv for the driver and SparkEnv for the executors. Author: Andrew Or <andrewor14@gmail.com> Closes #1335 from andrewor14/executor-http-server and squashes the following commits: 46ef263 [Andrew Or] Start HTTP server only on the driver
* [SPARK-2403] Catch all errors during serialization in DAGSchedulerDaniel Darabos2014-07-081-0/+5
| | | | | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-2403 Spark hangs for us whenever we forget to register a class with Kryo. This should be a simple fix for that. But let me know if you have a better suggestion. I did not write a new test for this. It would be pretty complicated and I'm not sure it's worthwhile for such a simple change. Let me know if you disagree. Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #1329 from darabos/spark-2403 and squashes the following commits: 3aceaad [Daniel Darabos] Print full stack trace for miscellaneous exceptions during serialization. 52c22ba [Daniel Darabos] Only catch NonFatal exceptions. 361e962 [Daniel Darabos] Catch all errors during serialization in DAGScheduler.
* Resolve sbt warnings during build Ⅱwitgo2014-07-083-57/+57
| | | | | | | | | Author: witgo <witgo@qq.com> Closes #1153 from witgo/expectResult and squashes the following commits: 97541d8 [witgo] merge master ead26e7 [witgo] Resolve sbt warnings during build
* [SPARK-2306]:BoundedPriorityQueue is private and not registered with Kry...ankit.bhardwaj2014-07-041-1/+3
| | | | | | | | | | | Due to the non registration of BoundedPriorityQueue with kryoserializer, operations which are dependend on BoundedPriorityQueue are giving exceptions.One such instance is using top along with kryo serialization. Fixed the issue by registering BoundedPriorityQueue with kryoserializer. Author: ankit.bhardwaj <ankit.bhardwaj@guavus.com> Closes #1299 from AnkitBhardwaj12/BoundedPriorityQueueWithKryoIssue and squashes the following commits: a4ae8ed [ankit.bhardwaj] [SPARK-2306]:BoundedPriorityQueue is private and not registered with Kryo
* Added SignalLogger to HistoryServer.Reynold Xin2014-07-041-2/+3
| | | | | | | | | | This was omitted in #1260. @aarondav Author: Reynold Xin <rxin@apache.org> Closes #1300 from rxin/historyServer and squashes the following commits: af720a3 [Reynold Xin] Added SignalLogger to HistoryServer.
* SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing SparkAaron Davidson2014-07-031-0/+2
| | | | | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-2282 This issue is caused by a buildup of sockets in the TIME_WAIT stage of TCP, which is a stage that lasts for some period of time after the communication closes. This solution simply allows us to reuse sockets that are in TIME_WAIT, to avoid issues with the buildup of the rapid creation of these sockets. Author: Aaron Davidson <aaron@databricks.com> Closes #1220 from aarondav/SPARK-2282 and squashes the following commits: 2e5cab3 [Aaron Davidson] SPARK-2282: Reuse PySpark Accumulator sockets to avoid crashing Spark
* [SPARK-2307][Reprise] Correctly report RDD blocks on SparkUIAndrew Or2014-07-036-23/+184
| | | | | | | | | | | | | | | | | | | | | **Problem.** The existing code in `ExecutorPage.scala` requires a linear scan through all the blocks to filter out the uncached ones. Every refresh could be expensive if there are many blocks and many executors. **Solution.** The proper semantics should be the following: `StorageStatusListener` should contain only block statuses that are cached. This means as soon as a block is unpersisted by any mean, its status should be removed. This is reflected in the changes made in `StorageStatusListener.scala`. Further, the `StorageTab` must stop relying on the `StorageStatusListener` changing a dropped block's status to `StorageLevel.NONE` (which no longer happens). This is reflected in the changes made in `StorageTab.scala` and `StorageUtils.scala`. ---------- If you have been following this chain of PRs like pwendell, you will quickly notice that this reverts the changes in #1249, which reverts the changes in #1080. In other words, we are adding back the changes from #1080, and fixing SPARK-2307 on top of those changes. Please ask questions if you are confused. Author: Andrew Or <andrewor14@gmail.com> Closes #1255 from andrewor14/storage-ui-fix-reprise and squashes the following commits: 45416fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into storage-ui-fix-reprise a82ea25 [Andrew Or] Add tests for StorageStatusListener 8773b01 [Andrew Or] Update comment / minor changes 3afde3f [Andrew Or] Correctly report the number of blocks on SparkUI
* [SPARK-2350] Don't NPE while launching driversAaron Davidson2014-07-031-1/+1
| | | | | | | | | | Prior to this change, we could throw a NPE if we launch a driver while another one is waiting, because removing from an iterator while iterating over it is not safe. Author: Aaron Davidson <aaron@databricks.com> Closes #1289 from aarondav/master-fail and squashes the following commits: 1cf1cf4 [Aaron Davidson] SPARK-2350: Don't NPE while launching drivers
* [SPARK-1097] Workaround Hadoop conf ConcurrentModification issueRaymond Liu2014-07-031-2/+2
| | | | | | | | | | | Workaround Hadoop conf ConcurrentModification issue Author: Raymond Liu <raymond.liu@intel.com> Closes #1273 from colorant/hadoopRDD and squashes the following commits: 994e98b [Raymond Liu] Address comments e2cda3d [Raymond Liu] Workaround Hadoop conf ConcurrentModification issue