path: root/core
Commit message | Author | Age | Files | Lines
...
* [SPARK-7729][UI] Executor which has been killed should also be displayed on Executor Tab (Lianhui Wang, 2016-02-23, 11 files, -44/+98)
  andrewor14 squito Dead Executors should also be displayed on the Executor Tab, as follows: ![image](https://cloud.githubusercontent.com/assets/545478/11492707/ae55d7f6-982b-11e5-919a-b62cd84684b2.png)
  Author: Lianhui Wang <lianhuiwang09@gmail.com>
  This patch had conflicts when merged, resolved by Committer: Andrew Or <andrew@databricks.com>
  Closes #10058 from lianhuiwang/SPARK-7729.
* [SPARK-13364] Sort appId as num rather than str in history page (zhuol, 2016-02-23, 2 files, -2/+33)
  ## What changes were proposed in this pull request?
  The history page currently sorts appIds as strings, which leads to unexpected ordering for cases like "application_11111_9" and "application_11111_20". Adding a new sort type called appId-numeric fixes it.
  ## How was this patch tested?
  This patch was manually tested with the UI. See the screenshot below: ![sortappidbetter](https://cloud.githubusercontent.com/assets/11683054/13185564/7f941a16-d707-11e5-8fb7-0316368d3030.png)
  Author: zhuol <zhuol@yahoo-inc.com>
  Closes #11259 from zhuoliu/13364.
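  The comparison idea, as a minimal Scala sketch. It assumes appIds of the form `application_<id>_<sequence>`; the actual patch implements the equivalent "appId-numeric" sort type in the history page's table rather than server-side Scala:

  ```scala
  import scala.util.Try

  // Compare the trailing sequence number numerically instead of lexicographically.
  val appIdOrdering: Ordering[String] = Ordering.by { appId: String =>
    val parts = appId.split("_")
    val seq = Try(parts.last.toLong).getOrElse(0L)  // numeric tail, 0 if absent
    (parts.dropRight(1).mkString("_"), seq)
  }

  // "application_11111_9" now sorts before "application_11111_20".
  assert(Seq("application_11111_20", "application_11111_9")
    .sorted(appIdOrdering) == Seq("application_11111_9", "application_11111_20"))
  ```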
* [SPARK-13358][SQL] Retrieve grep path when running benchmarks (Liang-Chi Hsieh, 2016-02-23, 1 file, -1/+5)
  JIRA: https://issues.apache.org/jira/browse/SPARK-13358
  When trying to run a benchmark, I found that on my Ubuntu Linux machine grep is not in /usr/bin/ but in /bin/, so it is better to use `which` to retrieve the grep path. cc davies
  Author: Liang-Chi Hsieh <viirya@gmail.com>
  Closes #11231 from viirya/benchmark-grep-path.
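  A small Scala sketch of the idea, assuming the benchmark shells out for grep; the exact call site in the patch may differ:

  ```scala
  import scala.sys.process._

  // Ask the shell where grep lives instead of hard-coding /usr/bin/grep.
  val grepPath = Seq("which", "grep").!!.trim
  // yields "/bin/grep" on some distros and "/usr/bin/grep" on others
  ```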
* [SPARK-13220][CORE] Deprecate yarn-client and yarn-cluster mode (jerryshao, 2016-02-23, 5 files, -43/+68)
  Author: jerryshao <sshao@hortonworks.com>
  Closes #11229 from jerryshao/SPARK-13220.
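  For context, the replacement spelling splits the master URL from the deploy mode; a sketch of the two forms (the deprecation message itself is not reproduced here):

  ```scala
  // Deprecated: spark-submit --master yarn-client   (or yarn-cluster)
  // Preferred:  spark-submit --master yarn --deploy-mode client|cluster
  // Programmatically, the master is now just "yarn":
  val conf = new org.apache.spark.SparkConf().setMaster("yarn")
  ```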
* [SPARK-13298][CORE][UI] Escape "label" to avoid DAG being broken by some special character (Shixiong Zhu, 2016-02-22, 1 file, -3/+4)
  ## What changes were proposed in this pull request?
  When `label` contains special characters (e.g., `"`, `\`), the DAG visualization breaks. This patch just escapes `label` to avoid the DAG being broken by such characters.
  ## How was this patch tested?
  Jenkins tests
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #11309 from zsxwing/SPARK-13298.
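  A sketch of the escaping idea, assuming a helper of this shape (the patch itself is a few-line change in the UI code; the helper name here is hypothetical):

  ```scala
  import org.apache.commons.lang3.StringEscapeUtils

  // Escape the label before embedding it in the generated markup, so quotes and
  // backslashes cannot terminate the enclosing string early.
  def escapeLabel(label: String): String = StringEscapeUtils.escapeJava(label)

  escapeLabel("""stage "0" \ shuffle""")   // => stage \"0\" \\ shuffle
  ```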
* [SPARK-13413] Remove SparkContext.metricsSystem (Reynold Xin, 2016-02-22, 1 file, -7/+2)
  ## What changes were proposed in this pull request?
  This patch removes SparkContext.metricsSystem, which returns MetricsSystem, a private class. I think it was added by accident. In addition, I also removed an unused private[spark] schedulerBackend setter.
  ## How was this patch tested?
  N/A.
  Author: Reynold Xin <rxin@databricks.com>
  This patch had conflicts when merged, resolved by Committer: Josh Rosen <joshrosen@databricks.com>
  Closes #11282 from rxin/SPARK-13413.
* [SPARK-10749][MESOS] Support multiple roles with mesos cluster mode (Timothy Chen, 2016-02-22, 3 files, -98/+170)
  Currently the Mesos cluster dispatcher does not use offers from multiple roles correctly: it simply aggregates all the offers' resource values into one total, but Mesos requires resources taken from an offer to be tagged with the role they originally belong to. Multiple roles are already supported by the fine/coarse-grained schedulers, so this ports that logic to the cluster scheduler.
  https://issues.apache.org/jira/browse/SPARK-10749
  Author: Timothy Chen <tnachen@gmail.com>
  Closes #8872 from tnachen/cluster_multi_roles.
* [MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments (Dongjoon Hyun, 2016-02-22, 4 files, -6/+6)
  ## What changes were proposed in this pull request?
  This PR tries to fix all typos in all markdown files under the `docs` module, and fixes similar typos in other comments, too.
  ## How was this patch tested?
  Manual tests.
  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #11300 from dongjoon-hyun/minor_fix_typos.
* [SPARK-13426][CORE] Remove the support of SIMR (jerryshao, 2016-02-22, 3 files, -92/+2)
  ## What changes were proposed in this pull request?
  This PR removes the support of SIMR: it has not been actively used or maintained for a long time, and is not supported by `SparkSubmit`, so this proposes to remove it.
  ## How was this patch tested?
  This patch is tested locally by running unit tests.
  Author: jerryshao <sshao@hortonworks.com>
  Closes #11296 from jerryshao/SPARK-13426.
* [SPARK-13408][CORE] Ignore errors when it's already reported in JobWaiter (Shixiong Zhu, 2016-02-19, 2 files, -3/+49)
  ## What changes were proposed in this pull request?
  `JobWaiter.taskSucceeded` will be called for each task. When `resultHandler` throws an exception, `taskSucceeded` will also throw it for each task. DAGScheduler just catches it and reports it like this:
  ```scala
  try {
    job.listener.taskSucceeded(rt.outputId, event.result)
  } catch {
    case e: Exception =>
      // TODO: Perhaps we want to mark the resultStage as failed?
      job.listener.jobFailed(new SparkDriverExecutionException(e))
  }
  ```
  Therefore `JobWaiter.jobFailed` may be called multiple times, so it should use `Promise.tryFailure` instead of `Promise.failure`, because the latter does not support being called multiple times.
  ## How was this patch tested?
  Jenkins tests.
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #11280 from zsxwing/SPARK-13408.
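  The difference between the two Promise methods is easy to demonstrate; a minimal sketch:

  ```scala
  import scala.concurrent.Promise

  val p = Promise[Unit]()
  p.tryFailure(new RuntimeException("first"))    // true: completes the promise
  p.tryFailure(new RuntimeException("second"))   // false: already completed, no throw
  // p.failure(new RuntimeException("third"))    // would throw IllegalStateException
  ```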
* [SPARK-13407] Guard against garbage-collected accumulators in TaskMetrics.fromAccumulatorUpdates (Josh Rosen, 2016-02-19, 2 files, -32/+33)
  `TaskMetrics.fromAccumulatorUpdates()` can fail if accumulators have been garbage-collected on the driver. To guard against this, this patch introduces `ListenerTaskMetrics`, a subclass of `TaskMetrics` which is used only in `TaskMetrics.fromAccumulatorUpdates()` and which eliminates the need to access the original accumulators on the driver.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #11276 from JoshRosen/accum-updates-fix.
* [SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold (Sean Owen, 2016-02-19, 5 files, -33/+33)
  Clarify that reduce functions need to be commutative, and fold functions do not.
  See https://github.com/apache/spark/pull/11091
  Author: Sean Owen <sowen@cloudera.com>
  Closes #11217 from srowen/SPARK-13339.
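  Why the requirement matters for reduce, as a tiny illustration (assuming an existing SparkContext `sc`):

  ```scala
  // reduce merges per-partition results in whatever order tasks finish, so a
  // non-commutative, non-associative operator like subtraction gives
  // non-deterministic results.
  val rdd = sc.parallelize(1 to 4, numSlices = 2)
  rdd.reduce(_ - _)   // order-dependent: not a valid reduce operator
  rdd.reduce(_ + _)   // always 10: addition is commutative and associative
  ```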
* [SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask compares Option and String directly (Sean Owen, 2016-02-18, 5 files, -7/+11)
  ## What changes were proposed in this pull request?
  Fix some comparisons between unequal types that cause IJ warnings and, in at least one case, a likely bug (TaskSetManager).
  ## How was this patch tested?
  Running Jenkins tests
  Author: Sean Owen <sowen@cloudera.com>
  Closes #11253 from srowen/SPARK-13371.
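  The bug pattern, sketched (hypothetical variable; the real site is in TaskSetManager):

  ```scala
  val execId: Option[String] = Some("exec-1")
  execId == "exec-1"          // always false: Option[String] compared to String
  execId.contains("exec-1")   // true: the intended comparison
  ```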
* [SPARK-13344][TEST] Fix harmless accumulator not found exceptions (Andrew Or, 2016-02-17, 3 files, -4/+30)
  See [JIRA](https://issues.apache.org/jira/browse/SPARK-13344) for more detail. This was caused by #10835.
  Author: Andrew Or <andrew@databricks.com>
  Closes #11222 from andrewor14/fix-test-accum-exceptions.
* [SPARK-13279] Remove O(n^2) operation from scheduler (Sital Kedia, 2016-02-16, 1 file, -15/+13)
  This commit removes an unnecessary duplicate check in addPendingTask that meant that scheduling a task set took time proportional to (# tasks)^2.
  Author: Sital Kedia <skedia@fb.com>
  Closes #11175 from sitalkedia/fix_stuck_driver.
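  The quadratic pattern, sketched with hypothetical names (the real list lives in TaskSetManager):

  ```scala
  import scala.collection.mutable.ArrayBuffer

  val pendingTasks = new ArrayBuffer[Int]

  // Before: a linear contains() scan per insertion makes adding n tasks O(n^2).
  def addPendingTaskSlow(index: Int): Unit =
    if (!pendingTasks.contains(index)) pendingTasks += index

  // After: append unconditionally; duplicates are tolerated and filtered out
  // when tasks are dequeued, so the scan is unnecessary.
  def addPendingTaskFast(index: Int): Unit = pendingTasks += index
  ```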
* [SPARK-13278][CORE] Launcher fails to start with JDK 9 EA (Claes Redestad, 2016-02-14, 1 file, -2/+4)
  See http://openjdk.java.net/jeps/223 for more information about the JDK 9 version string scheme.
  Author: Claes Redestad <claes.redestad@gmail.com>
  Closes #11160 from cl4es/master.
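  For reference, JEP 223 changes `java.version` from the `1.<major>` scheme (e.g. "1.8.0_66") to a plain `<major>` scheme (e.g. "9-ea", "9.0.1"). A sketch of parsing that tolerates both; this is assumed logic, not the launcher's exact code:

  ```scala
  def majorJavaVersion(versionString: String): Int = {
    val v = versionString.split("-")(0)   // strip "-ea" and similar suffixes
    val parts = v.split("\\.")
    if (parts(0) == "1") parts(1).toInt   // "1.8.0_66" -> 8
    else parts(0).toInt                   // "9.0.1"    -> 9
  }
  ```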
* [SPARK-13172][CORE][SQL] Stop using RichException.getStackTrace; it is deprecated (Sean Owen, 2016-02-13, 3 files, -6/+6)
  Replace `getStackTraceString` with `Utils.exceptionString`.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #11182 from srowen/SPARK-13172.
* [SPARK-13142][WEB UI] Problem accessing Web UI /logPage/ on Microsoft Windows (markpavey, 2016-02-13, 1 file, -2/+2)
  Due to being on a Windows platform I have been unable to run the tests as described in the "Contributing to Spark" instructions. As the change is only to two lines of code in the Web UI, which I have manually built and tested, I am submitting this pull request anyway. I hope this is OK. Is it worth considering also including this fix in any future 1.5.x releases (if any)?
  I confirm this is my own original work and license it to the Spark project under its open source license.
  Author: markpavey <mark.pavey@thefilter.com>
  Closes #11135 from markpavey/JIRA_SPARK-13142_WindowsWebUILogFix.
* [SPARK-5095] Remove flaky test (Michael Gummelt, 2016-02-12, 1 file, -0/+5)
  Overrode the start() method, which was previously starting a thread and causing a race condition. I believe this should fix the flaky test.
  Author: Michael Gummelt <mgummelt@mesosphere.io>
  Closes #11164 from mgummelt/fix_mesos_tests.
* [SPARK-5095] Fix style in mesos coarse grained scheduler code (Michael Gummelt, 2016-02-12, 2 files, -10/+12)
  andrewor14 This addressed your style comments from #10993.
  Author: Michael Gummelt <mgummelt@mesosphere.io>
  Closes #11187 from mgummelt/fix_mesos_style.
* [SPARK-6166] Limit number of in-flight outbound requests (Sanket, 2016-02-11, 4 files, -15/+39)
  This JIRA is related to https://github.com/apache/spark/pull/5852. Had to do some minor rework and testing to make sure it works with the current version of Spark.
  Author: Sanket <schintap@untilservice-lm>
  Closes #10838 from redsanket/limit-outbound-connections.
* [SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps (Steve Loughran, 2016-02-11, 8 files, -59/+1596)
  When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available. It does this by checking if a version of the app has been loaded with a larger *filesize*. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI.
  https://issues.apache.org/jira/browse/SPARK-7889
  Author: Steve Loughran <stevel@hortonworks.com>
  Author: Imran Rashid <irashid@cloudera.com>
  Closes #11118 from squito/SPARK-7889-alternate.
* Revert "[SPARK-13279] Remove O(n^2) operation from scheduler."Reynold Xin2016-02-111-9/+6
| | | | This reverts commit 50fa6fd1b365d5db7e2b2c59624a365cef0d1696.
* [SPARK-13279] Remove O(n^2) operation from scheduler (Sital Kedia, 2016-02-11, 1 file, -6/+9)
  This commit removes an unnecessary duplicate check in addPendingTask that meant that scheduling a task set took time proportional to (# tasks)^2.
  Author: Sital Kedia <skedia@fb.com>
  Closes #11167 from sitalkedia/fix_stuck_driver and squashes the following commits:
  3fe1af8 [Sital Kedia] [SPARK-13279] Remove unnecessary duplicate check in addPendingTask function
* [SPARK-13124][WEB UI] Fixed CSS and JS issues caused by addition of JQuery DataTables (Alex Bozarth, 2016-02-11, 3 files, -14/+20)
  Made sure the old tables continue to use the old css and the new DataTables use the new css. Also fixed it so the Safari Web Inspector doesn't throw errors when on the new DataTables pages.
  Author: Alex Bozarth <ajbozart@us.ibm.com>
  Closes #11038 from ajbozarth/spark13124.
* [SPARK-13074][CORE] Add JavaSparkContext.getPersistentRDDs method (Junyang, 2016-02-11, 2 files, -0/+22)
  getPersistentRDDs() is a useful API of SparkContext to get cached RDDs; however, JavaSparkContext does not have it. This adds a simple getPersistentRDDs() that returns java.util.Map<Integer, JavaRDD> for Java users.
  Author: Junyang <fly.shenjy@gmail.com>
  Closes #10978 from flyjy/master.
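  A usage sketch from Scala, assuming an existing SparkContext `sc` (Java callers would use the returned map directly):

  ```scala
  import scala.collection.JavaConverters._
  import org.apache.spark.api.java.JavaSparkContext

  val jsc = new JavaSparkContext(sc)
  val cached = jsc.getPersistentRDDs          // java.util.Map[Integer, JavaRDD[_]]
  cached.asScala.foreach { case (id, rdd) =>
    println(s"cached RDD $id: ${rdd.name}")   // name() mirrors RDD.name
  }
  ```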
* [SPARK-12414][CORE] Remove closure serializer (Sean Owen, 2016-02-10, 2 files, -5/+3)
  Remove the spark.closure.serializer option and use JavaSerializer always. CC andrewor14 rxin: I see there's a discussion in the JIRA, but just thought I'd offer this for a look at what the change would be.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #11150 from srowen/SPARK-12414.
* [SPARK-13126] Fix the right margin of history page (zhuol, 2016-02-10, 1 file, -1/+1)
  The right margin of the history page is a little bit off. A simple fix for that issue.
  Author: zhuol <zhuol@yahoo-inc.com>
  Closes #11029 from zhuoliu/13126.
* [SPARK-13163][WEB UI] Column width on new History Server DataTables not getting set correctly (Alex Bozarth, 2016-02-10, 1 file, -0/+1)
  The column width for the new DataTables now adjusts for the current page rather than being hard-coded for the entire table's data.
  Author: Alex Bozarth <ajbozart@us.ibm.com>
  Closes #11057 from ajbozarth/spark13163.
* [SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode (Michael Gummelt, 2016-02-10, 7 files, -267/+506)
  This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027
  In that PR, we resolved with andrewor14 and pwendell to implement the Mesos scheduler's support of `spark.executor.cores` to be consistent with YARN and Standalone. This PR implements that resolution.
  This PR implements two high-level, co-dependent features, so they're both implemented here:
  - Mesos support for spark.executor.cores
  - Multiple executors per slave
  We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite: https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR.
  The contribution is my original work and I license the work to the project under the project's open source license.
  Author: Michael Gummelt <mgummelt@mesosphere.io>
  Closes #10993 from mgummelt/executor_sizing.
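  A configuration sketch of what the change enables (hypothetical numbers and master URL): with executor sizing, one 24-core Mesos agent can host three 8-core executors instead of a single executor that grabs every offered core.

  ```scala
  val conf = new org.apache.spark.SparkConf()
    .setMaster("mesos://zk://zk-host:2181/mesos")  // assumed master URL
    .set("spark.executor.cores", "8")              // now honored on Mesos
    .set("spark.cores.max", "24")
  ```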
* [SPARK-9307][CORE][SPARK] Logging: Make it either stable or private (Sean Owen, 2016-02-10, 1 file, -6/+2)
  Make Logging private[spark]. Pretty much all there is to it.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #11103 from srowen/SPARK-9307.
* [SPARK-12950][SQL] Improve lookup of BytesToBytesMap in aggregate (Davies Liu, 2016-02-09, 2 files, -76/+96)
  This PR improves the lookup of BytesToBytesMap by:
  1. Generating code to calculate the hash code of the grouping keys.
  2. Not using MemoryLocation; fetching the baseObject and offset for key and value directly (removing the indirection).
  Author: Davies Liu <davies@databricks.com>
  Closes #11010 from davies/gen_map.
* [SPARK-13245][CORE] Call shuffleMetrics methods only in one thread for ShuffleBlockFetcherIterator (Shixiong Zhu, 2016-02-09, 1 file, -11/+27)
  Call shuffleMetrics's incRemoteBytesRead and incRemoteBlocksFetched when polling FetchResult from `results`, so that shuffleMetrics is always used from one thread. Also fix a race condition that could cause a memory leak.
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #11138 from zsxwing/SPARK-13245.
* [SPARK-12888][SQL][FOLLOW-UP] Benchmark the new hash expression (Wenchen Fan, 2016-02-09, 1 file, -2/+2)
  Adds the benchmark results as comments. The codegen version is slower than the interpreted version for the `simple` case because of 3 reasons:
  1. The codegen version uses a more complex hash algorithm than the interpreted version, i.e. `Murmur3_x86_32.hashInt` vs [simple multiplication and addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153).
  2. The codegen version writes the hash value to a row first and then reads it out. I tried to create a `GenerateHasher` that can generate code to return the hash value directly and got about a 60% speed-up for the `simple` case; is it worth it?
  3. The row in the `simple` case only has one int field, so the interpreted version's runtime reflection may be removed by branch prediction, which makes it faster.
  The `array` case is also slow for similar reasons, e.g. array elements are of the same type, so the interpreted version can probably get rid of runtime reflection via branch prediction.
  Author: Wenchen Fan <wenchen@databricks.com>
  Closes #10917 from cloud-fan/hash-benchmark.
* [SPARK-13176][CORE] Use native file linking instead of external process ln (Jakob Odersky, 2016-02-09, 1 file, -19/+8)
  Since Spark requires at least JRE 1.7, it is safe to use the built-in java.nio.file.Files.
  Author: Jakob Odersky <jakob@odersky.com>
  Closes #11098 from jodersky/SPARK-13176.
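  A sketch of the replacement, with assumed paths:

  ```scala
  import java.nio.file.{Files, Paths}

  // Create links via java.nio.file instead of forking an external "ln" process.
  val target = Paths.get("/tmp/spark-cache/app.jar")
  val link   = Paths.get("/tmp/spark-work/app.jar")
  Files.createSymbolicLink(link, target)   // replaces: "ln -sf <target> <link>"
  ```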
* [SPARK-10620][SPARK-13054] Minor addendum to #10835 (Andrew Or, 2016-02-08, 16 files, -48/+64)
  Additional changes to #10835, mainly related to style and visibility. This patch also adds back a few deprecated methods for backward compatibility.
  Author: Andrew Or <andrew@databricks.com>
  Closes #10958 from andrewor14/task-metrics-to-accums-followups.
* [SPARK-13210][SQL] Catch OOM when allocating memory and expanding array (Davies Liu, 2016-02-08, 7 files, -21/+35)
  There is a bug when we try to grow the buffer: the OOM is wrongly ignored (the assert is also skipped by the JVM), and when we then try to grow the array again, spilling is triggered and frees the current page, invalidating the record we just inserted. The root cause is that the JVM has less free memory than the MemoryManager thinks, so it OOMs when allocating a page without triggering spilling. We should catch the OOM and acquire memory again to trigger spilling. Also, we should not grow the array in `insertRecord` of `InMemorySorter` (it was there just for easy testing).
  Author: Davies Liu <davies@databricks.com>
  Closes #11095 from davies/fix_expand.
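  A sketch of the recovery pattern with hypothetical stand-in helpers (the real code lives in the memory manager and sorters):

  ```scala
  def allocate(size: Int): Array[Byte] = new Array[Byte](size)  // stand-in for page allocation
  def spill(): Unit = ()                                        // stand-in: free pages to disk

  def allocateWithRecovery(size: Int): Array[Byte] =
    try allocate(size)                 // may throw if the JVM is shorter on
    catch {                            // memory than the MemoryManager believes
      case _: OutOfMemoryError =>
        spill()                        // release memory by spilling
        allocate(size)                 // retry after spilling
    }
  ```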
* [SPARK-5865][API DOC] Add doc warnings for methods that return local data structures (Tommy YU, 2016-02-06, 4 files, -0/+45)
  rxin srowen I worked out a note message for the rdd.take function; please help to review. If it's fine, I can apply it to all the other functions later.
  Author: Tommy YU <tummyyu@163.com>
  Closes #10874 from Wenpei/spark-5865-add-warning-for-localdatastructure.
* [HOTFIX] Fix float part of avgRate (Davies Liu, 2016-02-05, 1 file, -1/+1)
* [SPARK-13171][CORE] Replace future calls with Future (Jakob Odersky, 2016-02-05, 4 files, -17/+17)
  Trivial search-and-replace to eliminate deprecation warnings in Scala 2.11. Also works with 2.10.
  Author: Jakob Odersky <jakob@odersky.com>
  Closes #11085 from jodersky/SPARK-13171.
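  The mechanical change, sketched: the lower-case `scala.concurrent.future` helper is deprecated in Scala 2.11, and `Future.apply` is the supported spelling in both 2.10 and 2.11.

  ```scala
  import scala.concurrent.Future
  import scala.concurrent.ExecutionContext.Implicits.global

  // Before: val f = future { 21 * 2 }   // deprecation warning under 2.11
  val f = Future { 21 * 2 }              // works in 2.10 and 2.11
  ```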
* [SPARK-13002][MESOS] Send initial request of executors for dyn allocation (Luc Bourlier, 2016-02-05, 1 file, -3/+10)
  Fix for [SPARK-13002](https://issues.apache.org/jira/browse/SPARK-13002) about the initial number of executors when running with dynamic allocation on Mesos.
  Instead of fixing it just for the Mesos case, the change is made in `ExecutorAllocationManager`, which already drives the number of executors running on Mesos, only not the initial value. The `None` and `Some(0)` are internal details of the computation of resources to be reserved in the Mesos backend scheduler. `executorLimitOption` has to be initialized correctly, otherwise the Mesos backend scheduler will either create too many executors at launch, or not create any executors and not be able to recover from this state.
  Removed the 'special case' description in the doc. It was not totally accurate, and is not needed anymore.
  This doesn't fix the same problem visible with Spark standalone. There is no straightforward way to send the initial value in standalone mode. Somebody knowing this part of the yarn support should review this change.
  Author: Luc Bourlier <luc.bourlier@typesafe.com>
  Closes #11047 from skyluc/issue/initial-dyn-alloc-2.
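  A configuration sketch of the settings involved (values are illustrative); with this fix, the initial count now takes effect on Mesos:

  ```scala
  val conf = new org.apache.spark.SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true")       // required by dynamic allocation
    .set("spark.dynamicAllocation.initialExecutors", "4")
    .set("spark.dynamicAllocation.minExecutors", "2")
    .set("spark.dynamicAllocation.maxExecutors", "16")
  ```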
* [SPARK-13208][CORE] Replace use of Pairs with Tuple2s (Jakob Odersky, 2016-02-04, 2 files, -3/+3)
  Another trivial deprecation fix for Scala 2.11.
  Author: Jakob Odersky <jakob@odersky.com>
  Closes #11089 from jodersky/SPARK-13208.
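  The mechanical change, sketched: `scala.Pair` (an alias for Tuple2) is deprecated in Scala 2.11, so uses are rewritten to tuple syntax.

  ```scala
  // Before: val kv = Pair("a", 1)   // deprecation warning under 2.11
  val kv  = ("a", 1)                 // identical value, no warning
  val kv2 = Tuple2("a", 1)           // equivalent explicit form
  ```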
* [SPARK-13052] waitingApps metric doesn't show the number of apps currently in the WAITING state (Raafat Akkad, 2016-02-04, 2 files, -2/+2)
  Author: Raafat Akkad <raafat.akkad@gmail.com>
  Closes #10959 from RaafatAkkad/master.
* [HOTFIX] Fix style violation caused by c756bda (Andrew Or, 2016-02-04, 1 file, -2/+3)
* [SPARK-12330][MESOS][HOTFIX] Rename timeout config (Andrew Or, 2016-02-04, 1 file, -2/+2)
  The config already describes a time and accepts a general format that is not restricted to ms. This commit renames the internal config to use a format that's consistent in Spark.
* [SPARK-13053][TEST] Unignore tests in InternalAccumulatorSuite (Andrew Or, 2016-02-04, 2 files, -78/+102)
  These were ignored because they were incorrectly written; they didn't actually trigger stage retries, which is what the tests are testing. They are now rewritten to induce stage retries through fetch failures.
  Note: there were 2 tests before and now there's only 1. What happened? It turns out that the case where we only resubmit a subset of the original missing partitions is very difficult to simulate in tests without potentially introducing flakiness. This is because the `DAGScheduler` removes all map outputs associated with a given executor when this happens, we would need multiple executors to trigger this case, and sometimes the scheduler still removes map outputs from all executors.
  Author: Andrew Or <andrew@databricks.com>
  Closes #10969 from andrewor14/unignore-accum-test.
* [SPARK-13162] Standalone mode does not respect initial executors (Andrew Or, 2016-02-04, 5 files, -6/+34)
  Currently the Master would always set an application's initial executor limit to infinity. If the user specified `spark.dynamicAllocation.initialExecutors`, the config would not take effect. This is similar to #11047 but for standalone mode.
  Author: Andrew Or <andrew@databricks.com>
  Closes #11054 from andrewor14/standalone-da-initial.
* [SPARK-13164][CORE] Replace deprecated synchronized buffer in core (Holden Karau, 2016-02-04, 4 files, -39/+40)
  Building with Scala 2.11 results in the warning: "trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative." Investigation shows we are already using ConcurrentLinkedQueue in other locations, so switch our uses of SynchronizedBuffer to ConcurrentLinkedQueue.
  Author: Holden Karau <holden@us.ibm.com>
  Closes #11059 from holdenk/SPARK-13164-replace-deprecated-synchronized-buffer-in-core.
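  The replacement pattern, as a minimal sketch:

  ```scala
  import java.util.concurrent.ConcurrentLinkedQueue
  import scala.collection.JavaConverters._

  // Before: val events = new ArrayBuffer[String] with SynchronizedBuffer[String]
  val events = new ConcurrentLinkedQueue[String]()   // thread-safe append/iterate
  events.add("taskStart")
  events.add("taskEnd")
  events.asScala.foreach(println)   // asScala for Scala collection operations
  ```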
* [SPARK-12330][MESOS] Fix mesos coarse mode cleanup (Charles Allen, 2016-02-04, 2 files, -2/+45)
  In the current implementation the mesos coarse scheduler does not wait for the mesos tasks to complete before ending the driver. This causes a race where the task has to finish cleaning up before the mesos driver terminates it with a SIGINT (and a SIGKILL after 3 seconds if the SIGINT doesn't work). This PR makes the mesos coarse scheduler wait for the mesos tasks to finish (with a timeout defined by `spark.mesos.coarse.shutdown.ms`).
  This PR also fixes a regression caused by [SPARK-10987] whereby submitting a shutdown causes a race between the local shutdown procedure and the notification of the scheduler driver disconnection. If the scheduler driver disconnection wins the race, the coarse executor incorrectly exits with status 1 (instead of the proper status 0).
  With this patch the mesos coarse scheduler terminates properly, the executors clean up, and the tasks are reported as `FINISHED` in the Mesos console (as opposed to `KILLED` in < 1.6 or `FAILED` in 1.6 and later).
  Author: Charles Allen <charles@allen-net.com>
  Closes #10319 from drcrallen/SPARK-12330.
* [SPARK-13113][CORE] Remove unnecessary bit operation when decoding page number (Liang-Chi Hsieh, 2016-02-03, 1 file, -1/+1)
  JIRA: https://issues.apache.org/jira/browse/SPARK-13113
  As we shift bits right, the bitwise AND operation is unnecessary.
  Author: Liang-Chi Hsieh <viirya@gmail.com>
  Closes #11002 from viirya/improve-decodepagenumber.
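  An illustration of the simplification, assuming the 13-bit page number / 51-bit offset split used by TaskMemoryManager: after an unsigned right shift by 51, the high bits are already zero, so the mask is a no-op.

  ```scala
  val OFFSET_BITS = 51
  val PAGE_MASK   = (1L << 13) - 1

  def decodePageNumberOld(pagePlusOffset: Long): Int =
    ((pagePlusOffset >>> OFFSET_BITS) & PAGE_MASK).toInt  // mask is redundant

  def decodePageNumberNew(pagePlusOffset: Long): Int =
    (pagePlusOffset >>> OFFSET_BITS).toInt                // same result
  ```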