path: root/core
Commit message (Author, Date; Files changed, Lines -deleted/+added)
* [SPARK-16435][YARN][MINOR] Add warning log if initialExecutors is less than minExecutors (jerryshao, 2016-07-13; 2 files, -1/+21)

  ## What changes were proposed in this pull request?
  Currently, if `spark.dynamicAllocation.initialExecutors` is less than `spark.dynamicAllocation.minExecutors`, Spark silently picks minExecutors without any warning, whereas Spark 1.6 would throw an exception for this configuration. This proposes adding a warning log when these parameters are configured inconsistently.

  ## How was this patch tested?
  Unit test added to verify the scenario.

  Author: jerryshao <sshao@hortonworks.com>
  Closes #14149 from jerryshao/SPARK-16435.

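  Below is a minimal Scala sketch of the check this entry describes; the method name, config handling, and log call are illustrative stand-ins, not the actual ExecutorAllocationManager code:

  ```scala
  // Hypothetical stand-in: warn when initial < min, then fall back to min,
  // instead of silently picking minExecutors.
  def resolveInitialExecutors(conf: Map[String, String]): Int = {
    val min = conf.getOrElse("spark.dynamicAllocation.minExecutors", "0").toInt
    val initial =
      conf.getOrElse("spark.dynamicAllocation.initialExecutors", min.toString).toInt
    if (initial < min) {
      println(s"WARN: spark.dynamicAllocation.initialExecutors ($initial) is less " +
        s"than spark.dynamicAllocation.minExecutors ($min); using $min instead.")
    }
    math.max(initial, min)
  }
  ```
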
* [SPARK-16375][WEB UI] Fixed misassigned var: numCompletedTasks was assigned to numSkippedTasks (Alex Bozarth, 2016-07-13; 7 files, -12/+12)

  ## What changes were proposed in this pull request?
  I fixed a misassigned var: numCompletedTasks was assigned to numSkippedTasks in the convertJobData method.

  ## How was this patch tested?
  dev/run-tests

  Author: Alex Bozarth <ajbozart@us.ibm.com>
  Closes #14141 from ajbozarth/spark16375.

* [SPARK-16405] Add metrics and source for external shuffle service (Yangyang Liu, 2016-07-12; 2 files, -0/+45)

  ## What changes were proposed in this pull request?
  Since the external shuffle service is essential for Spark, better monitoring for it is necessary. We added various metrics in the shuffle service and exposed them through ExternalShuffleServiceSource for the metrics system. Metrics added in the shuffle service:
  * registeredExecutorsSize
  * openBlockRequestLatencyMillis
  * registerExecutorRequestLatencyMillis
  * blockTransferRateBytes

  JIRA Issue: https://issues.apache.org/jira/browse/SPARK-16405

  ## How was this patch tested?
  Test cases were added to verify that the metrics appear in the metrics system as expected; see `ExternalShuffleBlockHandlerSuite`.

  Author: Yangyang Liu <yangyangliu@fb.com>
  Closes #14080 from lovexi/yangyang-metrics.

* [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT (Reynold Xin, 2016-07-11; 1 file, -1/+1)

  ## What changes were proposed in this pull request?
  After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.

  ## How was this patch tested?
  N/A

  Author: Reynold Xin <rxin@databricks.com>
  Closes #14130 from rxin/SPARK-16477.

* [SPARK-16432] Empty blocks fail to serialize due to assert in ChunkedByteBuffer (Eric Liang, 2016-07-08; 2 files, -13/+8)

  ## What changes were proposed in this pull request?
  It's possible to also change the callers to not pass in empty chunks, but it seems cleaner to just allow `ChunkedByteBuffer` to handle empty arrays. cc JoshRosen

  ## How was this patch tested?
  Unit tests; also checked that the original reproduction case in https://github.com/apache/spark/pull/11748#issuecomment-230760283 is resolved.

  Author: Eric Liang <ekl@databricks.com>
  Closes #14099 from ericl/spark-16432.

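  A simplified model of the relaxed invariant (not the actual `ChunkedByteBuffer` code): chunks must be non-null, but zero-length chunks are now legal.

  ```scala
  import java.nio.ByteBuffer

  class Chunks(val chunks: Array[ByteBuffer]) {
    // Reject nulls, but accept empty chunks: an empty block is just a
    // chunk with limit 0, and the total size is still well-defined.
    require(chunks != null && chunks.forall(_ != null), "chunks must be non-null")
    val size: Long = chunks.map(_.limit().toLong).sum
  }

  val empty = new Chunks(Array(ByteBuffer.allocate(0)))  // no longer asserts
  ```
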
* [SPARK-16376][WEBUI][SPARK WEB UI][APP-ID] HTTP ERROR 500 when using REST API "/applications//jobs" if array "stageIds" is empty (Sean Owen, 2016-07-08; 1 file, -1/+6)

  ## What changes were proposed in this pull request?
  Avoid the error from taking the max of an empty Seq when stageIds is empty. This fixes the immediate problem; I don't know whether it results in meaningful output, but at least it is not an error.

  ## How was this patch tested?
  Jenkins tests

  Author: Sean Owen <sowen@cloudera.com>
  Closes #14105 from srowen/SPARK-16376.

* [SPARK-16420] Ensure compression streams are closed. (Ryan Blue, 2016-07-08; 3 files, -11/+34)

  ## What changes were proposed in this pull request?
  This uses the try/finally pattern to ensure streams are closed after use. `UnsafeShuffleWriter` wasn't closing compression streams, causing them to leak resources until garbage collected. This was causing a problem with codecs that use off-heap memory.

  ## How was this patch tested?
  Current tests are sufficient. This should not change behavior.

  Author: Ryan Blue <blue@apache.org>
  Closes #14093 from rdblue/SPARK-16420-unsafe-shuffle-writer-leak.

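  The try/finally pattern the patch applies, as a generic sketch (the function and parameter names are illustrative, not `UnsafeShuffleWriter` itself):

  ```scala
  import java.io.OutputStream

  // Always close the compression stream, even if the write throws; for codecs
  // backed by off-heap memory, close() is what releases the native buffers.
  def writeCompressed(openStream: () => OutputStream, bytes: Array[Byte]): Unit = {
    val out = openStream()
    try {
      out.write(bytes)
    } finally {
      out.close()
    }
  }
  ```
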
* [SPARK-15885][WEB UI] Provide links to executor logs from stage details page in UI (Tom Magrino, 2016-07-07; 4 files, -8/+42)

  ## What changes were proposed in this pull request?
  This moves over old PR https://github.com/apache/spark/pull/13664 to target master rather than branch-1.6. It adds links to logs (or an indication that there are no logs) for entries which list an executor in the stage details page of the UI. This streamlines the workflow where a user views a stage details page and wants to see the associated executor log for further examination. Previously, a user would have to cross-reference the executor id listed on the stage details page with the corresponding entry on the executors tab. Link to the JIRA: https://issues.apache.org/jira/browse/SPARK-15885

  ## How was this patch tested?
  Ran existing unit tests. Ran test queries on a platform which did not record executor logs and again on a platform which did, and verified that the new table column was empty and linked to the appropriate log files, respectively. Screenshots of the UI page, with the new columns highlighted:

  Without links: ![updated without logs](https://cloud.githubusercontent.com/assets/1450821/16059721/2b69dbaa-3239-11e6-9eed-e539764ca159.png)
  With links: ![updated with logs](https://cloud.githubusercontent.com/assets/1450821/16059725/32c6e316-3239-11e6-90bd-2553f43f7779.png)

  This contribution is my original work and I license the work to the project under the Apache Spark project's open source license.

  Author: Tom Magrino <tmagrino@fb.com>
  Closes #13861 from tmagrino/uilogstweak.

* [SPARK-16398][CORE] Make cancelJob and cancelStage APIs public (MasterDDT, 2016-07-06; 1 file, -4/+14)

  ## What changes were proposed in this pull request?
  Make the SparkContext `cancelJob` and `cancelStage` APIs public. This allows applications to use a `SparkListener` to do their own management of jobs via events, but without using the REST API.

  ## How was this patch tested?
  Existing tests (dev/run-tests)

  Author: MasterDDT <miteshp@live.com>
  Closes #14072 from MasterDDT/SPARK-16398.

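  A sketch of the use case this enables, assuming a live `SparkContext`: an application tracks job ids from `SparkListener` events and cancels jobs itself. The listener below is a minimal illustration, not a complete job-management policy.

  ```scala
  import org.apache.spark.SparkContext
  import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

  // Record the most recent job id and expose a cancel hook built on the
  // now-public SparkContext.cancelJob API.
  class JobCanceller(sc: SparkContext) extends SparkListener {
    @volatile private var lastJobId: Int = -1
    override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
      lastJobId = jobStart.jobId
    }
    def cancelLastJob(): Unit = if (lastJobId >= 0) sc.cancelJob(lastJobId)
  }
  ```

  The listener would be registered with `sc.addSparkListener(...)`.
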
* [SPARK-16379][CORE][MESOS] Spark on mesos is broken due to race condition in Logging (Sean Owen, 2016-07-06; 2 files, -5/+10)

  ## What changes were proposed in this pull request?
  The commit https://github.com/apache/spark/commit/044971eca0ff3c2ce62afa665dbd3072d52cbbec introduced a lazy val to simplify code in Logging. Simple enough, though one side effect is that accessing log now means grabbing the instance's lock. This in turn turned up a form of deadlock in the Mesos code. It was arguably a bit of a problem in how that code is structured, but, in any event, the safest thing seems to be to revert the commit, and that's 90% of the change here; it's just not worth the risk of similar, more subtle issues.

  What I didn't revert here was the removal of an odd override of log in the Mesos code. In retrospect it might have been put in place at some stage as a defense against this type of problem; after all, the Logging code still involved a lock at initialization before the change in question. Even after the revert it doesn't seem to do anything, given how Logging works now, so I left it removed. I also removed the particular log message that ended up playing a part in this problem anyway, maybe being paranoid, to make sure this type of problem can't happen even with how the current locking works in logging initialization.

  ## How was this patch tested?
  Jenkins tests

  Author: Sean Owen <sowen@cloudera.com>
  Closes #14069 from srowen/SPARK-16379.

* [SPARK-16304] LinkageError should not crash Spark executor (petermaxlee, 2016-07-06; 2 files, -1/+14)

  ## What changes were proposed in this pull request?
  This patch updates the failure handling logic so the Spark executor does not crash when seeing a LinkageError.

  ## How was this patch tested?
  Added an end-to-end test in FailureSuite.

  Author: petermaxlee <petermaxlee@gmail.com>
  Closes #13982 from petermaxlee/SPARK-16304.

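  A toy illustration of why this matters (not the actual Executor code): `scala.util.control.NonFatal` deliberately does not match `LinkageError`, so a handler that treats everything non-NonFatal as fatal would kill the executor; the change treats it as an ordinary task failure instead.

  ```scala
  import scala.util.control.NonFatal

  def classifyFailure(t: Throwable): String = t match {
    case _: LinkageError => "task FAILED, executor survives"  // the new behavior
    case NonFatal(_)     => "task FAILED, executor survives"
    case _               => "fatal: executor exits"  // e.g. OutOfMemoryError
  }
  ```
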
* [SPARK-15591][WEBUI] Paginate Stage Table in Stages tab (Tao Lin, 2016-07-06; 5 files, -141/+441)

  ## What changes were proposed in this pull request?
  This patch adds pagination support for the stage tables in the Stages tab. Pagination is provided for all four stage tables (active, pending, completed, and failed). The paged stage tables are also used in JobPage (the detail page for one job) and PoolPage. Interactions (jumping, sorting, and setting page size) for paged tables are also included.

  ## How was this patch tested?
  Tested manually by checking the Web UI after completing and failing hundreds of jobs, same as the testing for [Paginate Job Table in Jobs tab](https://github.com/apache/spark/pull/13620). This shows the pagination for completed stages: ![paged stage table](https://cloud.githubusercontent.com/assets/5558370/16125696/5804e35e-3427-11e6-8923-5c5948982648.png)

  Author: Tao Lin <nblintao@gmail.com>
  Closes #13708 from nblintao/stageTable.

* [SPARK-16385][CORE] Catch correct exception when calling method via reflection. (Marcelo Vanzin, 2016-07-05; 1 file, -1/+1)

  Using `Method.invoke` causes an exception to be thrown, not an error, so Utils.waitForProcess() was always throwing an exception when run on Java 7.

  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #14056 from vanzin/SPARK-16385.

* [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure (Sean Owen, 2016-07-04; 1 file, -0/+0)

  ## What changes were proposed in this pull request?
  Coincidentally, I discovered that a couple of images were unused in `docs/`, then searched and found more, then realized some PNGs were pretty big and could be crushed, and before I knew it had done the same for the ASF site (not committed yet). No functional change at all, just less superfluous image data.

  ## How was this patch tested?
  `jekyll serve`

  Author: Sean Owen <sowen@cloudera.com>
  Closes #14029 from srowen/RemoveCompressImages.

* [MINOR][BUILD] Fix Java linter errors (Dongjoon Hyun, 2016-07-02; 2 files, -7/+8)

  ## What changes were proposed in this pull request?
  This PR fixes minor Java linter errors like the following:

  ```
  - public int read(char cbuf[], int off, int len) throws IOException {
  + public int read(char[] cbuf, int off, int len) throws IOException {
  ```

  ## How was this patch tested?
  Manual:

  ```
  $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
  $ dev/lint-java
  Using `mvn` from path: /usr/local/bin/mvn
  Checkstyle checks passed.
  ```

  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #14017 from dongjoon-hyun/minor_build_java_linter_error.

* [SPARK-16335][SQL] Structured streaming should fail if source directory does not exist (Reynold Xin, 2016-07-01; 1 file, -5/+5)

  ## What changes were proposed in this pull request?
  In structured streaming, Spark does not report errors when the specified directory does not exist. This is a behavior different from the batch mode. This patch changes the behavior to fail if the directory does not exist (when the path is not a glob pattern).

  ## How was this patch tested?
  Updated unit tests to reflect the new behavior.

  Author: Reynold Xin <rxin@databricks.com>
  Closes #14002 from rxin/SPARK-16335.

* [SPARK-16182][CORE] Utils.scala: terminateProcess() should call Process.destroyForcibly() if and only if Process.destroy() fails (Sean Owen, 2016-07-01; 2 files, -31/+47)

  ## What changes were proposed in this pull request?
  Utils.terminateProcess should `destroy()` first and only fall back to `destroyForcibly()` if that fails. It's kind of bad that we're force-killing executors -- and only in Java 8. See the JIRA for an example of the impact: no shutdown. While here: `Utils.waitForProcess` should use the Java 8 method if available instead of a custom implementation.

  ## How was this patch tested?
  Existing tests, which cover the force-kill case, and Amplab tests, which will cover both Java 7 and Java 8 eventually. I tested locally on Java 8, and the PR builder will try Java 7 here.

  Author: Sean Owen <sowen@cloudera.com>
  Closes #13973 from srowen/SPARK-16182.

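  The intended ordering, sketched against the standard Java 8 Process API (a simplified take, not the actual Utils.terminateProcess):

  ```scala
  import java.util.concurrent.TimeUnit

  // Ask politely first so shutdown hooks can run; escalate to a forcible
  // kill only if the process is still alive after the timeout.
  def terminate(process: Process, timeoutMs: Long): Int = {
    process.destroy()
    if (!process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
      process.destroyForcibly()
    }
    process.waitFor()
  }
  ```
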
* [SPARK-15865][CORE] Blacklist should not result in job hanging with less than 4 executors (Imran Rashid, 2016-06-30; 9 files, -15/+194)

  ## What changes were proposed in this pull request?
  Before this change, when you turn on blacklisting with `spark.scheduler.executorTaskBlacklistTime` but have fewer than `spark.task.maxFailures` executors, you can end up with a job "hung" after some task failures. Whenever a taskset is unable to schedule anything in resourceOfferSingleTaskSet, we check whether the last pending task can be scheduled on *any* known executor. If not, the taskset (and any corresponding jobs) are failed.
  * Worst case, this is O(maxTaskFailures + numTasks). But unless many executors are bad, this should be small.
  * This does not fail as fast as possible -- when a task becomes unschedulable, we keep scheduling other tasks. This is to avoid an O(numPendingTasks * numExecutors) operation.
  * Also, it is conceivable this fails too quickly. You may be 1 millisecond away from unblacklisting a place for a task to run, or from acquiring a new executor.

  ## How was this patch tested?
  Added a unit test which failed before the change, ran the new test 5k times manually, ran all scheduler tests manually, and ran the full suite via jenkins.

  Author: Imran Rashid <irashid@cloudera.com>
  Closes #13603 from squito/progress_w_few_execs_and_blacklist.

* [SPARK-13850] Force the sorter to spill when the number of elements in the pointer array reaches a certain size (Sital Kedia, 2016-06-30; 3 files, -6/+30)

  ## What changes were proposed in this pull request?
  Force the sorter to spill when the number of elements in the pointer array reaches a certain size. This works around the issue of TimSort failing on large buffer sizes.

  ## How was this patch tested?
  Tested by running a job which was failing without this change due to the TimSort bug.

  Author: Sital Kedia <skedia@fb.com>
  Closes #13107 from sitalkedia/fix_TimSort.

* [SPARK-16238] Metrics for generated method and class bytecode size (Eric Liang, 2016-06-29; 1 file, -0/+12)

  ## What changes were proposed in this pull request?
  This extends SPARK-15860 to include metrics for the actual bytecode size of janino-generated methods. They can be accessed in the same way as any other codahale metric, e.g.

  ```
  scala> org.apache.spark.metrics.source.CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot().getValues()
  res7: Array[Long] = Array(532, 532, 532, 542, 1479, 2670, 3585, 3585)

  scala> org.apache.spark.metrics.source.CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot().getValues()
  res8: Array[Long] = Array(5, 5, 5, 5, 10, 10, 10, 10, 15, 15, 15, 38, 63, 79, 88, 94, 94, 94, 132, 132, 165, 165, 220, 220)
  ```

  ## How was this patch tested?
  Small unit test; also verified manually that the performance impact is minimal (<10%). hvanhovell

  Author: Eric Liang <ekl@databricks.com>
  Closes #13934 from ericl/spark-16238.

* [SPARK-16148][SCHEDULER] Allow for underscores in TaskLocation in the Executor ID (Tom Magrino, 2016-06-28; 2 files, -7/+9)

  ## What changes were proposed in this pull request?
  Previously, the TaskLocation implementation would not allow for executor ids which include underscores. This tweaks the string split used to get the hostname and executor id, allowing for underscores in the executor id. This addresses the JIRA found here: https://issues.apache.org/jira/browse/SPARK-16148 and is moved over from a previous PR against branch-1.6: https://github.com/apache/spark/pull/13857

  ## How was this patch tested?
  Ran existing unit tests for core and streaming. Manually ran a simple streaming job with an executor whose id contained underscores and confirmed that the job ran successfully. This is my original work and I license the work to the project under the project's open source license.

  Author: Tom Magrino <tmagrino@fb.com>
  Closes #13858 from tmagrino/fixtasklocation.

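  An illustrative parse that tolerates underscores in the executor id, assuming the `executor_<host>_<executorId>` string form used by TaskLocation (simplified; the real code uses its own prefix constants):

  ```scala
  // Split host from executor id on the first underscore after the prefix
  // only, so an id like "exec_01" survives intact.
  def parseExecutorLocation(loc: String): (String, String) = {
    val rest = loc.stripPrefix("executor_")
    val Array(host, execId) = rest.split("_", 2)  // limit 2 keeps the id whole
    (host, execId)
  }

  parseExecutorLocation("executor_host1_exec_01")  // ("host1", "exec_01")
  ```
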
* [SPARK-16106][CORE] TaskSchedulerImpl should properly track executors added to existing hosts (Imran Rashid, 2016-06-27; 2 files, -65/+111)

  ## What changes were proposed in this pull request?
  TaskSchedulerImpl used to only set `newExecAvailable` when a new *host* was added, not when a new executor was added to an existing host. It also didn't update some internal state tracking live executors until a task was scheduled on the executor. This patch changes it to properly update as soon as it knows about a new executor.

  ## How was this patch tested?
  Added a unit test, ran everything via jenkins.

  Author: Imran Rashid <irashid@cloudera.com>
  Closes #13826 from squito/SPARK-16106_executorByHosts.

* [SPARK-16136][CORE] Fix flaky TaskManagerSuite (Imran Rashid, 2016-06-27; 1 file, -25/+43)

  ## What changes were proposed in this pull request?
  TaskManagerSuite "Kill other task attempts when one attempt belonging to the same task succeeds" was flaky. When checking whether a task is speculatable, at least one millisecond must pass since the task was submitted; use a manual clock to avoid the problem. I noticed these tests were also leaving lots of threads lying around (which prevented me from running the test repeatedly), so I fixed that too.

  ## How was this patch tested?
  Ran the test 1k times on my laptop; it passed every time (it failed about 20% of the time before this).

  Author: Imran Rashid <irashid@cloudera.com>
  Closes #13848 from squito/fix_flaky_taskmanagersuite.

* [MINOR][CORE] Fix display of wrong free memory size in the log (jerryshao, 2016-06-27; 1 file, -1/+2)

  ## What changes were proposed in this pull request?
  The free memory size displayed in the log is wrong (it is actually the used memory); fix it to log the correct value.

  ## How was this patch tested?
  N/A

  Author: jerryshao <sshao@hortonworks.com>
  Closes #13804 from jerryshao/memory-log-fix.

* [SPARK-16193][TESTS] Address flaky ExternalAppendOnlyMapSuite spilling tests (Sean Owen, 2016-06-25; 1 file, -1/+12)

  ## What changes were proposed in this pull request?
  Make the spill tests wait until the job has completed before returning the number of stages that spilled.

  ## How was this patch tested?
  Existing Jenkins tests.

  Author: Sean Owen <sowen@cloudera.com>
  Closes #13896 from srowen/SPARK-16193.

* [SPARK-1301][WEB UI] Added anchor links to Accumulators and Tasks on StagePage (Alex Bozarth, 2016-06-25; 4 files, -4/+64)

  ## What changes were proposed in this pull request?
  Sometimes the "Aggregated Metrics by Executor" table on the Stage page can get very long, so anchor links to the Accumulators and Tasks tables below it have been added to the summary at the top of the page. This has been done in the same way as on the Jobs and Stages pages. Note: the Accumulators link only displays when the table exists.

  ## How was this patch tested?
  Manually tested and dev/run-tests

  ![justtasks](https://cloud.githubusercontent.com/assets/13952758/15165269/6e8efe8c-16c9-11e6-9784-cffe966fdcf0.png) ![withaccumulators](https://cloud.githubusercontent.com/assets/13952758/15165270/7019ec9e-16c9-11e6-8649-db69ed7a317d.png)

  Author: Alex Bozarth <ajbozart@us.ibm.com>
  Closes #13037 from ajbozarth/spark1301.

* [SPARK-15958] Make initial buffer size for the Sorter configurable (Sital Kedia, 2016-06-25; 2 files, -4/+7)

  ## What changes were proposed in this pull request?
  Currently the initial buffer size in the sorter is hard-coded and too small for large workloads. As a result, the sorter spends significant time expanding the buffer size and copying the data. It would be useful to have it configurable.

  ## How was this patch tested?
  Tested by running a job on the cluster.

  Author: Sital Kedia <skedia@fb.com>
  Closes #13699 from sitalkedia/config_sort_buffer_upstream.

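  A sketch of reading such a setting through SparkConf; the property key and default below are assumptions for illustration, not necessarily the PR's actual values:

  ```scala
  import org.apache.spark.SparkConf

  val conf = new SparkConf()
  // Hypothetical key: let large workloads start with a bigger pointer array
  // instead of repeatedly growing and copying it.
  val initialBufferSize = conf.getInt("spark.shuffle.sort.initialBufferSize", 4096)
  ```
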
* [SPARK-15963][CORE] Catch `TaskKilledException` correctly in Executor.TaskRunner (Liwei Lin, 2016-06-24; 2 files, -1/+145)

  ## The problem
  Before this change, if either of the following cases happened to a task, the task would be marked as `FAILED` instead of `KILLED`:
  - the task was killed before it was deserialized
  - `executor.kill()` marked `taskRunner.killed`, but before calling `task.killed()` the worker thread threw the `TaskKilledException`

  The reason is that in the `catch` block of the current [Executor.TaskRunner](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L362) implementation, we are mistakenly catching:
  ```scala
  case _: TaskKilledException | _: InterruptedException if task.killed => ...
  ```
  the semantics of which is:
  - **(**`TaskKilledException` **OR** `InterruptedException`**)** **AND** `task.killed`

  So when `TaskKilledException` is thrown but `task.killed` is not marked, we mark the task as `FAILED` (when it should really be `KILLED`).

  ## What changes were proposed in this pull request?
  This patch alters the catch condition's semantics from:
  - **(**`TaskKilledException` **OR** `InterruptedException`**)** **AND** `task.killed`

  to:
  - `TaskKilledException` **OR** **(**`InterruptedException` **AND** `task.killed`**)**

  so that we can catch `TaskKilledException` correctly and mark the task as `KILLED` correctly.

  ## How was this patch tested?
  Added a unit test which failed before the change; ran the new test 1000 times manually.

  Author: Liwei Lin <lwlin7@gmail.com>
  Closes #13685 from lw-lin/fix-task-killed.

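  A self-contained model of the corrected semantics (with a stand-in exception class in place of Spark's TaskKilledException): splitting the cases means the killed-flag guard applies only to InterruptedException.

  ```scala
  class TaskKilledException extends RuntimeException  // stand-in for Spark's

  def classify(t: Throwable, taskKilled: Boolean): String = t match {
    case _: TaskKilledException                => "KILLED"  // now unconditional
    case _: InterruptedException if taskKilled => "KILLED"
    case _                                     => "FAILED"
  }
  ```
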
* [SPARK-16129][CORE][SQL] Eliminate direct use of commons-lang classes in favor of commons-lang3 (Sean Owen, 2016-06-24; 1 file, -3/+2)

  ## What changes were proposed in this pull request?
  Replace use of `commons-lang` in favor of `commons-lang3` and forbid the former via scalastyle; replace `NotImplementedException` from `commons-lang` with the JDK's `UnsupportedOperationException`.

  ## How was this patch tested?
  Jenkins tests

  Author: Sean Owen <sowen@cloudera.com>
  Closes #13843 from srowen/SPARK-16129.

* [SPARK-16125][YARN] Fix not testing yarn cluster mode correctly in YarnClusterSuite (peng.zhang, 2016-06-24; 1 file, -1/+2)

  ## What changes were proposed in this pull request?
  Since SPARK-13220 (Deprecate "yarn-client" and "yarn-cluster"), YarnClusterSuite doesn't test "yarn cluster" mode correctly. This pull request fixes it.

  ## How was this patch tested?
  Unit test

  Author: peng.zhang <peng.zhang@xiaomi.com>
  Closes #13836 from renozhang/SPARK-16125-test-yarn-cluster-mode.

* [SPARK-13723][YARN] Change behavior of --num-executors with dynamic allocation. (Ryan Blue, 2016-06-23; 4 files, -16/+37)

  ## What changes were proposed in this pull request?
  This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, the value is used for the initial number of executors. This change was discussed in [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend taking it while we can still change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected results for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.

  ## How was this patch tested?
  This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.

  Author: Ryan Blue <blue@apache.org>
  Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.

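  The precedence this establishes, sketched in the spirit of the PR's Utils.getDynamicAllocationInitialExecutors (simplified; the real code reads these values from SparkConf):

  ```scala
  // Initial executor count: the largest of the configured minimum, the
  // explicit initial setting, and --num-executors / spark.executor.instances.
  def initialExecutors(min: Int, initial: Option[Int], numExecutors: Option[Int]): Int =
    (Seq(min) ++ initial ++ numExecutors).max
  ```
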
* [SPARK-15660][CORE] Update RDD `variance/stdev` description and add popVariance/popStdev (Dongjoon Hyun, 2016-06-23; 5 files, -8/+58)

  ## What changes were proposed in this pull request?
  In SPARK-11490, `variance/stdev` were redefined as the **sample** `variance/stdev` instead of the population ones. This PR updates the other old documentation to prevent users from misunderstanding, affecting the following Scala/Java API docs:
  - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD
  - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions
  - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter
  - http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/api/java/JavaDoubleRDD.html
  - http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/rdd/DoubleRDDFunctions.html
  - http://spark.apache.org/docs/2.0.0-preview/api/java/org/apache/spark/util/StatCounter.html

  This PR also adds explicit `popVariance` and `popStdev` functions.

  ## How was this patch tested?
  Pass the updated Jenkins tests.

  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #13403 from dongjoon-hyun/SPARK-15660.

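  The distinction being documented, as a small self-contained sketch: sample variance divides by n - 1, population variance by n (stdev is the square root of each).

  ```scala
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  def popVariance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / xs.size        // divide by n
  }

  def sampleVariance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / (xs.size - 1)  // divide by n - 1
  }
  ```
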
* [SPARK-16131] Initialize internal logger lazily in the Scala preferred way (Prajwal Tuladhar, 2016-06-22; 2 files, -12/+4)

  ## What changes were proposed in this pull request?
  Initialize the logger instance lazily, in the Scala preferred way.

  ## How was this patch tested?
  By running `./build/mvn clean test` locally.

  Author: Prajwal Tuladhar <praj@infynyxx.com>
  Closes #13842 from infynyxx/spark_internal_logger.

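  The "Scala preferred way" referenced here, as a minimal sketch (slf4j assumed; Spark's actual Logging trait differs in its details):

  ```scala
  import org.slf4j.{Logger, LoggerFactory}

  trait Logging {
    // Created on first access, with thread-safe initialization generated by
    // the compiler -- no explicit null check or synchronization needed.
    @transient protected lazy val log: Logger =
      LoggerFactory.getLogger(getClass.getName)
  }
  ```
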
* [SPARK-16003] SerializationDebugger runs into infinite loop (Eric Liang, 2016-06-22; 2 files, -6/+16)

  ## What changes were proposed in this pull request?
  This fixes SerializationDebugger to not recurse forever when `writeReplace` returns an object of the same class, which is the case for at least the `SQLMetrics` class. See also the OpenJDK unit tests on the behavior of recursive `writeReplace()`: https://github.com/openjdk-mirror/jdk7u-jdk/blob/f4d80957e89a19a29bb9f9807d2a28351ed7f7df/test/java/io/Serializable/nestedReplace/NestedReplace.java cc davies cloud-fan

  ## How was this patch tested?
  Unit tests for SerializationDebugger.

  Author: Eric Liang <ekl@databricks.com>
  Closes #13814 from ericl/spark-16003.

* [SPARK-15783][CORE] Fix flakiness in BlacklistIntegrationSuite (Imran Rashid, 2016-06-22; 4 files, -26/+76)

  ## What changes were proposed in this pull request?
  Several changes here -- the first two were causing failures in BlacklistIntegrationSuite:
  1. The testing framework didn't include the reviveOffers thread, so the test which involved delay scheduling might never submit offers late enough for the delay scheduling to kick in. So added in the periodic revive offers, just like the real scheduler.
  2. `assertEmptyDataStructures` would occasionally fail, because it appeared there was still an active job. This is because in DAGScheduler the jobWaiter is notified of the job completion before the data structures are cleaned up. Most of the time the test code waiting on the jobWaiter won't become active until after the data structures are cleared, but occasionally the race goes the other way, and the assertions fail.
  3. `DAGSchedulerSuite` was not stopping all the inner parts it was setting up, so each test was leaking a number of threads. So we stop those parts too.
  4. Turns out that `assertMapOutputAvailable` is not terribly useful in this framework -- most of the places I was trying to use it suffer from some race.
  5. When there is an exception in the backend, try to improve the error msg a little bit. Before, the exception was printed to the console but the test would fail with a timeout, and the logs wouldn't show anything.

  ## How was this patch tested?
  I ran all the tests in `BlacklistIntegrationSuite` 5k times and everything in `DAGSchedulerSuite` 1k times on my laptop. Also I ran a full jenkins build with `BlacklistIntegrationSuite` 500 times and `DAGSchedulerSuite` 50 times; see https://github.com/apache/spark/pull/13548. (I tried more times but jenkins timed out.) To check for more leaked threads, I added some code to dump the list of all threads at the end of each test in DAGSchedulerSuite, which is how I discovered the mapOutputTracker and eventLoop were leaking threads. (I removed that code from the final PR; it was just part of the testing.) And I'll run Jenkins on this a couple of times to do one more check.

  Author: Imran Rashid <irashid@cloudera.com>
  Closes #13565 from squito/blacklist_extra_tests.

* [SPARK-16002][SQL] Sleep when no new data arrives to avoid 100% CPU usage (Shixiong Zhu, 2016-06-21; 1 file, -3/+15)

  ## What changes were proposed in this pull request?
  Add a configuration to allow people to set a minimum polling delay when no new data arrives (default is 10ms). This PR also cleans up some INFO logs.

  ## How was this patch tested?
  Existing unit tests.

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #13718 from zsxwing/SPARK-16002.

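  A sketch of the polling behavior described; the loop shape and names are illustrative, not the streaming engine's actual code:

  ```scala
  // Back off for a minimum delay when a poll finds nothing, so an idle
  // query doesn't spin a core at 100%.
  def pollLoop(hasNewData: () => Boolean, processBatch: () => Unit,
               minDelayMs: Long = 10): Unit = {
    while (!Thread.currentThread().isInterrupted) {
      if (hasNewData()) processBatch()
      else Thread.sleep(minDelayMs)
    }
  }
  ```
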
* [SPARK-16044][SQL] input_file_name() returns empty strings in data sources based on NewHadoopRDD (hyukjinkwon, 2016-06-20; 2 files, -1/+8)

  ## What changes were proposed in this pull request?
  This PR makes the `input_file_name()` function return the file paths, not empty strings, for external data sources based on `NewHadoopRDD`, such as [spark-redshift](https://github.com/databricks/spark-redshift/blob/cba5eee1ab79ae8f0fa9e668373a54d2b5babf6b/src/main/scala/com/databricks/spark/redshift/RedshiftRelation.scala#L149) and [spark-xml](https://github.com/databricks/spark-xml/blob/master/src/main/scala/com/databricks/spark/xml/util/XmlFile.scala#L39-L47). With these external data sources, the code

  ```scala
  df.select(input_file_name).show()
  ```

  will produce:
  - **Before**
  ```
  +-----------------+
  |input_file_name()|
  +-----------------+
  |                 |
  +-----------------+
  ```
  - **After**
  ```
  +--------------------+
  |   input_file_name()|
  +--------------------+
  |file:/private/var...|
  +--------------------+
  ```

  ## How was this patch tested?
  Unit tests in `ColumnExpressionSuite`.

  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #13759 from HyukjinKwon/SPARK-16044.

* [SPARK-16017][CORE] Send hostname from CoarseGrainedExecutorBackend to driver (Shixiong Zhu, 2016-06-17; 5 files, -11/+12)

  ## What changes were proposed in this pull request?
  [SPARK-15395](https://issues.apache.org/jira/browse/SPARK-15395) changed how the driver gets the executor host: the driver now gets the executor IP address instead of the host name. This PR sends the hostname from executors to the driver so that the driver can pass it to the TaskScheduler.

  ## How was this patch tested?
  Existing unit tests.

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #13741 from zsxwing/SPARK-16017.

* [SPARK-15926] Improve readability of DAGScheduler stage creation methods (Kay Ousterhout, 2016-06-17; 2 files, -86/+76)

  ## What changes were proposed in this pull request?
  This pull request refactors parts of the DAGScheduler to improve readability, focusing on the code around stage creation. One goal of this change is to make it clearer which functions may create new stages (as opposed to looking up stages that already exist). There are no functionality changes in this pull request. In more detail:
  * shuffleToMapStage was renamed to shuffleIdToMapStage (when reading the existing code I have sometimes struggled to remember what the key is -- is it a stage? a stage id? This change is intended to avoid that confusion).
  * Cleaned up the code to create shuffle map stages. Previously, creating a shuffle map stage involved 3 different functions (newOrUsedShuffleStage, newShuffleMapStage, and getShuffleMapStage), and it wasn't clear what the purpose of each function was. With the new code, a single function (getOrCreateShuffleMapStage) is responsible for getting a stage (if it already exists) or creating new shuffle map stages and any missing ancestor stages, and it delegates to createShuffleMapStage when new stages need to be created. There's some remaining confusion here because the getOrCreateParentStages call in createShuffleMapStage may recursively create ancestor stages; this is an issue I plan to fix in a future pull request, because it's trickier to fix and involves a slight functionality change.
  * newResultStage was renamed to createResultStage, for consistency with naming around shuffle map stages.
  * getParentStages has been renamed to getOrCreateParentStages, to make it clear that this function will sometimes create missing ancestor stages.
  * The only *slight* functionality change is that on line 478, updateJobIdStageIdMaps now uses a stage's parents instance variable rather than re-calculating them (I couldn't see any reason why they'd need to be re-calculated, and suspect this is just leftover from older code).
  * getAncestorShuffleDependencies was renamed to getMissingAncestorShuffleDependencies, to make it clear that this only returns dependencies that have not yet been run.

  cc squito markhamstra JoshRosen (who requested more DAG scheduler commenting long ago -- an issue this pull request tries, in part, to address) FYI rxin

  Author: Kay Ousterhout <kayousterhout@gmail.com>
  Closes #13677 from kayousterhout/SPARK-15926.

* [SPARK-15782][YARN] Fix spark.jars and spark.yarn.dist.jars handling (Nezih Yigitbasi, 2016-06-16; 3 files, -1/+38)

  When `--packages` is specified with spark-shell, the classes from those packages cannot be found, which I think is due to some of the changes in SPARK-12343. Tested manually with both scala 2.10 and 2.11 repls. vanzin davies can you guys please review?

  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Author: Nezih Yigitbasi <nyigitbasi@netflix.com>
  Closes #13709 from nezihyigitbasi/SPARK-15782.

* [SPARK-15868][WEB UI] Executors table in Executors tab should sort Executor IDs in numerical order (Alex Bozarth, 2016-06-16; 1 file, -2/+7)

  ## What changes were proposed in this pull request?
  Currently the Executors table sorts by id using a string sort (since that's how it is stored). Since the id is a number (other than the driver's), we should be sorting numerically. I have changed both the initial sort on page load and the table sort to sort on id numerically, treating non-numeric strings (like the driver) as "-1".

  ## How was this patch tested?
  Manually tested and dev/run-tests

  ![pageload](https://cloud.githubusercontent.com/assets/13952758/16027882/d32edd0a-318e-11e6-9faf-fc972b7c36ab.png) ![sorted](https://cloud.githubusercontent.com/assets/13952758/16027883/d34541c6-318e-11e6-9ed7-6bfc0cd4152e.png)

  Author: Alex Bozarth <ajbozart@us.ibm.com>
  Closes #13654 from ajbozarth/spark15868.

* [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config (Sean Owen, 2016-06-16; 2 files, -5/+5)

  ## What changes were proposed in this pull request?
  Reduce the `spark.memory.fraction` default to 0.6 in order to make it fit within the default JVM old generation size (2/3 of the heap); see the JIRA discussion. This means a full cache doesn't spill into the new gen. CC andrewor14

  ## How was this patch tested?
  Jenkins tests.

  Author: Sean Owen <sowen@cloudera.com>
  Closes #13618 from srowen/SPARK-15796.

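  The arithmetic behind the new default, as a quick check (assuming the JVM default -XX:NewRatio=2, which makes the old generation 2/3 of the heap):

  ```scala
  val oldGenFraction = 2.0 / 3.0  // old gen share of heap under -XX:NewRatio=2
  val memoryFraction = 0.6        // new spark.memory.fraction default
  // A full cache (0.6 of the heap) now fits inside the old gen (~0.667),
  // so long-lived cached blocks don't overflow into the young generation.
  assert(memoryFraction < oldGenFraction)
  ```
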
* [SPARK-12922][SPARKR][WIP] Implement gapply() on DataFrame in SparkR (Narine Kokhlikyan, 2016-06-15; 1 file, -3/+17)

  ## What changes were proposed in this pull request?
  gapply() applies an R function on groups grouped by one or more columns of a DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() in the Dataset API. Please let me know what you think and if you have any ideas to improve it. Thank you!

  ## How was this patch tested?
  Unit tests:
  1. Primitive test with different column types
  2. Add a boolean column
  3. Compute average by a group

  Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
  Author: NarineK <narine.kokhlikyan@us.ibm.com>
  Closes #12836 from NarineK/gapply2.

* [SPARK-15851][BUILD] Fix the call of the bash script to enable proper run in Windows (Reynold Xin, 2016-06-15; 1 file, -1/+2)

  ## What changes were proposed in this pull request?
  The way the bash script `build/spark-build-info` is called from core/pom.xml prevents Spark from building on Windows. Instead of calling the script directly, we call bash and pass the script as an argument. This enables running it on Windows with bash installed, which typically comes with Git. This brings https://github.com/apache/spark/pull/13612 up to date and also addresses comments from the code review. Closes #13612

  ## How was this patch tested?
  I built manually (on a Mac) to verify it didn't break Mac compilation.

  Author: Reynold Xin <rxin@databricks.com>
  Author: avulanov <nashb@yandex.ru>
  Closes #13691 from rxin/SPARK-15851.

* Revert "[SPARK-15782][YARN] Set spark.jars system property in client mode" (Davies Liu, 2016-06-15; 3 files, -33/+1)

  This reverts commit 4df8df5c2e68f5a5d231c401b04d762d7a648159.

* [HOTFIX][CORE] Fix flaky BasicSchedulerIntegrationTest (Imran Rashid, 2016-06-15; 1 file, -8/+7)

  ## What changes were proposed in this pull request?
  SPARK-15927 exacerbated a race in BasicSchedulerIntegrationTest, so it went from very unlikely to fairly frequent. The issue is that stage numbering is not completely deterministic, but these tests treated it like it was. So turn off the tests.

  ## How was this patch tested?
  On my laptop the test failed about 10% of the time before this change, and didn't fail in 500 runs after the change.

  Author: Imran Rashid <irashid@cloudera.com>
  Closes #13688 from squito/hotfix_basic_scheduler.

* [SPARK-15782][YARN] Set spark.jars system property in client mode (Nezih Yigitbasi, 2016-06-15; 3 files, -1/+33)

  ## What changes were proposed in this pull request?
  When `--packages` is specified with `spark-shell`, the classes from those packages cannot be found, which I think is due to some of the changes in `SPARK-12343`. In particular, `SPARK-12343` removes a line that sets the `spark.jars` system property in client mode, which is used by the repl main class to set the classpath.

  ## How was this patch tested?
  Tested manually. This system property is used by the repl to populate its classpath. If this is not set properly the classes for external packages cannot be found. tgravescs vanzin as you may be familiar with this part of the code.

  Author: Nezih Yigitbasi <nyigitbasi@netflix.com>
  Closes #13527 from nezihyigitbasi/repl-fix.

* [SPARK-15826][CORE] PipedRDD to allow configurable char encoding (Tejas Patil, 2016-06-15; 4 files, -22/+36)

  ## What changes were proposed in this pull request?
  Link to the JIRA, which describes the problem: https://issues.apache.org/jira/browse/SPARK-15826. The fix in this PR is to allow users to specify the encoding in the pipe() operation. For backward compatibility, the default value remains the system default.

  ## How was this patch tested?
  Ran existing unit tests.

  Author: Tejas Patil <tejasp@fb.com>
  Closes #13563 from tejasapatil/pipedrdd_utf8.

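  A sketch of using the new parameter, assuming a live `SparkContext` named `sc` and that the extended pipe() signature gives all other arguments defaults (as described in the PR):

  ```scala
  val rdd = sc.parallelize(Seq("héllo", "wörld"))
  // Pipe through an external command using UTF-8 explicitly rather than
  // the platform default encoding.
  val piped = rdd.pipe(command = Seq("cat"), encoding = "UTF-8")
  piped.collect()
  ```
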
* [SPARK-15518][CORE][FOLLOW-UP] Rename LocalSchedulerBackendEndpoint -> LocalSchedulerBackend (Liwei Lin, 2016-06-15; 4 files, -19/+19)

  ## What changes were proposed in this pull request?
  This patch is a follow-up to https://github.com/apache/spark/pull/13288, completing the renaming:
  - LocalScheduler -> LocalSchedulerBackend~~Endpoint~~

  ## How was this patch tested?
  Updated test cases to reflect the name change.

  Author: Liwei Lin <lwlin7@gmail.com>
  Closes #13683 from lw-lin/rename-backend.

* [SPARK-15046][YARN] Parse value of token renewal interval correctly. (Marcelo Vanzin, 2016-06-15; 2 files, -2/+9)

  Use the config variable definition both to set and parse the value, avoiding issues with code expecting the value in a different format. Tested by running spark-submit with --principal / --keytab.

  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #13669 from vanzin/SPARK-15046.