path: root/core/src/test/scala/org/apache
Commit log: message (author, date; files changed, -lines removed/+lines added)
* [MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments (Dongjoon Hyun, 2016-02-22; 1 file, -3/+3)
  This PR fixes all typos in the markdown files under the `docs` module, and fixes similar typos in other comments, too. Tested manually. Closes #11300 from dongjoon-hyun/minor_fix_typos.
* [SPARK-13426][CORE] Remove the support of SIMR (jerryshao, 2016-02-22; 1 file, -8/+1)
  This PR removes support for SIMR: SIMR has not been actively used or maintained for a long time, and it is not supported by `SparkSubmit`, so this proposes removing it. Tested locally by running unit tests. Closes #11296 from jerryshao/SPARK-13426.
* [SPARK-13408][CORE] Ignore errors when it's already reported in JobWaiter (Shixiong Zhu, 2016-02-19; 1 file, -0/+41)
  `JobWaiter.taskSucceeded` is called for each task. When `resultHandler` throws an exception, `taskSucceeded` rethrows it for each task, and the DAGScheduler catches it and reports it like this:

  ```scala
  try {
    job.listener.taskSucceeded(rt.outputId, event.result)
  } catch {
    case e: Exception =>
      // TODO: Perhaps we want to mark the resultStage as failed?
      job.listener.jobFailed(new SparkDriverExecutionException(e))
  }
  ```

  Therefore `JobWaiter.jobFailed` may be called multiple times, so it should use `Promise.tryFailure` instead of `Promise.failure`, because the latter throws if the promise has already been completed. Tested via Jenkins. Closes #11280 from zsxwing/SPARK-13408.
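  A minimal standalone sketch (not Spark code) of why `tryFailure` is the safe choice when a failure may be reported more than once:

  ```scala
  import scala.concurrent.Promise
  import scala.util.Try

  val p = Promise[Int]()
  p.tryFailure(new RuntimeException("first"))             // completes the promise, returns true
  val again = p.tryFailure(new RuntimeException("again")) // already completed: returns false
  println(again)                                          // false
  // p.failure(...) on an already-completed promise instead throws IllegalStateException:
  println(Try(p.failure(new RuntimeException("boom"))).isFailure) // true
  ```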
* [SPARK-13407] Guard against garbage-collected accumulators in TaskMetrics.fromAccumulatorUpdates (Josh Rosen, 2016-02-19; 1 file, -6/+4)
  `TaskMetrics.fromAccumulatorUpdates()` can fail if accumulators have been garbage-collected on the driver. To guard against this, this patch introduces `ListenerTaskMetrics`, a subclass of `TaskMetrics` which is used only in `TaskMetrics.fromAccumulatorUpdates()` and which eliminates the need to access the original accumulators on the driver. Closes #11276 from JoshRosen/accum-updates-fix.
* [SPARK-13371][CORE][STRING] TaskSetManager.dequeueSpeculativeTask compares Option and String directly (Sean Owen, 2016-02-18; 3 files, -5/+9)
  Fix some comparisons between unequal types that cause IntelliJ warnings and, in at least one case (TaskSetManager), a likely bug. Tested via Jenkins. Closes #11253 from srowen/SPARK-13371.
* [SPARK-13344][TEST] Fix harmless accumulator not found exceptions (Andrew Or, 2016-02-17; 3 files, -4/+30)
  See [JIRA](https://issues.apache.org/jira/browse/SPARK-13344) for more detail. This was caused by #10835. Closes #11222 from andrewor14/fix-test-accum-exceptions.
* [SPARK-13278][CORE] Launcher fails to start with JDK 9 EA (Claes Redestad, 2016-02-14; 1 file, -2/+4)
  See http://openjdk.java.net/jeps/223 for more information about the JDK 9 version string scheme. Closes #11160 from cl4es/master.
* [SPARK-13172][CORE][SQL] Stop using RichException.getStackTrace; it is deprecated (Sean Owen, 2016-02-13; 2 files, -3/+3)
  Replace `getStackTraceString` with `Utils.exceptionString`. Closes #11182 from srowen/SPARK-13172.
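  A standalone equivalent of what such a stack-trace-to-string helper does (this is a sketch, not Spark's actual `org.apache.spark.util.Utils.exceptionString` implementation):

  ```scala
  import java.io.{PrintWriter, StringWriter}

  def exceptionString(e: Throwable): String = {
    val sw = new StringWriter()
    e.printStackTrace(new PrintWriter(sw)) // includes the message and full stack trace
    sw.toString
  }
  ```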
* [SPARK-5095] Remove flaky test (Michael Gummelt, 2016-02-12; 1 file, -0/+5)
  Overrode the start() method, which previously started a thread and caused a race condition. This should fix the flaky test. Closes #11164 from mgummelt/fix_mesos_tests.
* [SPARK-5095] Fix style in Mesos coarse-grained scheduler code (Michael Gummelt, 2016-02-12; 1 file, -6/+6)
  This addresses andrewor14's style comments from #10993. Closes #11187 from mgummelt/fix_mesos_style.
* [SPARK-6166] Limit number of in-flight outbound requests (Sanket, 2016-02-11; 1 file, -3/+6)
  This JIRA is related to https://github.com/apache/spark/pull/5852. Some minor rework and testing were needed to make it work with the current version of Spark. Closes #10838 from redsanket/limit-outbound-connections.
* [SPARK-7889][WEBUI] HistoryServer updates UI for incomplete apps (Steve Loughran, 2016-02-11; 2 files, -6/+706)
  When the HistoryServer is showing an incomplete app, it needs to check whether a newer version of the app is available. It does this by checking if a version of the app has been loaded with a larger *filesize*. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI. https://issues.apache.org/jira/browse/SPARK-7889 Authors: Steve Loughran <stevel@hortonworks.com>, Imran Rashid <irashid@cloudera.com>. Closes #11118 from squito/SPARK-7889-alternate.
* [SPARK-12414][CORE] Remove closure serializer (Sean Owen, 2016-02-10; 1 file, -2/+1)
  Remove the spark.closure.serializer option and always use JavaSerializer. CC andrewor14, rxin: there's a discussion in the JIRA, but this offers a look at what the change would be. Closes #11150 from srowen/SPARK-12414.
* [SPARK-5095][MESOS] Support launching multiple mesos executors in coarse grained mesos mode (Michael Gummelt, 2016-02-10; 3 files, -114/+259)
  This is the next iteration of tnachen's previous PR: https://github.com/apache/spark/pull/4027. In that PR, we resolved with andrewor14 and pwendell to implement the Mesos scheduler's support of `spark.executor.cores` consistently with YARN and Standalone. This PR implements that resolution, with two co-dependent high-level features:
  - Mesos support for spark.executor.cores
  - Multiple executors per slave
  We at Mesosphere have been working with Typesafe on a Spark/Mesos integration test suite, https://github.com/typesafehub/mesos-spark-integration-tests, which passes for this PR. The contribution is my original work and I license the work to the project under the project's open source license. Closes #10993 from mgummelt/executor_sizing.
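  A sketch of how the new sizing behaves, with illustrative values (the Mesos master URL and the numbers are assumptions, not from the PR):

  ```scala
  import org.apache.spark.{SparkConf, SparkContext}

  // With 4 cores per executor and a 16-core cap, up to four executors can be
  // launched, possibly several on the same Mesos slave:
  val conf = new SparkConf()
    .setMaster("mesos://zk://master.mesos:2181/mesos") // assumed cluster URL
    .setAppName("executor-sizing-demo")
    .set("spark.executor.cores", "4")
    .set("spark.cores.max", "16")
  val sc = new SparkContext(conf)
  ```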
* [SPARK-10620][SPARK-13054] Minor addendum to #10835 (Andrew Or, 2016-02-08; 6 files, -23/+23)
  Additional changes to #10835, mainly related to style and visibility. This patch also adds back a few deprecated methods for backward compatibility. Closes #10958 from andrewor14/task-metrics-to-accums-followups.
* [SPARK-13171][CORE] Replace future calls with Future (Jakob Odersky, 2016-02-05; 2 files, -12/+12)
  Trivial search-and-replace to eliminate deprecation warnings in Scala 2.11; also works with 2.10. Closes #11085 from jodersky/SPARK-13171.
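  A minimal sketch of the rename: the lowercase `scala.concurrent.future` helper is deprecated in Scala 2.11, while the `Future.apply` companion method works on both 2.10 and 2.11:

  ```scala
  import scala.concurrent.Future
  import scala.concurrent.ExecutionContext.Implicits.global

  // Before (deprecated in 2.11): val f = future { 42 }
  val f = Future { 42 }
  ```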
* [SPARK-13053][TEST] Unignore tests in InternalAccumulatorSuite (Andrew Or, 2016-02-04; 2 files, -78/+102)
  These were ignored because they were incorrectly written: they didn't actually trigger stage retries, which is what the tests are testing. The tests are now rewritten to induce stage retries through fetch failures. Note: there were 2 tests before and now there's only 1. What happened? It turns out that the case where we only resubmit a subset of the original missing partitions is very difficult to simulate in tests without potentially introducing flakiness. This is because the `DAGScheduler` removes all map outputs associated with a given executor when this happens, we need multiple executors to trigger this case, and sometimes the scheduler still removes map outputs from all executors. Closes #10969 from andrewor14/unignore-accum-test.
* [SPARK-13162] Standalone mode does not respect initial executors (Andrew Or, 2016-02-04; 1 file, -1/+16)
  Currently the Master would always set an application's initial executor limit to infinity. If the user specified `spark.dynamicAllocation.initialExecutors`, the config would not take effect. This is similar to #11047 but for standalone mode. Closes #11054 from andrewor14/standalone-da-initial.
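  A sketch of the dynamic-allocation settings involved, with illustrative values; before this fix, the standalone Master ignored the initial limit:

  ```scala
  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true")          // required for dynamic allocation
    .set("spark.dynamicAllocation.initialExecutors", "4")  // previously not respected in standalone mode
    .set("spark.dynamicAllocation.minExecutors", "1")
    .set("spark.dynamicAllocation.maxExecutors", "20")
  ```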
* [SPARK-13164][CORE] Replace deprecated synchronized buffer in core (Holden Karau, 2016-02-04; 3 files, -25/+28)
  Building with Scala 2.11 produces the warning: "trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative." Investigation shows we already use ConcurrentLinkedQueue in other locations, so this switches our uses of SynchronizedBuffer to ConcurrentLinkedQueue. Closes #11059 from holdenk/SPARK-13164-replace-deprecated-synchronized-buffer-in-core.
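  A minimal sketch of the replacement pattern (names and values are illustrative):

  ```scala
  import java.util.concurrent.ConcurrentLinkedQueue
  import scala.collection.JavaConverters._

  // Deprecated: new mutable.ArrayBuffer[String] with mutable.SynchronizedBuffer[String]
  val events = new ConcurrentLinkedQueue[String]()
  events.add("taskStart")
  events.add("taskEnd")
  // Snapshot as a Scala Seq for assertions in tests:
  val seen: Seq[String] = events.asScala.toSeq
  ```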
* [SPARK-12790][CORE] Remove HistoryServer old multiple files format (felixcheung, 2016-02-01; 2 files, -111/+9)
  Removed the isLegacyLogDirectory code path and updated tests. CC andrewor14. Closes #10860 from felixcheung/historyserverformat.
* [SPARK-6847][CORE][STREAMING] Fix stack overflow issue when updateStateByKey is followed by a checkpointed dstream (Shixiong Zhu, 2016-02-01; 1 file, -0/+21)
  Adds a local property that indicates whether to checkpoint all RDDs marked with the checkpoint flag, and enables it in Streaming. Closes #10934 from zsxwing/recursive-checkpoint.
* [SPARK-13096][TEST] Fix flaky verifyPeakExecutionMemorySet (Andrew Or, 2016-01-29; 1 file, -0/+2)
  Previously we would assert things before all events were guaranteed to have been processed. To fix this, just block until all events are actually processed, i.e. until the listener queue is empty. https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/79/testReport/junit/org.apache.spark.util.collection/ExternalAppendOnlyMapSuite/spilling/ Closes #10990 from andrewor14/accum-suite-less-flaky.
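  A sketch of the fix pattern, assuming the code lives inside the org.apache.spark package (the listener bus is `private[spark]`); the timeout value is illustrative:

  ```scala
  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("listener-demo"))
  sc.parallelize(1 to 100, 4).count()
  // Events are posted to listeners asynchronously; drain the queue before asserting:
  sc.listenerBus.waitUntilEmpty(10000L)
  // ...assertions against listener state go here...
  sc.stop()
  ```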
* [SPARK-13055] SQLHistoryListener throws ClassCastException (Andrew Or, 2016-01-29; 4 files, -25/+17)
  This is an existing issue uncovered recently by #10835. The exception arose because the `SQLHistoryListener` receives all sorts of accumulators, not just the ones that represent SQL metrics. For example, the listener gets `internal.metrics.shuffleRead.remoteBlocksFetched`, which is an Int, then proceeds to cast the Int to a Long, which fails. The fix is to mark accumulators representing SQL metrics with some internal metadata, so we can identify which ones are SQL metrics and only process those in the `SQLHistoryListener`. Closes #10971 from andrewor14/fix-sql-history.
* [SPARK-10873] Support column sort and search for History Server (zhuol, 2016-01-29; 1 file, -25/+18)
  Support column sort and search for the History Server using jQuery DataTables and the REST API. Before this commit, the history server generated hard-coded HTML that could not support search; sorting was also disabled if any application had more than one attempt. Supporting search and sort (over all applications rather than the 20 entries on the current page) greatly improves the user experience.
  1. Create historypage-template.html for displaying application information in DataTables.
  2. historypage.js uses jQuery to fetch the data from the /api/v1/applications REST API and uses DataTables to display each application's information. For applications with more than one attempt, the RowsGroup plugin merges such entries while still supporting sort and search.
  3. "duration" and "lastUpdated" REST API fields are added to an application's "attempts".
  4. External javascript and css files for DataTables, RowsGroup and jQuery plugins are added, with licenses clarified.
  Snapshots of the new look: history page view: ![historypage](https://cloud.githubusercontent.com/assets/11683054/12184383/89bad774-b55a-11e5-84e4-b0276172976f.png) search: ![search](https://cloud.githubusercontent.com/assets/11683054/12184385/8d3b94b0-b55a-11e5-869a-cc0ef0a4242a.png) sort by started time: ![sort-by-started-time](https://cloud.githubusercontent.com/assets/11683054/12184387/8f757c3c-b55a-11e5-98c8-577936366566.png) Closes #10648 from zhuoliu/10873.
* [SPARK-13021][CORE] Fail fast when custom RDDs violate RDD.partition's API contract (Josh Rosen, 2016-01-27; 1 file, -0/+18)
  Spark's `Partition` and `RDD.partitions` APIs have a contract which requires custom implementations of `RDD.partitions` to ensure that for all `x`, `rdd.partitions(x).index == x`; in other words, the `index` reported by a partition needs to match its position in the partitions array. If a custom RDD implementation violates this contract, then Spark has the potential to become stuck in an infinite recomputation loop when recomputing a subset of an RDD's partitions, since the tasks that are actually run will not correspond to the missing output partitions that triggered the recomputation. Here's a link to a notebook which demonstrates this problem: https://rawgit.com/JoshRosen/e520fb9a64c1c97ec985/raw/5e8a5aa8d2a18910a1607f0aa4190104adda3424/Violating%2520RDD.partitions%2520contract.html In order to guard against this infinite loop behavior, this patch modifies Spark so that it fails fast and refuses to compute RDDs whose `partitions` violate the API contract. Closes #10932 from JoshRosen/SPARK-13021.
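  A standalone sketch of the invariant (Spark's actual check lives in the scheduler; this helper only illustrates the contract):

  ```scala
  import org.apache.spark.rdd.RDD

  def checkPartitionsContract(rdd: RDD[_]): Unit = {
    rdd.partitions.zipWithIndex.foreach { case (partition, i) =>
      require(partition.index == i,
        s"partitions($i).index == ${partition.index}; the contract requires " +
          "rdd.partitions(x).index == x for all x")
    }
  }
  ```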
* [SPARK-12895][SPARK-12896] Migrate TaskMetrics to accumulators (Andrew Or, 2016-01-27; 15 files, -481/+1727)
  The high-level idea is that instead of having the executors send both accumulator updates and TaskMetrics, we should have them send only accumulator updates. This eliminates the need to maintain both code paths, since one can be implemented in terms of the other. The effort is split into two parts:
  **SPARK-12895: Implement TaskMetrics using accumulators.** TaskMetrics is basically just a bunch of accumulable fields. This patch makes TaskMetrics a syntactic wrapper around a collection of accumulators so we don't need to send TaskMetrics from the executors to the driver.
  **SPARK-12896: Send only accumulator updates to the driver.** Now that TaskMetrics is expressed in terms of accumulators, we can capture all TaskMetrics values if we just send accumulator updates from the executors to the driver. This completes the parent issue, SPARK-10620.
  While an effort has been made to preserve as much of the public API as possible, there were a few known breaking DeveloperApi changes that would be very awkward to maintain. I will gather the full list shortly and post it here. Note: this was once part of #10717; it is split out into its own patch to make it easier for others to review. Other smaller pieces have already been merged into master. Closes #10835 from andrewor14/task-metrics-use-accums.
* [SPARK-7997][CORE] Remove Akka from Spark Core and Streaming (Shixiong Zhu, 2016-01-22; 12 files, -318/+39)
  - Remove the Akka dependency from core. Note: the streaming-akka project still uses Akka.
  - Remove HttpFileServer.
  - Remove Akka configs from SparkConf and SSLOptions.
  - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. This config is still worth keeping because choosing between `DirectTaskResult` and `IndirectTaskResult` depends on it (see the sketch after this list).
  - Update comments and docs.
  Closes #10854 from zsxwing/remove-akka.
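  A sketch of the config rename (the value is illustrative; the setting is a size in megabytes):

  ```scala
  import org.apache.spark.SparkConf

  // Before: conf.set("spark.akka.frameSize", "128")
  val conf = new SparkConf().set("spark.rpc.message.maxSize", "128")
  ```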
* [SPARK-2750][WEB UI] Add https support to the Web UI (scwf, 2016-01-19; 5 files, -47/+124)
  Authors: scwf <wangfei1@huawei.com>, Marcelo Vanzin <vanzin@cloudera.com>, WangTaoTheTonic <wangtao111@huawei.com>, w00228970 <wangfei1@huawei.com>. Closes #10238 from vanzin/SPARK-2750.
* [SPARK-12887] Do not expose vars in TaskMetrics (Andrew Or, 2016-01-19; 7 files, -36/+30)
  This is a step in implementing SPARK-10620, which migrates TaskMetrics to accumulators. TaskMetrics has a bunch of vars; some are fully public, some are `private[spark]`. This is bad coding style that makes it easy to accidentally overwrite previously set metrics, which has happened a few times in the past and caused bugs that were difficult to debug. Instead, we should have get-or-create semantics, which are more readily understandable. This makes sense for TaskMetrics because these are just aggregated metrics that we want to collect throughout the task, so it doesn't matter who is incrementing them. Parent PR: #10717. Authors: Andrew Or, Josh Rosen. Closes #10815 from andrewor14/get-or-create-metrics.
* [SPARK-12885][MINOR] Rename 3 fields in ShuffleWriteMetrics (Andrew Or, 2016-01-18; 7 files, -42/+42)
  This is a small step in implementing SPARK-10620, which migrates TaskMetrics to accumulators. This patch is strictly a cleanup and introduces no change in functionality; it literally just renames 3 fields for consistency. Today we have:

  ```scala
  inputMetrics.recordsRead
  outputMetrics.bytesWritten
  shuffleReadMetrics.localBlocksFetched
  ...
  shuffleWriteMetrics.shuffleRecordsWritten
  shuffleWriteMetrics.shuffleBytesWritten
  shuffleWriteMetrics.shuffleWriteTime
  ```

  The shuffle write ones are redundant: we can drop the `shuffle` prefix from those method names. Backward compatible (but deprecated) methods with the old names are added. Parent PR: #10717. Closes #10811 from andrewor14/rename-things.
* [SPARK-10985][CORE] Avoid passing evicted blocks throughout BlockManager (Josh Rosen, 2016-01-18; 6 files, -76/+75)
  This patch refactors portions of the BlockManager and CacheManager in order to avoid having to pass `evictedBlocks` lists throughout the code. It appears that these lists were only consumed by `TaskContext.taskMetrics`, so the new code now directly updates the metrics from the lower-level BlockManager methods. Closes #10776 from JoshRosen/SPARK-10985.
* [SPARK-12667] Remove block manager's internal "external block store" API (Reynold Xin, 2016-01-15; 9 files, -205/+86)
  This pull request removes the external block store API. It is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems. There are some other things to remove as well, as pointed out by JoshRosen; those will be done in follow-up pull requests. Closes #10752 from rxin/remove-offheap.
* [SPARK-12708][UI] Sorting task error in Stages Page when yarn mode (Koyo Yoshida, 2016-01-15; 1 file, -0/+14)
  If the sort column contains a slash (e.g. "Executor ID / Host") in YARN mode, sorting fails with the following message: ![spark-12708](https://cloud.githubusercontent.com/assets/6679275/12193320/80814f8c-b62a-11e5-9914-7bf3907029df.png) It's similar to SPARK-4313. Authors: root <root@R520T1.(none)>, Koyo Yoshida <koyo0615@gmail.com>. Closes #10663 from yoshidakuy/SPARK-12708.
* [SPARK-12174] Speed up BlockManagerSuite getRemoteBytes() test (Josh Rosen, 2016-01-14; 1 file, -41/+30)
  This patch significantly speeds up the BlockManagerSuite "SPARK-9591: getRemoteBytes from another location when Exception throw" test, reducing the test time from 45s to ~250ms. The key change was to set `spark.shuffle.io.maxRetries` to 0 (the code previously set `spark.network.timeout` to `2s`, but this didn't make a difference because the slowdown was not due to that timeout). Along the way, this also cleans up the way SparkConf is handled in BlockManagerSuite: previously, each test mutated a shared SparkConf instance, while now each test gets a fresh SparkConf. Closes #10759 from JoshRosen/SPARK-12174.
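  A sketch of the two changes, with illustrative names and values:

  ```scala
  import org.apache.spark.SparkConf

  // Build a fresh conf per test instead of mutating a shared instance;
  // retries disabled so a failed remote fetch surfaces immediately:
  def makeTestConf(): SparkConf =
    new SparkConf(false) // don't load defaults from system properties
      .set("spark.shuffle.io.maxRetries", "0")

  val conf = makeTestConf()
  ```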
* [SPARK-9844][CORE] File appender race condition during shutdown (Bryan Cutler, 2016-01-14; 1 file, -0/+77)
  When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exceptions that do occur. Added unit tests to verify that an IOException is thrown and logged if the FileAppender is not flagged to stop, and that there is no IOException when the flag is set. Closes #10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844.
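  A standalone sketch of the guarded read loop (names are illustrative, not Spark's actual FileAppender internals):

  ```scala
  import java.io.{IOException, InputStream}

  def appendLoop(in: InputStream, markedForStop: () => Boolean): Unit = {
    val buf = new Array[Byte](8192)
    try {
      var n = in.read(buf)
      while (n != -1) {
        // ...write buf(0 until n) to the log file...
        n = in.read(buf)
      }
    } catch {
      case _: IOException if markedForStop() =>
        // The process was destroyed and its stream closed under us;
        // we were already asked to stop, so the exception is expected.
    }
  }
  ```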
* [SPARK-12819] Deprecate TaskContext.isRunningLocally() (Josh Rosen, 2016-01-13; 1 file, -9/+0)
  We've already removed local execution but didn't deprecate `TaskContext.isRunningLocally()`; we should deprecate it for 2.0. Closes #10751 from JoshRosen/remove-local-exec-from-taskcontext.
* [SPARK-12400][SHUFFLE] Avoid generating temp shuffle files for empty partitions (jerryshao, 2016-01-13; 1 file, -1/+37)
  The problem lies in `BypassMergeSortShuffleWriter`: an empty partition also generates a temp shuffle file of several bytes. This changes it to only create the file when the partition is not empty. The problem only exists here; there is no such issue in `HashShuffleWriter`. Closes #10376 from jerryshao/SPARK-12400.
* [SPARK-12692][BUILD][CORE] Scala style: Fix the style violation (Space before ",") (Kousuke Saruta, 2016-01-12; 2 files, -2/+2)
  Fix the style violation (space before "," and ":"). This PR is a followup for #10643. Closes #10719 from sarutak/SPARK-12692-followup-core.
* [SPARK-12582][TEST] IndexShuffleBlockResolverSuite fails in windows (Yucai Yu, 2016-01-12; 1 file, -0/+136)
  - IndexShuffleBlockResolverSuite fails on Windows because a file is not closed.
  - Move IndexShuffleBlockResolverSuite.scala from "test/java" to "test/scala".
  https://issues.apache.org/jira/browse/SPARK-12582 Closes #10526 from yucai/master.
* [SPARK-12340] Fix overflow in various take functions (Reynold Xin, 2016-01-09; 1 file, -0/+4)
  This is a follow-up for the original patch, #10562. Closes #10670 from rxin/SPARK-12340.
* [SPARK-12730][TESTS] De-duplicate some test code in BlockManagerSuite (Josh Rosen, 2016-01-08; 1 file, -63/+25)
  This patch deduplicates some test code in BlockManagerSuite. It is split off from a larger PR in order to make things easier to review. Closes #10667 from JoshRosen/block-mgr-tests-cleanup.
* [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 edition (Sean Owen, 2016-01-08; 1 file, -0/+1)
  Fix most build warnings, mostly deprecated API usages. I'll annotate some of the changes below. CC rxin, who is leading the charge to remove the deprecated APIs. Closes #10570 from srowen/SPARK-12618.
* [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (Shixiong Zhu, 2016-01-07; 1 file, -2/+18)
  The default serializer in Kryo is FieldSerializer, which ignores transient fields and never calls `writeObject` or `readObject`. So we should register OpenHashMapBasedStateMap using `DefaultSerializer` to make it work with Kryo. Closes #10609 from zsxwing/SPARK-12591.
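  A sketch of the general technique of overriding Kryo's default FieldSerializer for one class; `MyStateMap` and the registrator are hypothetical, and the PR's actual mechanism may differ:

  ```scala
  import com.esotericsoftware.kryo.Kryo
  import com.esotericsoftware.kryo.serializers.JavaSerializer
  import org.apache.spark.serializer.KryoRegistrator

  // A class that relies on writeObject/readObject, which FieldSerializer skips:
  class MyStateMap extends Serializable

  class MyRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo): Unit = {
      // Route this class through Java serialization so its custom
      // writeObject/readObject hooks run even under Kryo:
      kryo.register(classOf[MyStateMap], new JavaSerializer())
    }
  }
  // wired up via: conf.set("spark.kryo.registrator", classOf[MyRegistrator].getName)
  ```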
* [SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0 (Josh Rosen, 2016-01-06; 1 file, -86/+0)
  This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code. Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured: if the TTL is set too low, data which is still being used may be evicted or deleted, leading to hard-to-diagnose bugs. For all of these reasons, I think we should remove this functionality in Spark 2.0. Additional benefits include marginally reduced memory usage, since we no longer need to store timestamps in hashmaps, and a handful fewer threads. Closes #10534 from JoshRosen/remove-ttl-based-cleaning.
* [SPARK-12665][CORE][GRAPHX] Remove Vector, VectorSuite and GraphKryoRegistrator which are deprecated and no longer used (Kousuke Saruta, 2016-01-06; 1 file, -45/+0)
  The whole of Vector.scala, VectorSuite.scala and GraphKryoRegistrator.scala is no longer used, so it's time to remove them in Spark 2.0. Closes #10613 from sarutak/SPARK-12665.
* [SPARK-3873][TESTS] Import ordering fixes (Marcelo Vanzin, 2016-01-05; 82 files, -196/+176)
  Closes #10582 from vanzin/SPARK-3873-tests.
* [SPARK-12615] Remove some deprecated APIs in RDD/SparkContext (Reynold Xin, 2016-01-05; 3 files, -89/+0)
  I looked at each case individually and it looks like they can all be removed. The only one I had to think twice about was toArray (I even thought about un-deprecating it, until I realized that having toArray return java.util.List would be a problem in Java). Closes #10569 from rxin/SPARK-12615.
* [SPARK-12486] Worker should kill the executors more forcefully if possible (Nong Li, 2016-01-04; 1 file, -6/+77)
  This patch updates the ExecutorRunner's terminate path to use the new Java 8 API to terminate processes more forcefully if possible. Previously, if the executor was unhealthy, it would ignore the destroy() call; presumably, the new Java API was added to handle cases like this. We could update the termination path in the future to use OS-specific commands for older Java versions. Closes #10438 from nongli/spark-12486-executors.
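  A standalone sketch of the Java 8 escalation pattern (the timeout is illustrative, not Spark's exact code):

  ```scala
  import java.util.concurrent.TimeUnit

  def terminate(process: Process): Int = {
    process.destroy() // polite request first
    if (!process.waitFor(10, TimeUnit.SECONDS)) {
      process.destroyForcibly() // Java 8: escalate if destroy() was ignored
    }
    process.waitFor() // block until the process is actually gone; returns exit code
  }
  ```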
* [SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs and reflection that supported 1.x (Sean Owen, 2016-01-02; 3 files, -9/+7)
  Remove use of deprecated Hadoop APIs now that 2.2+ is required. Closes #10446 from srowen/SPARK-12481.
* [SPARK-7995][SPARK-6280][CORE] Remove AkkaRpcEnv and remove systemName from setupEndpointRef (Shixiong Zhu, 2015-12-31; 12 files, -464/+26)
  ### Remove AkkaRpcEnv
  Keep `SparkEnv.actorSystem` because Streaming still uses it; it and AkkaUtils will be removed after refactoring Streaming's actorStream API.
  ### Remove systemName
  There are 2 places using `systemName`:
  - `RpcEnvConfig.name`. Although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name in the log output `Successfully started service *** on port ***`. Since the service name in the log is useful, `RpcEnvConfig.name` is kept.
  - `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`; Akka requires it in its URI and will refuse a connection if it does not match. However, `NettyRpcEnv` doesn't use it, so `systemName` can be removed from `setupEndpointRef` now that `AkkaRpcEnv` is gone.
  ### Remove RpcEnv.uriOf
  `uriOf` exists because Akka uses different URI formats with and without authentication, e.g. `akka.ssl.tcp://...` and `akka.tcp://...`. `NettyRpcEnv` uses the same format either way, so it's not necessary after removing `AkkaRpcEnv`. Closes #10459 from zsxwing/remove-akka-rpc-env.