| Commit message | Author | Age | Files | Lines |
[SPARK-12755][CORE] Stop the event logger before the DAG scheduler to avoid a race condition where the standalone master attempts to build the app's history UI before the event log is stopped.
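A minimal sketch of the reordering, with assumed names (`Stoppable` and `ShutdownSketch` are illustrative, not Spark's API):
```scala
trait Stoppable { def stop(): Unit }

// Close the event log before stopping the DAG scheduler, so the standalone
// master never tries to build the history UI from a still-open log.
class ShutdownSketch(eventLogger: Option[Stoppable], dagScheduler: Stoppable) {
  def stop(): Unit = {
    eventLogger.foreach(_.stop()) // moved ahead of the DAG scheduler shutdown
    dagScheduler.stop()
  }
}
```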
This contribution is my original work, and I license this work to the Spark project under the project's open source license.
Author: Michael Allman <michael@videoamp.com>
Closes #10700 from mallman/stop_event_logger_first.
It looks like rpcEnv.awaitTermination may block some tests forever. Just remove it and investigate the affected tests.
- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config, because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it (see the sketch below).
- Update comments and docs
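As referenced above, a minimal sketch of the kind of decision a message-size cap drives; the types and the `storeInBlockManager` helper are assumptions, not Spark's actual implementation:
```scala
sealed trait TaskResult
case class DirectTaskResult(bytes: Array[Byte]) extends TaskResult // sent inline over RPC
case class IndirectTaskResult(blockId: String) extends TaskResult  // fetched from block manager

object ResultPackaging {
  // Hypothetical stand-in for storing bytes in the block manager.
  def storeInBlockManager(bytes: Array[Byte]): String = s"taskresult_${bytes.length}"

  // Results larger than the RPC message cap are sent indirectly.
  def packageResult(bytes: Array[Byte], maxSizeBytes: Int): TaskResult =
    if (bytes.length > maxSizeBytes) IndirectTaskResult(storeInBlockManager(bytes))
    else DirectTaskResult(bytes)
}
```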
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10854 from zsxwing/remove-akka.
Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events.
Including the following changes:
1. Add StreamingListenerForwardingBus, which wraps streaming events in WrappedStreamingListenerEvent and processes them in `onOtherEvent`, forwarding them to StreamingListener (sketched below)
2. Remove StreamingListenerBus
3. Merge AsynchronousListenerBus and LiveListenerBus into the same class LiveListenerBus
4. Add a `logEvent` method to SparkListenerEvent so that EventLoggingListener can use it to ignore WrappedStreamingListenerEvents
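A sketch of the forwarding idea in point 1, with assumed shapes (the real classes carry typed streaming events, not strings):
```scala
trait SparkListenerEvent
case class WrappedStreamingListenerEvent(event: String) extends SparkListenerEvent

// Streaming events travel the shared Spark listener bus wrapped as Spark
// events, and are unwrapped in onOtherEvent for streaming listeners.
class StreamingListenerForwardingBusSketch(handle: String => Unit) {
  def onOtherEvent(e: SparkListenerEvent): Unit = e match {
    case WrappedStreamingListenerEvent(streamingEvent) => handle(streamingEvent)
    case _ => // any other Spark event is not for streaming listeners; ignore
  }
}
```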
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10779 from zsxwing/streaming-listener.
Author: scwf <wangfei1@huawei.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: WangTaoTheTonic <wangtao111@huawei.com>
Author: w00228970 <wangfei1@huawei.com>
Closes #10238 from vanzin/SPARK-2750.
This is a step in implementing SPARK-10620, which migrates TaskMetrics to accumulators.
TaskMetrics has a bunch of vars, some fully public, some `private[spark]`. This is bad coding style that makes it easy to accidentally overwrite previously set metrics. This has happened a few times in the past and caused bugs that were difficult to debug.
Instead, we should have get-or-create semantics, which are more readily understandable. This makes sense in the case of TaskMetrics because these are just aggregated metrics that we want to collect throughout the task, so it doesn't matter who's incrementing them.
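A minimal sketch of get-or-create semantics for a metrics holder; the `*Sketch` names are assumptions, not Spark's actual TaskMetrics:
```scala
class InputMetricsSketch {
  private var _recordsRead = 0L
  def recordsRead: Long = _recordsRead
  def incRecordsRead(n: Long): Unit = _recordsRead += n
}

class TaskMetricsSketch {
  private var _inputMetrics: Option[InputMetricsSketch] = None

  // Get-or-create: idempotent, so every caller shares the same instance
  // instead of being able to overwrite a previously registered one.
  def registerInputMetrics(): InputMetricsSketch = {
    if (_inputMetrics.isEmpty) _inputMetrics = Some(new InputMetricsSketch)
    _inputMetrics.get
  }
}
```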
Parent PR: #10717
Author: Andrew Or <andrew@databricks.com>
Author: Josh Rosen <joshrosen@databricks.com>
Author: andrewor14 <andrew@databricks.com>
Closes #10815 from andrewor14/get-or-create-metrics.
This is a small step in implementing SPARK-10620, which migrates TaskMetrics to accumulators. This patch is strictly a cleanup patch and introduces no change in functionality. It literally just renames 3 fields for consistency. Today we have:
```
inputMetrics.recordsRead
outputMetrics.bytesWritten
shuffleReadMetrics.localBlocksFetched
...
shuffleWriteMetrics.shuffleRecordsWritten
shuffleWriteMetrics.shuffleBytesWritten
shuffleWriteMetrics.shuffleWriteTime
```
The shuffle write ones are kind of redundant. We can drop the `shuffle` part in the method names. I added backward compatible (but deprecated) methods with the old names.
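A sketch, under assumed field names, of how a rename can keep the old method as a deprecated alias:
```scala
class ShuffleWriteMetricsSketch {
  private var _bytesWritten = 0L
  def bytesWritten: Long = _bytesWritten
  def incBytesWritten(n: Long): Unit = _bytesWritten += n

  // Old name kept for source compatibility, but flagged for removal.
  @deprecated("use bytesWritten instead", "2.0.0")
  def shuffleBytesWritten: Long = bytesWritten
}
```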
Parent PR: #10717
Author: Andrew Or <andrew@databricks.com>
Closes #10811 from andrewor14/rename-things.
This patch refactors portions of the BlockManager and CacheManager in order to avoid having to pass `evictedBlocks` lists throughout the code. It appears that these lists were only consumed by `TaskContext.taskMetrics`, so the new code now directly updates the metrics from the lower-level BlockManager methods.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10776 from JoshRosen/SPARK-10985.
This is a small step in implementing SPARK-10620, which migrates `TaskMetrics` to accumulators. This patch is strictly a cleanup patch and introduces no change in functionality. It literally just moves classes to their own files to avoid having single monolithic ones that contain 10 different classes.
Parent PR: #10717
Author: Andrew Or <andrew@databricks.com>
Closes #10810 from andrewor14/move-things.
This inlines a few of the Parquet decoders and adds vectorized APIs to support decoding in batch. A few particulars of the Parquet encodings make this much more efficient; in particular, RLE encodings are very well suited for batch decoding, and the Parquet 2.0 encodings are also well suited to it.
This is a work in progress and does not affect the current execution. In subsequent patches, we will support more encodings and types before enabling this.
Simple benchmarks indicate this can decode single ints more than 3x faster.
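A toy illustration of why RLE suits batch decoding (not the actual decoder): a run of identical values expands with one bulk fill instead of per-value dispatch.
```scala
// Each run is (repeated value, run length); returns the number of values written.
def decodeRleBatch(runs: Seq[(Int, Int)], out: Array[Int]): Int = {
  var pos = 0
  for ((value, count) <- runs) {
    java.util.Arrays.fill(out, pos, pos + count, value) // one bulk write per run
    pos += count
  }
  pos
}
```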
Author: Nong Li <nong@databricks.com>
Author: Nong <nongli@gmail.com>
Closes #10593 from nongli/spark-12644.
Added a Totals table to the top of the page to display the totals of each applicable column in the executors table.
Old Description:
~~Created a TOTALS row containing the totals of each column in the executors UI. By default the TOTALS row appears at the top of the table. When a column is sorted the TOTALS row will always sort to either the top or bottom of the table.~~
Author: Alex Bozarth <ajbozart@us.ibm.com>
Closes #10668 from ajbozarth/spark12716.
This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems.
There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests.
Author: Reynold Xin <rxin@databricks.com>
Closes #10752 from rxin/remove-offheap.
If the sort column contains a slash (e.g. "Executor ID / Host") in YARN mode, sorting fails with the following message:
![spark-12708](https://cloud.githubusercontent.com/assets/6679275/12193320/80814f8c-b62a-11e5-9914-7bf3907029df.png)
It's similar to SPARK-4313.
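A sketch of the usual fix for this class of bug (cf. SPARK-4313); whether this PR uses exactly this encoding step is an assumption:
```scala
object SortParamSketch {
  import java.net.URLEncoder
  // URL-encode parameter values before building the sort link, so the
  // slash does not get interpreted as a path separator.
  val sortColumn = "Executor ID / Host"
  val encoded = URLEncoder.encode(sortColumn, "UTF-8") // "Executor+ID+%2F+Host"
}
```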
Author: root <root@R520T1.(none)>
Author: Koyo Yoshida <koyo0615@gmail.com>
Closes #10663 from yoshidakuy/SPARK-12708.
This patch significantly speeds up the BlockManagerSuite's "SPARK-9591: getRemoteBytes from another location when Exception throw" test, reducing the test time from 45s to ~250ms. The key change was to set `spark.shuffle.io.maxRetries` to 0 (the code previously set `spark.network.timeout` to `2s`, but this didn't make a difference because the slowdown was not due to this timeout).
Along the way, I also cleaned up the way that we handle SparkConf in BlockManagerSuite: previously, each test would mutate a shared SparkConf instance, while now each test gets a fresh SparkConf.
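The fail-fast test conf described above might look like this sketch:
```scala
import org.apache.spark.SparkConf

// Each test builds its own conf; retries are disabled so the expected
// fetch failure surfaces immediately instead of after retry backoff.
val conf = new SparkConf()
  .set("spark.shuffle.io.maxRetries", "0")
```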
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10759 from JoshRosen/SPARK-12174.
Add `listener.synchronized` to get `storageStatusList` and `execInfo` atomically.
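A minimal sketch of the locking pattern, with illustrative field names:
```scala
class ExecutorsListenerSketch {
  var storageStatusList: List[String] = Nil
  var execInfo: Map[String, Int] = Map.empty
}

// Hold the listener's monitor so both fields are read as one consistent snapshot.
def snapshot(listener: ExecutorsListenerSketch): (List[String], Map[String, Int]) =
  listener.synchronized {
    (listener.storageStatusList, listener.execInfo)
  }
```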
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10728 from zsxwing/SPARK-12784.
When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during the read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exceptions that do occur. Added unit tests to verify that an IOException is thrown and logged if the FileAppender is not flagged to stop, and that the IOException is safely ignored when the flag is set.
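A simplified sketch of the described guard (names and structure assumed):
```scala
import java.io.{IOException, InputStream}

class FileAppenderSketch(in: InputStream) {
  @volatile private var markedForStop = false
  def stop(): Unit = markedForStop = true

  def readLoop(buf: Array[Byte]): Unit = {
    try {
      var n = in.read(buf)
      while (n != -1) {
        // append buf(0 until n) to the log file here
        n = in.read(buf)
      }
    } catch {
      // Only swallow the exception when we were already told to stop;
      // otherwise it propagates and gets logged as a real error.
      case e: IOException if markedForStop =>
        // stream closed because the process was destroyed; expected
    }
  }
}
```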
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844.
We've already removed local execution but didn't deprecate `TaskContext.isRunningLocally()`; we should deprecate it for 2.0.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10751 from JoshRosen/remove-local-exec-from-taskcontext.
The problem lies in `BypassMergeSortShuffleWriter`: an empty partition still generates a temp shuffle file of several bytes. This change creates the file only when the partition is not empty.
The problem exists only here; there is no such issue in `HashShuffleWriter`.
Please help to review, thanks a lot.
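A sketch of the idea, not the actual writer: create a partition's temp file lazily, on the first record, so empty partitions produce no file:
```scala
import java.io.{File, FileOutputStream, OutputStream}

class LazyPartitionWriter(file: File) {
  private var out: OutputStream = null

  def write(record: Array[Byte]): Unit = {
    if (out == null) out = new FileOutputStream(file) // created only when needed
    out.write(record)
  }

  def close(): Unit = if (out != null) out.close()
}
```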
Author: jerryshao <sshao@hortonworks.com>
Closes #10376 from jerryshao/SPARK-12400.
I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE doesn't occur because `inMemSorter` is set to null later and the `free()` method is not called. It happens when another exception, such as an OOM, is thrown before `inMemSorter` is set to null. Anyway, we can add the null check to avoid it.
```
ERROR spark.TaskContextImpl: Error in TaskCompletionListener
java.lang.NullPointerException
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
at org.apache.spark.scheduler.Task.run(Task.scala:91)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
```
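A simplified Scala sketch of the guard described above (the real fix is in the Java class):
```scala
class InMemorySorterSketch(consumer: AnyRef, private var array: Array[Long]) {
  def free(): Unit = {
    // Guard against a null consumer: UnsafeKVExternalSorter can legitimately
    // pass null, and an earlier OOM may land us here before cleanup finished.
    if (consumer != null && array != null) {
      // return the array's memory to the consumer here
      array = null
    }
  }
}
```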
Author: Carson Wang <carson.wang@intel.com>
Closes #10637 from carsonwang/FixNPE.
before ",")
Fix the style violation (space before , and :).
This PR is a followup for #10643
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #10719 from sarutak/SPARK-12692-followup-core.
- [x] Upgrade Py4J to 0.9.1
- [x] SPARK-12657: Revert SPARK-12617
- [x] SPARK-12658: Revert SPARK-12511
- Still keep the change that only reads the checkpoint once. This is a manual change and worth a careful look. https://github.com/zsxwing/spark/commit/bfd4b5c040eb29394c3132af3c670b1a7272457c
- [x] Verify no leak any more after reverting our workarounds
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10692 from zsxwing/py4j-0.9.1.
[SPARK-12582][Test] IndexShuffleBlockResolverSuite fails on Windows
* IndexShuffleBlockResolverSuite fails on Windows because a file is not closed.
* Move IndexShuffleBlockResolverSuite.scala from "test/java" to "test/scala".
https://issues.apache.org/jira/browse/SPARK-12582
Author: Yucai Yu <yucai.yu@intel.com>
Closes #10526 from yucai/master.
function "aggregate"
Currently, RDD function aggregate's parameter doesn't explain well, especially parameter "zeroValue".
It's helpful to let junior scala user know that "zeroValue" attend both "seqOp" and "combOp" phase.
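An illustration of this, assuming an existing SparkContext `sc`:
```scala
val rdd = sc.parallelize(1 to 4, numSlices = 2)
// zeroValue = 1 is applied once per partition (seqOp) and once more when
// merging partition results (combOp): (1+1+2) + (1+3+4) + 1 = 13, not 10.
val sum = rdd.aggregate(1)(_ + _, _ + _)
```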
Author: Tommy YU <tummyyu@163.com>
Closes #10587 from Wenpei/rdd_aggregate_doc.
This is a follow-up for the original patch #10562.
Author: Reynold Xin <rxin@databricks.com>
Closes #10670 from rxin/SPARK-12340.
This patch deduplicates some test code in BlockManagerSuite. I'm splitting this change off from a larger PR in order to make things easier to review.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10667 from JoshRosen/block-mgr-tests-cleanup.
Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)
See also https://github.com/apache/spark/pull/10512
Author: Sean Owen <sowen@cloudera.com>
Closes #10513 from srowen/SPARK-4819.
sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
https://issues.apache.org/jira/browse/SPARK-12654
The bug is that WholeTextFileRDD.getPartitions has:
```
val conf = getConf
```
In getConf, if cloneConf=true, it creates a new Hadoop Configuration and then uses that to create a new newJobContext. The newJobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So when cloning the Hadoop configuration, it changes from a JobConf to a Configuration and drops the credentials that were there. NewHadoopRDD works because it just uses the conf passed in for getPartitions (not getConf).
Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
Closes #10651 from tgravescs/SPARK-12654.
Changed the Logging FileAppender to use join in `awaitTermination`, ensuring the writing thread has properly finished, and log writing has completed, before returning.
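A sketch of the join pattern (illustrative class, not the real FileAppender):
```scala
class AppenderThreadSketch(body: Runnable) {
  private val writingThread = new Thread(body, "logging-thread")
  writingThread.start()

  // join blocks until run() has fully completed, so all output is written
  // before awaitTermination returns.
  def awaitTermination(): Unit = writingThread.join()
}
```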
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701.
Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs.
Author: Sean Owen <sowen@cloudera.com>
Closes #10570 from srowen/SPARK-12618.
The default serializer in Kryo is FieldSerializer, which ignores transient fields and never calls `writeObject` or `readObject`. So we should register OpenHashMapBasedStateMap using `DefaultSerializer` to make it work with Kryo.
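A sketch of the workaround pattern; whether the actual patch uses Kryo's `JavaSerializer` this way is an assumption:
```scala
import com.esotericsoftware.kryo.DefaultSerializer
import com.esotericsoftware.kryo.serializers.JavaSerializer

// Kryo's FieldSerializer would skip `cache` but also skip readObject, leaving
// the field null after deserialization; JavaSerializer honors the Java hooks.
@DefaultSerializer(classOf[JavaSerializer])
class StateMapSketch extends Serializable {
  @transient private var cache: Map[String, Int] = Map.empty

  private def readObject(in: java.io.ObjectInputStream): Unit = {
    in.defaultReadObject()
    cache = Map.empty // rebuild transient state on deserialization
  }
}
```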
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10609 from zsxwing/SPARK-12591.
Per rxin, let's use the casting for countByKey and countByValue as well. Let's see if this passes.
Author: Sean Owen <sowen@cloudera.com>
Closes #10641 from srowen/SPARK-12604.2.
There is a bug in the calculation of `maxSplitSize`. The `totalLen` should be divided by `minPartitions` and not by `files.size`.
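The corrected calculation in sketch form:
```scala
// totalLen / minPartitions (rounded up), rather than totalLen / files.size,
// so at least minPartitions splits are produced.
def maxSplitSize(totalLen: Long, minPartitions: Int): Long =
  math.ceil(totalLen.toDouble / math.max(minPartitions, 1)).toLong
```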
Author: Darek Blasiak <darek.blasiak@640labs.com>
Closes #10546 from datafarmer/setminpartitionsbug.
…mprovements
Please review and merge at your convenience. Thanks!
Author: Jacek Laskowski <jacek@japila.pl>
Closes #10595 from jaceklaskowski/streaming-minor-fixes.
This PR manages the memory used by window functions (buffered rows) and also enables external spilling.
After this PR, we can run window functions on a partition with hundreds of millions of rows with only 1G of memory.
Author: Davies Liu <davies@databricks.com>
Closes #10605 from davies/unsafe_window.
MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies`, which could lead to a memory leak.
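A minimal sketch of the fix's pattern:
```scala
class MapPartitionsSketch(var prev: AnyRef) {
  def clearDependencies(): Unit = {
    prev = null // drop the parent reference so the old lineage can be GC'd
  }
}
```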
Author: Guillaume Poulin <poulin.guillaume@gmail.com>
Closes #10623 from gpoulin/map_partition_deps.
Otherwise the URL will fail to proxy to the right one in YARN mode. Here is the screenshot:
![screen shot 2016-01-06 at 5 28 26 pm](https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png)
Author: jerryshao <sshao@hortonworks.com>
Closes #10618 from jerryshao/SPARK-12673.
This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code.
Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data which is still being used may be evicted / deleted, leading to hard to diagnose bugs.
For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timestamps in hashmaps, and a handful fewer threads.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10534 from JoshRosen/remove-ttl-based-cleaning.
[SPARK-12640][SQL] Add simple benchmarking utility class and add Parquet scan benchmarks.
We've run benchmarks ad hoc to measure the scanner performance. We will continue to invest in this
and it makes sense to get these benchmarks into code. This adds a simple benchmarking utility to do
this.
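A toy timing helper in the spirit of the described utility (illustrative, not the actual benchmark class):
```scala
object BenchmarkSketch {
  // Runs f `iters` times and reports the mean per-iteration wall time.
  def benchmark(name: String, iters: Int)(f: => Unit): Unit = {
    val start = System.nanoTime()
    (1 to iters).foreach(_ => f)
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$name: ${elapsedMs / iters}%.2f ms/iter")
  }
}
```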
Author: Nong Li <nong@databricks.com>
Author: Nong <nongli@gmail.com>
Closes #10589 from nongli/spark-12640.
Change the Java countByKey and countApproxDistinctByKey return types to use Java Long, not Scala Long; update similar methods for consistency on java.lang.Long.valueOf, with no API change.
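A sketch of the boxing involved, with a hypothetical helper name:
```scala
import scala.collection.JavaConverters._

// Box to java.lang.Long explicitly so the Java-facing API exposes
// java.lang.Long rather than Scala's Long.
def countByKeyJavaSketch(counts: Map[String, Long]): java.util.Map[String, java.lang.Long] =
  counts.map { case (k, v) => (k, java.lang.Long.valueOf(v)) }.asJava
```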
Author: Sean Owen <sowen@cloudera.com>
Closes #10554 from srowen/SPARK-12604.
Remove Vector, VectorSuite and GraphKryoRegistrator, which are deprecated and no longer used. The whole of Vector.scala, VectorSuite.scala and GraphKryoRegistrator.scala is no longer used, so it's time to remove them in Spark 2.0.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #10613 from sarutak/SPARK-12665.
Fix Int overflow in SparkPlan.executeTake, RDD.take and AsyncRDDActions.takeAsync.
I have closed pull request https://github.com/apache/spark/pull/10487 and created this pull request to resolve the problem.
Spark JIRA: https://issues.apache.org/jira/browse/SPARK-12340
Author: QiangCai <david.caiq@gmail.com>
Closes #10562 from QiangCai/bugfix.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10582 from vanzin/SPARK-3873-tests.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10578 from vanzin/SPARK-3873-core.
Cartesian product uses UnsafeExternalSorter without a comparator to do spilling, which will NPE if spilling happens.
This bug was also hit by #10605.
cc JoshRosen
Author: Davies Liu <davies@databricks.com>
Closes #10606 from davies/fix_spilling.
I looked at each case individually and it looks like they can all be removed. The only one I had to think twice about was toArray (I even thought about un-deprecating it, until I realized it was a problem in Java to have toArray returning java.util.List).
Author: Reynold Xin <rxin@databricks.com>
Closes #10569 from rxin/SPARK-12615.
Currently we don't support Hadoop 0.23, but there is some code related to it, so let's clean it up.
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #10590 from sarutak/SPARK-12641.
This patch updates the ExecutorRunner's terminate path to use the new Java 8 API to terminate processes more forcefully if possible. If the executor is unhealthy, it would previously ignore the destroy() call. Presumably, the new Java API was added to handle cases like this.
We could update the termination path in the future to use OS-specific commands for older Java versions.
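A sketch of the described termination path; the helper is hypothetical, but `destroyForcibly` and the timed `waitFor` are real Java 8 `Process` APIs:
```scala
import java.util.concurrent.TimeUnit

def terminateSketch(process: Process, graceMs: Long): Int = {
  process.destroy() // polite request first
  if (!process.waitFor(graceMs, TimeUnit.MILLISECONDS)) {
    process.destroyForcibly() // Java 8+: forceful termination
  }
  process.waitFor() // reap and return the exit code
}
```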
Author: Nong Li <nong@databricks.com>
Closes #10438 from nongli/spark-12486-executors.
Remove usage of deprecated Hadoop APIs, and the reflection that supported 1.x, now that 2.2+ is required.
Author: Sean Owen <sowen@cloudera.com>
Closes #10446 from srowen/SPARK-12481.
Remove AkkaRpcEnv, and remove systemName from setupEndpointRef.
### Remove AkkaRpcEnv
Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring Streaming actorStream API.
### Remove systemName
There are 2 places using `systemName`:
* `RpcEnvConfig.name`. Actually, although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name to output the log `Successfully started service *** on port ***`. Since the service name in log is useful, I keep `RpcEnvConfig.name`.
* `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`. Akka requires `systemName` in its URI and will refuse a connection if `systemName` is not matched. However, `NettyRpcEnv` doesn't use it. So we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`.
### Remove RpcEnv.uriOf
`uriOf` exists because Akka uses different URI formats for with and without authentication, e.g., `akka.ssl.tcp...` and `akka.tcp://...`. But `NettyRpcEnv` uses the same format. So it's not necessary after removing `AkkaRpcEnv`.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10459 from zsxwing/remove-akka-rpc-env.
It was research code and has been deprecated since 1.0.0. No one really uses it since they can just use event logging.
Author: Reynold Xin <rxin@databricks.com>
Closes #10530 from rxin/SPARK-12561.