| Commit message | Author | Age | Files | Lines |
The web UI's paginated table uses JavaScript to implement certain navigation controls, such as table sorting and the "go to page" form. This is unnecessary and should be simplified to use plain HTML form controls and links.
/cc zsxwing, who wrote this original code, and yhuai.
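A minimal sketch of the link-based approach, in the Scala XML style Spark's UI pages use; the parameter names here are illustrative and not the actual `PagedTable` API:
```
// Sketch only: a sortable column header rendered as a plain link whose sort state
// round-trips through URL query parameters, so no JavaScript handler is needed.
def sortableHeader(basePath: String, column: String, sortColumn: String, desc: Boolean) = {
  val newDesc = if (column == sortColumn) !desc else false
  val href = s"$basePath?sortColumn=$column&desc=$newDesc"
  <th><a href={href}>{column}</a></th>
}
```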
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10441 from JoshRosen/simplify-paginated-table-sorting.
|
Include the following changes:
1. Close `java.sql.Statement` (see the sketch after this list).
2. Fix incorrect `asInstanceOf`.
3. Remove unnecessary `synchronized` and `ReentrantLock`.
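For item 1, a minimal sketch of the close-in-finally pattern with plain JDBC; this is illustrative, not the code touched by the PR:
```
import java.sql.{Connection, Statement}

// Always release the Statement, even if execution throws, so it never leaks.
def runUpdate(conn: Connection, sql: String): Int = {
  val stmt: Statement = conn.createStatement()
  try {
    stmt.executeUpdate(sql)
  } finally {
    stmt.close()
  }
}
```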
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10440 from zsxwing/findbugs.
|
Buffer underflow exception
Since we only need to implement `def skipBytes(n: Int)`, the code in #10213 could be simplified.
davies scwf
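A minimal sketch of a `skipBytes` built on an underlying stream's `skip`; the wrapper class and field names are hypothetical, not the Kryo input implementation in the PR:
```
import java.io.InputStream

// Keep skipping until n bytes are consumed or the stream ends.
class SkippingInput(in: InputStream) {
  def skipBytes(n: Int): Int = {
    var remaining = n.toLong
    while (remaining > 0) {
      val skipped = in.skip(remaining)
      if (skipped <= 0) return (n - remaining).toInt   // end of stream reached
      remaining -= skipped
    }
    n
  }
}
```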
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #10253 from adrian-wang/kryo.
|
The feature was first added in commit 7b877b27053bfb7092e250e01a3b887e1b50a109 but was later removed (probably by mistake) in commit fc8b58195afa67fbb75b4c8303e022f703cbf007.
This change sets the default path of RDDs created via sc.textFile(...) to the path argument.
Here is the symptom:
* Using spark-1.5.2-bin-hadoop2.6:
scala> sc.textFile("/home/root/.bashrc").name
res5: String = null
scala> sc.binaryFiles("/home/root/.bashrc").name
res6: String = /home/root/.bashrc
* while using Spark 1.3.1:
scala> sc.textFile("/home/root/.bashrc").name
res0: String = /home/root/.bashrc
scala> sc.binaryFiles("/home/root/.bashrc").name
res1: String = /home/root/.bashrc
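A sketch of the restored behavior in a spark-shell session; the fix itself lives in `SparkContext.textFile`, which presumably amounts to calling `setName(path)` on the returned RDD:
```
// After the fix the name defaults to the path; before it, the same effect
// could be obtained manually, as shown here for illustration.
val rdd = sc.textFile("/home/root/.bashrc").setName("/home/root/.bashrc")
assert(rdd.name == "/home/root/.bashrc")
```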
Author: Yaron Weinsberg <wyaron@gmail.com>
Author: yaron <yaron@il.ibm.com>
Closes #10456 from wyaron/master.
|
Instead of just cancelling the registrationRetryTimer to keep the driver from retrying the connection to the master, change the function to `schedule`.
There is no need to register with the master repeatedly.
Author: echo2mei <534384876@qq.com>
Closes #10447 from echoTomei/master.
|
In SparkContext method `setCheckpointDir`, a warning is issued when spark master is not local and the passed directory for the checkpoint dir appears to be local.
In practice, when relying on the HDFS configuration files and using a relative path for the checkpoint directory (i.e. an incomplete URI without the HDFS scheme, ...), this warning should not be issued and might be confusing.
In fact, in this case, the checkpoint directory is successfully created, and the checkpointing mechanism works as expected.
This PR uses the `FileSystem` instance created with the given directory, and checks whether it is local or not.
(The rationale is that since this same `FileSystem` instance is used to create the checkpoint dir anyway and can therefore be reliably used to determine if it is local or not).
The warning is only issued if the directory is not local, on top of the existing conditions.
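A minimal sketch of the described check, assuming a SparkContext `sc`, a checkpoint `directory` string, and the Hadoop `FileSystem` API; names and structure are illustrative rather than the exact patch:
```
import org.apache.hadoop.fs.{LocalFileSystem, Path}

// Resolve the directory against the Hadoop configuration and warn only when the
// resulting FileSystem really is the local one while the master is non-local.
val checkpointPath = new Path(directory)
val fs = checkpointPath.getFileSystem(sc.hadoopConfiguration)
if (!sc.isLocal && fs.isInstanceOf[LocalFileSystem]) {
  println(s"Warning: checkpoint directory $directory is on the local filesystem " +
    "but the master is not local; executors will not be able to read it.")
}
```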
Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
Closes #10392 from pierre-borckmans/SPARK-12440_CheckpointDir_Warning_NonLocal.
|
suites after forcing to set specific value to "os.arch" property
Restore the original value of the os.arch property after each test.
Since some tests force a specific value for the os.arch property, we need to restore the original value afterwards.
Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Closes #10289 from kiszk/SPARK-12311.
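A minimal sketch of the save/restore pattern in a ScalaTest suite; the suite and test names are hypothetical, not the suites touched by the PR:
```
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class OsArchRestoreSuite extends FunSuite with BeforeAndAfterEach {
  private var originalArch: String = _

  override def beforeEach(): Unit = {
    originalArch = System.getProperty("os.arch")   // remember the real value
    super.beforeEach()
  }

  override def afterEach(): Unit = {
    try super.afterEach()
    finally System.setProperty("os.arch", originalArch)  // always restore it
  }

  test("forces a specific os.arch") {
    System.setProperty("os.arch", "unknown-arch")
    assert(System.getProperty("os.arch") === "unknown-arch")
  }
}
```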
|
one class
Fix Tachyon deprecations; pull Tachyon dependency into `TachyonBlockManager` only
CC calvinjia as I probably need a double-check that the usage of the new API is correct.
Author: Sean Owen <sowen@cloudera.com>
Closes #10449 from srowen/SPARK-12500.
|
Author: Nong Li <nong@databricks.com>
Closes #10422 from nongli/12471-pids.
|
Author: Jacek Laskowski <jacek@japila.pl>
Closes #10432 from jaceklaskowski/minor-corrections.
|
i.e. Hadoop 1 and Hadoop 2.0
Author: Reynold Xin <rxin@databricks.com>
Closes #10404 from rxin/SPARK-11807.
|
According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy.
After changing the compressor to LZ4, I saw a 20% improvement in end-to-end time for a TPC-DS query (Q4).
[1] https://github.com/ning/jvm-compressor-benchmark/wiki
cc rxin
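With LZ4 as the new default nothing needs to be set; a sketch of overriding the codec explicitly via the standard `spark.io.compression.codec` property (illustrative configuration only):
```
import org.apache.spark.SparkConf

// After this change "lz4" is the default, so this is only needed to override it.
val conf = new SparkConf()
  .setAppName("compression-codec-example")
  .set("spark.io.compression.codec", "lz4")   // alternatives: "snappy", "lzf"
```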
Author: Davies Liu <davies@databricks.com>
Closes #10342 from davies/lz4.
|
```
[info] ReplayListenerSuite:
[info] - Simple replay (58 milliseconds)
java.lang.NullPointerException
at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
```
https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (but does not actually make the tests fail).
Tested locally to verify that the NPE is gone.
Author: Andrew Or <andrew@databricks.com>
Closes #10417 from andrewor14/fix-harmless-npe.
|
Author: Reynold Xin <rxin@databricks.com>
Closes #10394 from rxin/SPARK-2331.
|
considering preferred local hosts
When multiple workers exist on a host, we can bypass unnecessary remote access for broadcasts; block managers fetch broadcast blocks from the same host instead of from remote hosts.
Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Closes #10346 from maropu/OptimizeBlockLocationOrder.
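A minimal sketch of the location-ordering idea, assuming the standard `BlockManagerId` type; the helper is illustrative, not the PR's actual code:
```
import org.apache.spark.storage.BlockManagerId

// Put replicas that live on this host first, so a broadcast block is fetched
// locally whenever a co-located executor already has it.
def preferSameHost(locations: Seq[BlockManagerId], localHost: String): Seq[BlockManagerId] = {
  val (sameHost, remote) = locations.partition(_.host == localHost)
  sameHost ++ remote
}
```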
|
Based on the suggestions from marmbrus, this adds logical/physical operators for Range to improve performance.
It also adds another API for resolving SPARK-12150.
Could you take a look at my implementation, marmbrus ? If not good, I can rework it. : )
Thank you very much!
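A short usage sketch on the `SQLContext` of that era (assuming a `sqlContext` from spark-shell); the call below is the API these Range operators sit behind:
```
// Illustrative usage: generate a DataFrame of ids without materializing an RDD first.
val df = sqlContext.range(0, 1000)        // column "id": 0, 1, ..., 999
df.selectExpr("id", "id * 2 AS doubled").show(5)
```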
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10335 from gatorsmile/rangeOperators.
|
Author: Reynold Xin <rxin@databricks.com>
Closes #10395 from rxin/SPARK-11808.
|
Author: Reynold Xin <rxin@databricks.com>
Closes #10387 from rxin/version-bump.
|
with Mesos cluster mode."
This reverts commit ad8c1f0b840284d05da737fb2cc5ebf8848f4490.
|
REST server"
This reverts commit 8184568810e8a2e7d5371db2c6a0366ef4841f70.
|
This reverts commit 2bebaa39d9da33bc93ef682959cd42c1968a6a3e.
|
It is usually an invalid location on the remote machine executing the job.
It is picked up by the Mesos support in cluster mode, and most of the time causes
the job to fail.
Fixes SPARK-12345
Author: Luc Bourlier <luc.bourlier@typesafe.com>
Closes #10329 from skyluc/issue/SPARK_HOME.
|
new connections
Added `channelActive` to `RpcHandler` so that `NettyRpcHandler` doesn't need `clients` any more.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10301 from zsxwing/network-events.
|
interval
Previously, the rpc timeout was the default network timeout, which is the same value
the driver uses to determine dead executors. This means if there is a network issue,
the executor is determined dead after one heartbeat attempt. There is a separate config
for the heartbeat interval which is a better value to use for the heartbeat RPC. With
this change, the executor will make multiple heartbeat attempts even with RPC issues.
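A small illustration of the configuration relationship described above, using standard Spark settings; the assertion only expresses the intent that several heartbeat attempts fit inside one network timeout:
```
import org.apache.spark.SparkConf

val conf = new SparkConf()
val heartbeatIntervalMs = conf.getTimeAsMs("spark.executor.heartbeatInterval", "10s")
val networkTimeoutMs    = conf.getTimeAsMs("spark.network.timeout", "120s")

// The heartbeat RPC now times out on the (much shorter) heartbeat interval, so a
// single slow reply no longer uses up the only attempt before the driver gives up.
assert(heartbeatIntervalMs <= networkTimeoutMs)
```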
Author: Nong Li <nong@databricks.com>
Closes #10365 from nongli/spark-12411.
|
In the discussion of SPARK-9552, we proposed a force kill in `killExecutors`. But if there is nothing to kill, it returns true (acknowledgement), which causes executors that are not eligible to be killed to be added to the pendingToRemove list for further action.
In this patch, we'd like to change the return semantics: if there is nothing to kill, we return false, and therefore those non-eligible executors won't be added to the pendingToRemove list.
vanzin andrewor14 As a follow-up to PR #7888, please let me know your comments.
Author: Grace <jie.huang@intel.com>
Author: Jie Huang <hjie@fosun.com>
Author: Andrew Or <andrew@databricks.com>
Closes #9796 from GraceH/emptyPendingToRemove.
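A minimal sketch of the changed return semantics; the method shape is illustrative, not the scheduler backend's real signature:
```
// Acknowledge a kill request only when at least one of the requested executors
// is actually eligible to be killed; otherwise return false so callers do not
// add ineligible executors to their pending-to-remove bookkeeping.
def killExecutors(requested: Seq[String], eligible: Set[String]): Boolean = {
  val toKill = requested.filter(eligible.contains)
  if (toKill.isEmpty) {
    false
  } else {
    // ... issue the actual kill requests for `toKill` ...
    true
  }
}
```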
|
If a client requests a non-existent stream, just send a failure message
back, without logging any error on the server side (since it's not a
server error).
On the executor side, avoid error logs by translating any errors during
transfer to a `ClassNotFoundException`, so that loading the class is
retried on the parent class loader. This can mask IO errors during
transmission, but the most common cause is that the class is not
served by the remote end.
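A sketch of the executor-side idea: a class loader that fetches class bytes remotely and converts any fetch failure into `ClassNotFoundException`. The fetching function is a hypothetical stand-in, not Spark's actual loader:
```
// Any failure while streaming the class bytes becomes ClassNotFoundException,
// so class loading falls through to the parent loader instead of logging errors.
class RemoteClassLoader(parent: ClassLoader, fetchBytes: String => Array[Byte])
    extends ClassLoader(parent) {
  override def findClass(name: String): Class[_] = {
    try {
      val bytes = fetchBytes(name)          // e.g. stream the class file from the driver
      defineClass(name, bytes, 0, bytes.length)
    } catch {
      case e: Exception => throw new ClassNotFoundException(name, e)
    }
  }
}
```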
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10337 from vanzin/SPARK-12350.
|
I believe this fixes SPARK-12413. I'm currently running an integration test to verify.
Author: Michael Gummelt <mgummelt@mesosphere.io>
Closes #10366 from mgummelt/fix-zk-mesos.
|
No JIRA was created.
The original test passed because the class cast is lazy (it only fails when the object's method is invoked).
Author: Jeff Zhang <zjffdu@apache.org>
Closes #10371 from zjffdu/minor_fix.
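An illustrative snippet of why a lazy cast can let a test pass: with type erasure, `asInstanceOf` on a generic container checks nothing at the cast site, and the `ClassCastException` only surfaces when the element is actually used:
```
val xs: Seq[Any] = Seq("not an int")
val ys = xs.asInstanceOf[Seq[Int]]   // no error here: the cast is unchecked/lazy
// ys.head + 1                       // only this use would throw ClassCastException
```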
|
Fix a problem with #10332; this one should fix cluster mode on Mesos.
Author: Iulian Dragos <jaguarul@gmail.com>
Closes #10359 from dragos/issue/fix-spark-12345-one-more-time.
|
characters
This PR encodes and decodes the file name to fix the issue.
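An illustrative round trip of the encode/decode idea for file names with special characters, using the standard `java.net` helpers; the PR applies the same principle when building and parsing the URIs involved:
```
import java.net.{URLDecoder, URLEncoder}

val original = "my file #1.txt"
val encoded  = URLEncoder.encode(original, "UTF-8")   // "my+file+%231.txt"
val decoded  = URLDecoder.decode(encoded, "UTF-8")
assert(decoded == original)
```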
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10208 from zsxwing/uri.
|
This reverts commit 5a514b61bbfb609c505d8d65f2483068a56f1f70.
|
This commit resolves SPARK-12396.
Author: echo2mei <534384876@qq.com>
Closes #10354 from echoTomei/master.
|
No change in functionality is intended. This only changes internal API.
Author: Andrew Or <andrew@databricks.com>
Closes #10343 from andrewor14/clean-bm-serializer.
|
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10339 from vanzin/SPARK-12386.
|
string when redirecting.
Author: Rohit Agarwal <rohita@qubole.com>
Closes #10180 from mindprince/SPARK-12186.
|
Runtime.getRuntime.addShutdownHook() is called
SPARK-9886 fixed ExternalBlockStore.scala.
This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook().
Author: tedyu <yuzhihong@gmail.com>
Closes #10325 from ted-yu/master.
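An illustrative contrast between a raw JVM shutdown hook and Spark's managed hooks. `ShutdownHookManager` is `private[spark]`, so the second call only compiles inside Spark itself, and `cleanup` is a hypothetical placeholder:
```
def cleanup(): Unit = { /* release resources */ }

// Raw JVM hook: the pattern being removed.
Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
  override def run(): Unit = cleanup()
}))

// Spark-managed hook: the replacement, which runs with a defined priority
// and returns a handle that can later be removed.
val hookRef = org.apache.spark.util.ShutdownHookManager.addShutdownHook { () => cleanup() }
```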
|
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.
This was suggested by mateiz on https://github.com/apache/spark/pull/7699. It may have already turned up an issue in "zero split job".
Author: Imran Rashid <irashid@cloudera.com>
Closes #8466 from squito/SPARK-10248.
|
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
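The literal `${timeout.duration}` in the message points at a missing `s` string interpolator; an illustrative reproduction, with a hypothetical `Timeout` standing in for the real `RpcTimeout`:
```
import scala.concurrent.duration._

case class Timeout(duration: FiniteDuration)
val timeout = Timeout(120.seconds)

val buggy = "Cannot receive any reply in ${timeout.duration}"   // placeholder printed verbatim
val fixed = s"Cannot receive any reply in ${timeout.duration}"  // prints "... in 120 seconds"
```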
Author: Andrew Or <andrew@databricks.com>
Closes #10334 from andrewor14/rpc-typo.
|
cluster mode.
SPARK_HOME is now causing problems with Mesos cluster mode, since the spark-submit script was recently changed so that spark-class gives precedence to SPARK_HOME when it is defined.
We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.
Author: Timothy Chen <tnachen@gmail.com>
Closes #10332 from tnachen/scheduler_ui.
|
This change builds the event history of completed apps asynchronously, so the RPC thread is not blocked and new workers can register or be removed even if the event log history is very large and takes a long time to rebuild.
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
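A minimal sketch of the pattern (hypothetical names, not the Master's actual members): push the expensive replay onto a dedicated single-thread pool so the RPC message loop stays responsive:
```
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object AsyncRebuildSketch {
  private val rebuildPool = Executors.newSingleThreadExecutor()
  private implicit val rebuildEc: ExecutionContext =
    ExecutionContext.fromExecutorService(rebuildPool)

  def asyncRebuildUI(appId: String): Future[Unit] = Future {
    // replay the application's event logs and attach the rebuilt UI;
    // slow for large logs, but no longer running on the RPC thread
  }
}
```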
|
ExternalBlockStore.scala
Author: Naveen <naveenminchu@gmail.com>
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
|
Please help to review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #10195 from jerryshao/SPARK-10123.
|
AsyncRDDActions to support non-blocking operation
These changes rework the implementations of `SimpleFutureAction`, `ComplexFutureAction`, `JobWaiter`, and `AsyncRDDActions` such that asynchronous callbacks on the generated `Futures` NEVER block waiting for a job to complete. A small amount of mutex synchronization is necessary to protect the internal fields that manage cancellation, but these locks are only held very briefly and in practice should almost never cause any blocking to occur. The existing blocking APIs of these classes are retained, but they simply delegate to the underlying non-blocking API and `Await` the results with indefinite timeouts.
Associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-9026
Also fixes: https://issues.apache.org/jira/browse/SPARK-4514
This pull request contains all my own original work, which I release to the Spark project under its open source license.
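An illustrative use of the async RDD API this rework preserves: register a callback on the returned `FutureAction` rather than blocking the calling thread (`sc` is an existing SparkContext, e.g. from spark-shell):
```
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

val future = sc.parallelize(1 to 1000).countAsync()
future.onComplete {
  case Success(n)  => println(s"counted $n elements")
  case Failure(ex) => println(s"job failed: $ex")
}
// The blocking variants remain available and simply await this future internally.
```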
Author: Richard W. Eggert II <richard.eggert@gmail.com>
Closes #9264 from reggert/fix-futureaction.
|
https://issues.apache.org/jira/browse/SPARK-9516
- [x] new look of Thread Dump Page
- [x] click column title to sort
- [x] grep
- [x] search as you type
squito JoshRosen It's ready for review now.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #7910 from CodingCat/SPARK-9516.
|
ExternalShuffleBlockResolver
Replacing shuffleManagerClassName with shortShuffleMgrName reduces the time spent on string comparison, and the sort comparison is moved to the front. cc JoshRosen andrewor14
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Closes #10131 from lianhuiwang/spark-12130.
|
Fix a minor typo (unbalanced bracket) in ResetSystemProperties.
Author: Holden Karau <holden@us.ibm.com>
Closes #10303 from holdenk/SPARK-12332-trivial-typo-in-ResetSystemProperties-comment.
|
shutdown hook
1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.
This should fix the potential exceptions when exiting a local cluster
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10269 from zsxwing/executor-state.
|
disconnection message
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10261 from zsxwing/SPARK-12267.
|
**Problem.** In unified memory management, acquiring execution memory may lead to eviction of storage memory. However, the space freed from evicting cached blocks is distributed among all active tasks. Thus, an incorrect upper bound on the execution memory per task can cause the acquisition to fail, leading to OOM's and premature spills.
**Example.** Suppose total memory is 1000B, cached blocks occupy 900B, `spark.memory.storageFraction` is 0.4, and there are two active tasks. In this case, the cap on task execution memory is 100B / 2 = 50B. If task A tries to acquire 200B, it will evict 100B of storage but can only acquire 50B because of the incorrect cap. For another example, see this [regression test](https://github.com/andrewor14/spark/blob/fix-oom/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala#L233) that I stole from JoshRosen.
**Solution.** Fix the cap on task execution memory. It should take into account the space that could have been freed by storage in addition to the current amount of memory available to execution. In the example above, the correct cap should have been 600B / 2 = 300B.
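A worked version of the example above as a small sketch (numbers from the text; the function is illustrative, not the memory manager's actual code):
```
// Cap on execution memory per task = (memory execution could ever claim) / numTasks,
// where storage can be evicted down to (but not below) the storage region size.
def maxExecutionPerTask(total: Long, storageUsed: Long,
                        storageFraction: Double, numTasks: Int): Long = {
  val storageRegion = (total * storageFraction).toLong         // 1000 * 0.4 = 400
  val evictable     = math.max(storageUsed - storageRegion, 0) // 900 - 400 = 500
  val executionMax  = (total - storageUsed) + evictable        // 100 + 500 = 600
  executionMax / numTasks                                      // 600 / 2  = 300
}

assert(maxExecutionPerTask(1000L, 900L, 0.4, 2) == 300L)
// The old, incorrect cap ignored `evictable`: (1000 - 900) / 2 = 50.
```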
This patch also guards against the race condition (SPARK-12253):
(1) Existing tasks collectively occupy all execution memory
(2) New task comes in and blocks while existing tasks spill
(3) After tasks finish spilling, another task jumps in and puts in a large block, stealing the freed memory
(4) New task still cannot acquire memory and goes back to sleep
Author: Andrew Or <andrew@databricks.com>
Closes #10240 from andrewor14/fix-oom.
|
This patch adds documentation for Spark configurations that affect off-heap memory and makes some naming and validation improvements for those configs.
- Change `spark.memory.offHeapSize` to `spark.memory.offHeap.size`. This is fine because this configuration has not shipped in any Spark release yet (it's new in Spark 1.6).
- Deprecated `spark.unsafe.offHeap` in favor of a new `spark.memory.offHeap.enabled` configuration. The motivation behind this change is to gather all memory-related configurations under the same prefix.
- Add a check which prevents users from setting `spark.memory.offHeap.enabled=true` when `spark.memory.offHeap.size == 0`. After SPARK-11389 (#9344), which was committed in Spark 1.6, Spark enforces a hard limit on the amount of off-heap memory that it will allocate to tasks. As a result, enabling off-heap execution memory without setting `spark.memory.offHeap.size` will lead to immediate OOMs. The new configuration validation makes this scenario easier to diagnose, helping to avoid user confusion (see the sketch after this list).
- Document these configurations on the configuration page.
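A sketch of the renamed configuration in use (illustrative values; the validation described in the third bullet would reject the commented-out variant):
```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "1g")        // must be > 0 when enabled

// Rejected by the new validation: enabling off-heap memory with size == 0
// would otherwise lead to immediate OOMs.
// new SparkConf().set("spark.memory.offHeap.enabled", "true")
//                .set("spark.memory.offHeap.size", "0")
```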
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10237 from JoshRosen/SPARK-12251.
|