aboutsummaryrefslogtreecommitdiff
path: root/core
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "[SPARK-11206] Support SQL UI on the history server"Josh Rosen2015-11-307-57/+6
| | | | | | | | This reverts commit cc243a079b1c039d6e7f0b410d1654d94a090e14 / PR #9297 I'm reverting this because it broke SQLListenerMemoryLeakSuite in the master Maven builds. See #9991 for a discussion of why this broke the tests.
* [SPARK-11982] [SQL] improve performance of cartesian productDavies Liu2015-11-302-0/+70
| | | | | | | | | | | | This PR improve the performance of CartesianProduct by caching the result of right plan. After this patch, the query time of TPC-DS Q65 go down to 4 seconds from 28 minutes (420X faster). cc nongli Author: Davies Liu <davies@databricks.com> Closes #9969 from davies/improve_cartesian.
* [DOC] Explicitly state that top maintains the order of elementsWieland Hoffmann2015-11-302-3/+4
| | | | | | | | | Top is implemented in terms of takeOrdered, which already maintains the order, so top should, too. Author: Wieland Hoffmann <themineo@gmail.com> Closes #10013 from mineo/top-order.
* [SPARK-11859][MESOS] SparkContext accepts invalid Master URLs in the form ↵toddwan2015-11-302-6/+15
| | | | | | | | | | | | | | | | | | | | | zk://host:port for a multi-master Mesos cluster using ZooKeeper * According to below doc and validation logic in [SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231), master URL for a mesos cluster should always start with `mesos://` http://spark.apache.org/docs/latest/running-on-mesos.html `The Master URLs for Mesos are in the form mesos://host:5050 for a single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos cluster using ZooKeeper.` * However, [SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749) fails the validation and can receive master URL in the form `zk://host:port` * For the master URLs in the form `zk:host:port`, the valid form should be `mesos://zk://host:port` * This PR restrict the validation in `SparkContext.scala`, and now only mesos master URLs prefixed with `mesos://` can be accepted. * This PR also updated corresponding unit test. Author: toddwan <tawan0109@outlook.com> Closes #9886 from toddwan/S11859.
* [SPARK-11996][CORE] Make the executor thread dump work againShixiong Zhu2015-11-267-67/+21
| | | | | | | | | | In the previous implementation, the driver needs to know the executor listening address to send the thread dump request. However, in Netty RPC, the executor doesn't listen to any port, so the executor thread dump feature is broken. This patch makes the driver use the endpointRef stored in BlockManagerMasterEndpoint to send the thread dump request to fix it. Author: Shixiong Zhu <shixiong@databricks.com> Closes #9976 from zsxwing/executor-thread-dump.
* [SPARK-11999][CORE] Fix the issue that ThreadUtils.newDaemonCachedThreadPool ↵Shixiong Zhu2015-11-252-3/+56
| | | | | | | | | | doesn't cache any task In the previous codes, `newDaemonCachedThreadPool` uses `SynchronousQueue`, which is wrong. `SynchronousQueue` is an empty queue that cannot cache any task. This patch uses `LinkedBlockingQueue` to fix it along with other fixes to make sure `newDaemonCachedThreadPool` can use at most `maxThreadNumber` threads, and after that, cache tasks to `LinkedBlockingQueue`. Author: Shixiong Zhu <shixiong@databricks.com> Closes #9978 from zsxwing/cached-threadpool.
* [SPARK-11206] Support SQL UI on the history serverCarson Wang2015-11-257-6/+57
| | | | | | | | | | | | | | On the live web UI, there is a SQL tab which provides valuable information for the SQL query. But once the workload is finished, we won't see the SQL tab on the history server. It will be helpful if we support SQL UI on the history server so we can analyze it even after its execution. To support SQL UI on the history server: 1. I added an `onOtherEvent` method to the `SparkListener` trait and post all SQL related events to the same event bus. 2. Two SQL events `SparkListenerSQLExecutionStart` and `SparkListenerSQLExecutionEnd` are defined in the sql module. 3. The new SQL events are written to event log using Jackson. 4. A new trait `SparkHistoryListenerFactory` is added to allow the history server to feed events to the SQL history listener. The SQL implementation is loaded at runtime using `java.util.ServiceLoader`. Author: Carson Wang <carson.wang@intel.com> Closes #9297 from carsonwang/SqlHistoryUI.
* [SPARK-11866][NETWORK][CORE] Make sure timed out RPCs are cleaned up.Marcelo Vanzin2015-11-258-187/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This change does a couple of different things to make sure that the RpcEnv-level code and the network library agree about the status of outstanding RPCs. For RPCs that do not expect a reply ("RpcEnv.send"), support for one way messages (hello CORBA!) was added to the network layer. This is a "fire and forget" message that does not require any state to be kept by the TransportClient; as a result, the RpcEnv 'Ack' message is not needed anymore. For RPCs that do expect a reply ("RpcEnv.ask"), the network library now returns the internal RPC id; if the RpcEnv layer decides to time out the RPC before the network layer does, it now asks the TransportClient to forget about the RPC, so that if the network-level timeout occurs, the client is not killed. As part of implementing the above, I cleaned up some of the code in the netty rpc backend, removing types that were not necessary and factoring out some common code. Of interest is a slight change in the exceptions when posting messages to a stopped RpcEnv; that's mostly to avoid nasty error messages from the local-cluster backend when shutting down, which pollutes the terminal output. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9917 from vanzin/SPARK-11866.
* [SPARK-10558][CORE] Fix wrong executor state in Masterjerryshao2015-11-254-8/+13
| | | | | | | | | | | | | | `ExecutorAdded` can only be sent to `AppClient` when worker report back the executor state as `LOADING`, otherwise because of concurrency issue, `AppClient` will possibly receive `ExectuorAdded` at first, then `ExecutorStateUpdated` with `LOADING` state. Also Master will change the executor state from `LAUNCHING` to `RUNNING` (`AppClient` report back the state as `RUNNING`), then to `LOADING` (worker report back to state as `LOADING`), it should be `LAUNCHING` -> `LOADING` -> `RUNNING`. Also it is wrongly shown in master UI, the state of executor should be `RUNNING` rather than `LOADING`: ![screen shot 2015-09-11 at 2 30 28 pm](https://cloud.githubusercontent.com/assets/850797/9809254/3155d840-5899-11e5-8cdf-ad06fef75762.png) Author: jerryshao <sshao@hortonworks.com> Closes #8714 from jerryshao/SPARK-10558.
* [SPARK-10864][WEB UI] app name is hidden if window is resizedAlex Bozarth2015-11-252-7/+3
| | | | | | | | | | | | | | Currently the Web UI navbar has a minimum width of 1200px; so if a window is resized smaller than that the app name goes off screen. The 1200px width seems to have been chosen since it fits the longest example app name without wrapping. To work with smaller window widths I made the tabs wrap since it looked better than wrapping the app name. This is a distinct change in how the navbar looks and I'm not sure if it's what we actually want to do. Other notes: - min-width set to 600px to keep the tabs from wrapping individually (will need to be adjusted if tabs are added) - app name will also wrap (making three levels) if a really really long app name is used Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #9874 from ajbozarth/spark10864.
* [SPARK-11974][CORE] Not all the temp dirs had been deleted when the JVM exitsZhongshuai Pei2015-11-251-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | deleting the temp dir like that ``` scala> import scala.collection.mutable import scala.collection.mutable scala> val a = mutable.Set(1,2,3,4,7,0,8,98,9) a: scala.collection.mutable.Set[Int] = Set(0, 9, 1, 2, 3, 7, 4, 8, 98) scala> a.foreach(x => {a.remove(x) }) scala> a.foreach(println(_)) 98 ``` You may not modify a collection while traversing or iterating over it.This can not delete all element of the collection Author: Zhongshuai Pei <peizhongshuai@huawei.com> Closes #9951 from DoingDone9/Bug_RemainDir.
* [SPARK-11956][CORE] Fix a few bugs in network lib-based file transfer.Marcelo Vanzin2015-11-253-13/+35
| | | | | | | | | | | | - NettyRpcEnv::openStream() now correctly propagates errors to the read side of the pipe. - NettyStreamManager now throws if the file being transferred does not exist. - The network library now correctly handles zero-sized streams. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9941 from vanzin/SPARK-11956.
* [SPARK-10666][SPARK-6880][CORE] Use properties from ActiveJob associated ↵Mark Hamstra2015-11-252-4/+109
| | | | | | | | | | | | | | | | | with a Stage This issue was addressed in https://github.com/apache/spark/pull/5494, but the fix in that PR, while safe in the sense that it will prevent the SparkContext from shutting down, misses the actual bug. The intent of `submitMissingTasks` should be understood as "submit the Tasks that are missing for the Stage, and run them as part of the ActiveJob identified by jobId". Because of a long-standing bug, the `jobId` parameter was never being used. Instead, we were trying to use the jobId with which the Stage was created -- which may no longer exist as an ActiveJob, hence the crash reported in SPARK-6880. The correct fix is to use the ActiveJob specified by the supplied jobId parameter, which is guaranteed to exist at the call sites of submitMissingTasks. This fix should be applied to all maintenance branches, since it has existed since 1.0. kayousterhout pankajarora12 Author: Mark Hamstra <markhamstra@gmail.com> Author: Imran Rashid <irashid@cloudera.com> Closes #6291 from markhamstra/SPARK-6880.
* [SPARK-11686][CORE] Issue WARN when dynamic allocation is disabled due to ↵Ashwin Swaroop2015-11-251-1/+1
| | | | | | | | | | spark.dynamicAllocation.enabled and spark.executor.instances both set Changed the log type to a 'warning' instead of 'info' as required. Author: Ashwin Swaroop <Ashwin Swaroop> Closes #9926 from ashwinswaroop/master.
* [SPARK-11805] free the array in UnsafeExternalSorter during spillingDavies Liu2015-11-242-22/+19
| | | | | | | | After calling spill() on SortedIterator, the array inside InMemorySorter is not needed, it should be freed during spilling, this could help to join multiple tables with limited memory. Author: Davies Liu <davies@databricks.com> Closes #9793 from davies/free_array.
* [SPARK-11929][CORE] Make the repl log4j configuration override the root logger.Marcelo Vanzin2015-11-243-55/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | In the default Spark distribution, there are currently two separate log4j config files, with different default values for the root logger, so that when running the shell you have a different default log level. This makes the shell more usable, since the logs don't overwhelm the output. But if you install a custom log4j.properties, you lose that, because then it's going to be used no matter whether you're running a regular app or the shell. With this change, the overriding of the log level is done differently; the log level repl's main class (org.apache.spark.repl.Main) is used to define the root logger's level when running the shell, defaulting to WARN if it's not set explicitly. On a somewhat related change, the shell output about the "sc" variable was changed a bit to contain a little more useful information about the application, since when the root logger's log level is WARN, that information is never shown to the user. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9816 from vanzin/shell-logging.
* [SPARK-11946][SQL] Audit pivot API for 1.6.Reynold Xin2015-11-241-1/+0
| | | | | | | | | | | | | | | | | | | | Currently pivot's signature looks like ```scala scala.annotation.varargs def pivot(pivotColumn: Column, values: Column*): GroupedData scala.annotation.varargs def pivot(pivotColumn: String, values: Any*): GroupedData ``` I think we can remove the one that takes "Column" types, since callers should always be passing in literals. It'd also be more clear if the values are not varargs, but rather Seq or java.util.List. I also made similar changes for Python. Author: Reynold Xin <rxin@databricks.com> Closes #9929 from rxin/SPARK-11946.
* [SPARK-11872] Prevent the call to SparkContext#stop() in the listener bus's ↵tedyu2015-11-242-0/+35
| | | | | | | | | | | | thread This is continuation of SPARK-11761 Andrew suggested adding this protection. See tail of https://github.com/apache/spark/pull/9741 Author: tedyu <yuzhihong@gmail.com> Closes #9852 from tedyu/master.
* [SPARK-11906][WEB UI] Speculation Tasks Cause ProgressBar UI OverflowForest Fang2015-11-242-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | When there are speculative tasks in the stage, running progress bar could overflow and goes hidden on a new line: ![image](https://cloud.githubusercontent.com/assets/4317392/11326841/5fd3482e-9142-11e5-8ca5-cb2f0c0c8964.png) 3 completed / 2 running (including 1 speculative) out of 4 total tasks This is a simple fix by capping the started tasks at `total - completed` tasks ![image](https://cloud.githubusercontent.com/assets/4317392/11326842/6bb67260-9142-11e5-90f0-37f9174878ec.png) I should note my preferred way to fix it is via css style ```css .progress { display: flex; } ``` which shifts the correction burden from driver to web browser. However I couldn't get selenium test to measure the position/dimension of the progress bar correctly to get this unit tested. It also has the side effect that the width will be calibrated so the running occupies 2 / 5 instead of 1 / 4. ![image](https://cloud.githubusercontent.com/assets/4317392/11326848/7b03e9f0-9142-11e5-89ad-bd99cb0647cf.png) All in all, since this cosmetic bug is minor enough, I suppose the original simple fix should be good enough. Author: Forest Fang <forest.fang@outlook.com> Closes #9896 from saurfang/progressbar.
* [SPARK-11933][SQL] Rename mapGroup -> mapGroups and flatMapGroup -> ↵Reynold Xin2015-11-232-2/+2
| | | | | | | | | | | | flatMapGroups. Based on feedback from Matei, this is more consistent with mapPartitions in Spark. Also addresses some of the cleanups from a previous commit that renames the type variables. Author: Reynold Xin <rxin@databricks.com> Closes #9919 from rxin/SPARK-11933.
* [SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv.Marcelo Vanzin2015-11-239-42/+345
| | | | | | | | | | | | | | | | This change abstracts the code that serves jars / files to executors so that each RpcEnv can have its own implementation; the akka version uses the existing HTTP-based file serving mechanism, while the netty versions uses the new stream support added to the network lib, which makes file transfers benefit from the easier security configuration of the network library, and should also reduce overhead overall. The change includes a small fix to TransportChannelHandler so that it propagates user events to downstream handlers. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9530 from vanzin/SPARK-11140.
* [SPARK-11899][SQL] API audit for GroupedDataset.Reynold Xin2015-11-211-1/+1
| | | | | | | | | | | | 1. Renamed map to mapGroup, flatMap to flatMapGroup. 2. Renamed asKey -> keyAs. 3. Added more documentation. 4. Changed type parameter T to V on GroupedDataset. 5. Added since versions for all functions. Author: Reynold Xin <rxin@databricks.com> Closes #9880 from rxin/SPARK-11899.
* [SPARK-11787][SPARK-11883][SQL][FOLLOW-UP] Cleanup for this patch.Nong Li2015-11-203-320/+44
| | | | | | | | | | This mainly moves SqlNewHadoopRDD to the sql package. There is some state that is shared between core and I've left that in core. This allows some other associated minor cleanup. Author: Nong Li <nong@databricks.com> Closes #9845 from nongli/spark-11787.
* [SPARK-11887] Close PersistenceEngine at the end of PersistenceEngineSuite testsJosh Rosen2015-11-201-48/+52
| | | | | | | | | | | | | | | | | | | | In PersistenceEngineSuite, we do not call `close()` on the PersistenceEngine at the end of the test. For the ZooKeeperPersistenceEngine, this causes us to leak a ZooKeeper client, causing the logs of unrelated tests to be periodically spammed with connection error messages from that client: ``` 15/11/20 05:13:35.789 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:15741. Will not attempt to authenticate using SASL (unknown error) 15/11/20 05:13:35.790 pool-1-thread-1-ScalaTest-running-PersistenceEngineSuite-SendThread(localhost:15741) WARN ClientCnxn: Session 0x15124ff48dd0000 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) ``` This patch fixes this by using a `finally` block. Author: Josh Rosen <joshrosen@databricks.com> Closes #9864 from JoshRosen/close-zookeeper-client-in-tests.
* [SPARK-11650] Reduce RPC timeouts to speed up slow AkkaUtilsSuite testJosh Rosen2015-11-201-1/+2
| | | | | | | | This patch reduces some RPC timeouts in order to speed up the slow "AkkaUtilsSuite.remote fetch ssl on - untrusted server", which used to take two minutes to run. Author: Josh Rosen <joshrosen@databricks.com> Closes #9869 from JoshRosen/SPARK-11650.
* [SPARK-11845][STREAMING][TEST] Added unit test to verify TrackStateRDD is ↵Tathagata Das2015-11-191-201/+210
| | | | | | | | | | correctly checkpointed To make sure that all lineage is correctly truncated for TrackStateRDD when checkpointed. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #9831 from tdas/SPARK-11845.
* [SPARK-4134][CORE] Lower severity of some executor loss logs.Marcelo Vanzin2015-11-194-22/+43
| | | | | | | | | Don't log ERROR messages when executors are explicitly killed or when the exit reason is not yet known. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9780 from vanzin/SPARK-11789.
* [SPARK-11746][CORE] Use cache-aware method dependencieshushan2015-11-191-1/+1
| | | | | | | | a small change Author: hushan <hushan@xiaomi.com> Closes #9691 from suyanNone/unify-getDependency.
* [SPARK-11828][CORE] Register DAGScheduler metrics source after app id is known.Marcelo Vanzin2015-11-192-3/+2
| | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9820 from vanzin/SPARK-11828.
* [SPARK-11799][CORE] Make it explicit in executor logs that uncaught e…Srinivasa Reddy Vundela2015-11-191-1/+5
| | | | | | | | | | …xceptions are thrown during executor shutdown This commit will make sure that when uncaught exceptions are prepended with [Container in shutdown] when JVM is shutting down. Author: Srinivasa Reddy Vundela <vsr@cloudera.com> Closes #9809 from vundela/master_11799.
* [SPARK-11831][CORE][TESTS] Use port 0 to avoid port conflicts in testsShixiong Zhu2015-11-192-14/+14
| | | | | | | | Use port 0 to fix port-contention-related flakiness Author: Shixiong Zhu <shixiong@databricks.com> Closes #9841 from zsxwing/SPARK-11831.
* [SPARK-11830][CORE] Make NettyRpcEnv bind to the specified hostzsxwing2015-11-192-5/+11
| | | | | | | | | | | | | This PR includes the following change: 1. Bind NettyRpcEnv to the specified host 2. Fix the port information in the log for NettyRpcEnv. 3. Fix the service name of NettyRpcEnv. Author: zsxwing <zsxwing@gmail.com> Author: Shixiong Zhu <shixiong@databricks.com> Closes #9821 from zsxwing/SPARK-11830.
* [SPARK-11787][SQL] Improve Parquet scan performance when using flat schemas.Nong Li2015-11-181-7/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds an alternate to the Parquet RecordReader from the parquet-mr project that is much faster for flat schemas. Instead of using the general converter mechanism from parquet-mr, this directly uses the lower level APIs from parquet-columnar and a customer RecordReader that directly assembles into UnsafeRows. This is optionally disabled and only used for supported schemas. Using the tpcds store sales table and doing a sum of increasingly more columns, the results are: For 1 Column: Before: 11.3M rows/second After: 18.2M rows/second For 2 Columns: Before: 7.2M rows/second After: 11.2M rows/second For 5 Columns: Before: 2.9M rows/second After: 4.5M rows/second Author: Nong Li <nong@databricks.com> Closes #9774 from nongli/parquet.
* [SPARK-11495] Fix potential socket / file handle leaks that were found via ↵Josh Rosen2015-11-182-15/+30
| | | | | | | | | | static analysis The HP Fortify Opens Source Review team (https://www.hpfod.com/open-source-review-project) reported a handful of potential resource leaks that were discovered using their static analysis tool. We should fix the issues identified by their scan. Author: Josh Rosen <joshrosen@databricks.com> Closes #9455 from JoshRosen/fix-potential-resource-leaks.
* [SPARK-10930] History "Stages" page "duration" can be confusingDerek Dagit2015-11-181-3/+16
| | | | | | Author: Derek Dagit <derekd@yahoo-inc.com> Closes #9051 from d2r/spark-10930-ui-max-task-dur.
* [SPARK-11649] Properly set Akka frame size in SparkListenerSuite testJosh Rosen2015-11-181-2/+3
| | | | | | | | | | SparkListenerSuite's _"onTaskGettingResult() called when result fetched remotely"_ test was extremely slow (1 to 4 minutes to run) and recently became extremely flaky, frequently failing with OutOfMemoryError. The root cause was the fact that this was using `System.setProperty` to set the Akka frame size, which was not actually modifying the frame size. As a result, this test would allocate much more data than necessary. The fix here is to simply use SparkConf in order to configure the frame size. Author: Josh Rosen <joshrosen@databricks.com> Closes #9822 from JoshRosen/SPARK-11649.
* [SPARK-10745][CORE] Separate configs between shuffle and RPCShixiong Zhu2015-11-189-18/+17
| | | | | | | | | | [SPARK-6028](https://issues.apache.org/jira/browse/SPARK-6028) uses network module to implement RPC. However, there are some configurations named with `spark.shuffle` prefix in the network module. This PR refactors them to make sure the user can control them in shuffle and RPC separately. The user can use `spark.rpc.*` to set the configuration for netty RPC. Author: Shixiong Zhu <shixiong@databricks.com> Closes #9481 from zsxwing/SPARK-10745.
* [SPARK-11809] Switch the default Mesos mode to coarse-grained modeReynold Xin2015-11-181-1/+1
| | | | | | | | Based on my conversions with people, I believe the consensus is that the coarse-grained mode is more stable and easier to reason about. It is best to use that as the default rather than the more flaky fine-grained mode. Author: Reynold Xin <rxin@databricks.com> Closes #9795 from rxin/SPARK-11809.
* [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should ↵Bryan Cutler2015-11-181-0/+27
| | | | | | | | | | accept a VoidFunction<...> Currently streaming foreachRDD Java API uses a function prototype requiring a return value of null. This PR deprecates the old method and uses VoidFunction to allow for more concise declaration. Also added VoidFunction2 to Java API in order to use in Streaming methods. Unit test is added for using foreachRDD with VoidFunction, and changes have been tested with Java 7 and Java 8 using lambdas. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #9488 from BryanCutler/foreachRDD-VoidFunction-SPARK-4557.
* [SPARK-11792] [SQL] [FOLLOW-UP] Change SizeEstimation to KnownSizeEstimation ↵Yin Huai2015-11-182-26/+18
| | | | | | | | | | | | | | | | | and make estimatedSize return Long instead of Option[Long] https://issues.apache.org/jira/browse/SPARK-11792 The main changes include: * Renaming `SizeEstimation` to `KnownSizeEstimation`. Hopefully this new name has more information. * Making `estimatedSize` return `Long` instead of `Option[Long]`. * In `UnsaveHashedRelation`, `estimatedSize` will delegate the work to `SizeEstimator` if we have not created a `BytesToBytesMap`. Since we will put `UnsaveHashedRelation` to `BlockManager`, it is generally good to let it provide a more accurate size estimation. Also, if we do not put `BytesToBytesMap` directly into `BlockerManager`, I feel it is not really necessary to make `BytesToBytesMap` extends `KnownSizeEstimation`. Author: Yin Huai <yhuai@databricks.com> Closes #9813 from yhuai/SPARK-11792-followup.
* [SPARK-11195][CORE] Use correct classloader for TaskResultGetterHurshal Patel2015-11-183-8/+72
| | | | | | | | | | | | | | | Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader. The issue is that `enqueueFailedTask` was using the incorrect classloader which results in `ClassNotFoundException`. Adds a test in TaskResultGetterSuite that compiles a custom exception, throws it on the executor, and asserts that Spark handles the TaskResult deserialization instead of returning `UnknownReason`. See #9367 for previous comments See SPARK-11195 for a full repro Author: Hurshal Patel <hpatel516@gmail.com> Closes #9779 from choochootrain/spark-11195-master.
* [SPARK-6541] Sort executors by ID (numeric)Jean-Baptiste Onofré2015-11-182-3/+12
| | | | | | | | "Force" the executor ID sort with Int. Author: Jean-Baptiste Onofré <jbonofre@apache.org> Closes #9165 from jbonofre/SPARK-6541.
* [SPARK-11792][SQL] SizeEstimator cannot provide a good size estimation of ↵Yin Huai2015-11-183-4/+47
| | | | | | | | | | | | UnsafeHashedRelations https://issues.apache.org/jira/browse/SPARK-11792 Right now, SizeEstimator will "think" a small UnsafeHashedRelation is several GBs. Author: Yin Huai <yhuai@databricks.com> Closes #9788 from yhuai/SPARK-11792.
* [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener ↵tedyu2015-11-171-18/+28
| | | | | | | | | | | | | | | | | bus's thread See discussion toward the tail of https://github.com/apache/spark/pull/9723 From zsxwing : ``` The user should not call stop or other long-time work in a listener since it will block the listener thread, and prevent from stopping SparkContext/StreamingContext. I cannot see an approach since we need to stop the listener bus's thread before stopping SparkContext/StreamingContext totally. ``` Proposed solution is to prevent the call to StreamingContext#stop() in the listener bus's thread. Author: tedyu <yuzhihong@gmail.com> Closes #9741 from tedyu/master.
* [SPARK-11583] [CORE] MapStatus Using RoaringBitmap More ProperlyKent Yao2015-11-173-5/+37
| | | | | | | | | | | | This PR upgrade the version of RoaringBitmap to 0.5.10, to optimize the memory layout, will be much smaller when most of blocks are empty. This PR is based on #9661 (fix conflicts), see all of the comments at https://github.com/apache/spark/pull/9661 . Author: Kent Yao <yaooqinn@hotmail.com> Author: Davies Liu <davies@databricks.com> Author: Charles Allen <charles@allen-net.com> Closes #9746 from davies/roaring_mapstatus.
* [SPARK-11016] Move RoaringBitmap to explicit Kryo serializerDavies Liu2015-11-171-9/+55
| | | | | | | | | | | Fix the serialization of RoaringBitmap with Kyro serializer This PR came from https://github.com/metamx/spark/pull/1, thanks to drcrallen Author: Davies Liu <davies@databricks.com> Author: Charles Allen <charles@allen-net.com> Closes #9748 from davies/SPARK-11016.
* [SPARK-11726] Throw exception on timeout when waiting for REST server responseJacek Lewandowski2015-11-171-3/+11
| | | | | | Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #9692 from jacek-lewandowski/SPARK-11726.
* [SPARK-9552] Add force control for killExecutors to avoid false killing for ↵Grace2015-11-175-15/+82
| | | | | | | | | | | | | | | | | | | | those busy executors By using the dynamic allocation, sometimes it occurs false killing for those busy executors. Some executors with assignments will be killed because of being idle for enough time (say 60 seconds). The root cause is that the Task-Launch listener event is asynchronized. For example, some executors are under assigning tasks, but not sending out the listener notification yet. Meanwhile, the dynamic allocation's executor idle time is up (e.g., 60 seconds). It will trigger killExecutor event at the same time. 1. the timer expiration starts before the listener event arrives. 2. Then, the task is going to run on top of that killed/killing executor. It will lead to task failure finally. Here is the proposal to fix it. We can add the force control for killExecutor. If the force control is not set (i.e., false), we'd better to check if the executor under killing is idle or busy. If the current executor has some assignment, we should not kill that executor and return back false (to indicate killing failure). In dynamic allocation, we'd better to turn off force killing (i.e., force = false), we will meet killing failure if tries to kill a busy executor. And then, the executor timer won't be invalid. Later on, the task assignment event arrives, we can remove the idle timer accordingly. So that we can avoid false killing for those busy executors in dynamic allocation. For the rest of usages, the end users can decide if to use force killing or not by themselves. If to turn on that option, the killExecutor will do the action without any status checking. Author: Grace <jie.huang@intel.com> Author: Andrew Or <andrew@databricks.com> Author: Jie Huang <jie.huang@intel.com> Closes #7888 from GraceH/forcekill.
* [SPARK-11786][CORE] Tone down messages from akka error monitor.Marcelo Vanzin2015-11-171-1/+1
| | | | | | | | | | There events happen normally during the app's lifecycle, so printing out ERROR logs all the time is misleading, and can actually affect usability of interactive shells. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9772 from vanzin/SPARK-11786.
* [SPARK-11695][CORE] Set s3a credentialsChris Bannister2015-11-171-4/+9
| | | | | | | | Set s3a credentials when creating a new default hadoop configuration. Author: Chris Bannister <chris.bannister@swiftkey.com> Closes #9663 from Zariel/set-s3a-creds.