| Commit message | Author | Age | Files | Lines |
The web UI's paginated table uses JavaScript to implement certain navigation controls, such as table sorting and the "go to page" form. This is unnecessary and should be simplified to use plain HTML form controls and links.
/cc zsxwing, who wrote this original code, and yhuai.
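A minimal sketch of the link-based approach, in the Scala XML style Spark's UI pages use; the parameter names here are illustrative and not the actual `PagedTable` API:
```
// Sketch only: a sortable column header rendered as a plain link whose sort state
// round-trips through URL query parameters, so no JavaScript handler is needed.
def sortableHeader(basePath: String, column: String, sortColumn: String, desc: Boolean) = {
  val newDesc = if (column == sortColumn) !desc else false
  val href = s"$basePath?sortColumn=$column&desc=$newDesc"
  <th><a href={href}>{column}</a></th>
}
```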
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10441 from JoshRosen/simplify-paginated-table-sorting.
|
Include the following changes:
1. Close `java.sql.Statement` (see the sketch after this list).
2. Fix incorrect `asInstanceOf`.
3. Remove unnecessary `synchronized` and `ReentrantLock`.
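For item 1, a minimal sketch of the close-in-finally pattern with plain JDBC; this is illustrative, not the code touched by the PR:
```
import java.sql.{Connection, Statement}

// Always release the Statement, even if execution throws, so it never leaks.
def runUpdate(conn: Connection, sql: String): Int = {
  val stmt: Statement = conn.createStatement()
  try {
    stmt.executeUpdate(sql)
  } finally {
    stmt.close()
  }
}
```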
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10440 from zsxwing/findbugs.
|
Buffer underflow exception
Since we only need to implement `def skipBytes(n: Int)`, the code in #10213 could be simplified.
davies scwf
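A minimal sketch of a `skipBytes` built on an underlying stream's `skip`; the wrapper class and field names are hypothetical, not the Kryo input implementation in the PR:
```
import java.io.InputStream

// Keep skipping until n bytes are consumed or the stream ends.
class SkippingInput(in: InputStream) {
  def skipBytes(n: Int): Int = {
    var remaining = n.toLong
    while (remaining > 0) {
      val skipped = in.skip(remaining)
      if (skipped <= 0) return (n - remaining).toInt   // end of stream reached
      remaining -= skipped
    }
    n
  }
}
```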
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #10253 from adrian-wang/kryo.
|
The feature was first added in commit 7b877b27053bfb7092e250e01a3b887e1b50a109 but was later removed (probably by mistake) in commit fc8b58195afa67fbb75b4c8303e022f703cbf007.
This change sets the default path of RDDs created via sc.textFile(...) to the path argument.
Here is the symptom:
* Using spark-1.5.2-bin-hadoop2.6:
scala> sc.textFile("/home/root/.bashrc").name
res5: String = null
scala> sc.binaryFiles("/home/root/.bashrc").name
res6: String = /home/root/.bashrc
* while using Spark 1.3.1:
scala> sc.textFile("/home/root/.bashrc").name
res0: String = /home/root/.bashrc
scala> sc.binaryFiles("/home/root/.bashrc").name
res1: String = /home/root/.bashrc
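A sketch of the restored behavior in a spark-shell session; the fix itself lives in `SparkContext.textFile`, which presumably amounts to calling `setName(path)` on the returned RDD:
```
// After the fix the name defaults to the path; before it, the same effect
// could be obtained manually, as shown here for illustration.
val rdd = sc.textFile("/home/root/.bashrc").setName("/home/root/.bashrc")
assert(rdd.name == "/home/root/.bashrc")
```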
Author: Yaron Weinsberg <wyaron@gmail.com>
Author: yaron <yaron@il.ibm.com>
Closes #10456 from wyaron/master.
|
Instead of just cancelling the registrationRetryTimer to keep the driver from retrying the connection to the master, change the function to `schedule`.
There is no need to register with the master repeatedly.
Author: echo2mei <534384876@qq.com>
Closes #10447 from echoTomei/master.
|
In SparkContext method `setCheckpointDir`, a warning is issued when spark master is not local and the passed directory for the checkpoint dir appears to be local.
In practice, when relying on the HDFS configuration files and using a relative path for the checkpoint directory (i.e. an incomplete URI without the HDFS scheme, ...), this warning should not be issued and might be confusing.
In fact, in this case, the checkpoint directory is successfully created, and the checkpointing mechanism works as expected.
This PR uses the `FileSystem` instance created with the given directory, and checks whether it is local or not.
(The rationale is that since this same `FileSystem` instance is used to create the checkpoint dir anyway and can therefore be reliably used to determine if it is local or not).
The warning is only issued if the directory is not local, on top of the existing conditions.
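A minimal sketch of the described check, assuming a SparkContext `sc`, a checkpoint `directory` string, and the Hadoop `FileSystem` API; names and structure are illustrative rather than the exact patch:
```
import org.apache.hadoop.fs.{LocalFileSystem, Path}

// Resolve the directory against the Hadoop configuration and warn only when the
// resulting FileSystem really is the local one while the master is non-local.
val checkpointPath = new Path(directory)
val fs = checkpointPath.getFileSystem(sc.hadoopConfiguration)
if (!sc.isLocal && fs.isInstanceOf[LocalFileSystem]) {
  println(s"Warning: checkpoint directory $directory is on the local filesystem " +
    "but the master is not local; executors will not be able to read it.")
}
```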
Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
Closes #10392 from pierre-borckmans/SPARK-12440_CheckpointDir_Warning_NonLocal.
|
suites after forcing to set specific value to "os.arch" property
Restore the original value of the os.arch property after each test.
Since some tests force a specific value for the os.arch property, we need to restore the original value afterwards.
Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Closes #10289 from kiszk/SPARK-12311.
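A minimal sketch of the save/restore pattern in a ScalaTest suite; the suite and test names are hypothetical, not the suites touched by the PR:
```
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class OsArchRestoreSuite extends FunSuite with BeforeAndAfterEach {
  private var originalArch: String = _

  override def beforeEach(): Unit = {
    originalArch = System.getProperty("os.arch")   // remember the real value
    super.beforeEach()
  }

  override def afterEach(): Unit = {
    try super.afterEach()
    finally System.setProperty("os.arch", originalArch)  // always restore it
  }

  test("forces a specific os.arch") {
    System.setProperty("os.arch", "unknown-arch")
    assert(System.getProperty("os.arch") === "unknown-arch")
  }
}
```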
|
one class
Fix Tachyon deprecations; pull Tachyon dependency into `TachyonBlockManager` only
CC calvinjia as I probably need a double-check that the usage of the new API is correct.
Author: Sean Owen <sowen@cloudera.com>
Closes #10449 from srowen/SPARK-12500.
|
Author: Nong Li <nong@databricks.com>
Closes #10422 from nongli/12471-pids.
|
Author: Jacek Laskowski <jacek@japila.pl>
Closes #10432 from jaceklaskowski/minor-corrections.
|
i.e. Hadoop 1 and Hadoop 2.0
Author: Reynold Xin <rxin@databricks.com>
Closes #10404 from rxin/SPARK-11807.
|
According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy.
After changing the compressor to LZ4, I saw a 20% improvement in end-to-end time for a TPC-DS query (Q4).
[1] https://github.com/ning/jvm-compressor-benchmark/wiki
cc rxin
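With LZ4 as the new default nothing needs to be set; a sketch of overriding the codec explicitly via the standard `spark.io.compression.codec` property (illustrative configuration only):
```
import org.apache.spark.SparkConf

// After this change "lz4" is the default, so this is only needed to override it.
val conf = new SparkConf()
  .setAppName("compression-codec-example")
  .set("spark.io.compression.codec", "lz4")   // alternatives: "snappy", "lzf"
```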
Author: Davies Liu <davies@databricks.com>
Closes #10342 from davies/lz4.
|
```
[info] ReplayListenerSuite:
[info] - Simple replay (58 milliseconds)
java.lang.NullPointerException
at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
```
https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (but does not actually make the tests fail).
Tested locally to verify that the NPE is gone.
Author: Andrew Or <andrew@databricks.com>
Closes #10417 from andrewor14/fix-harmless-npe.
|
Author: Reynold Xin <rxin@databricks.com>
Closes #10394 from rxin/SPARK-2331.
|
considering preferred local hosts
When multiple workers exist on a host, we can bypass unnecessary remote access for broadcasts; block managers fetch broadcast blocks from the same host instead of from remote hosts.
Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Closes #10346 from maropu/OptimizeBlockLocationOrder.
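A minimal sketch of the location-ordering idea, assuming the standard `BlockManagerId` type; the helper is illustrative, not the PR's actual code:
```
import org.apache.spark.storage.BlockManagerId

// Put replicas that live on this host first, so a broadcast block is fetched
// locally whenever a co-located executor already has it.
def preferSameHost(locations: Seq[BlockManagerId], localHost: String): Seq[BlockManagerId] = {
  val (sameHost, remote) = locations.partition(_.host == localHost)
  sameHost ++ remote
}
```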
|
Based on the suggestions from marmbrus, this adds logical/physical operators for Range to improve performance.
It also adds another API for resolving SPARK-12150.
Could you take a look at my implementation, marmbrus ? If not good, I can rework it. : )
Thank you very much!
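A short usage sketch on the `SQLContext` of that era (assuming a `sqlContext` from spark-shell); the call below is the API these Range operators sit behind:
```
// Illustrative usage: generate a DataFrame of ids without materializing an RDD first.
val df = sqlContext.range(0, 1000)        // column "id": 0, 1, ..., 999
df.selectExpr("id", "id * 2 AS doubled").show(5)
```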
Author: gatorsmile <gatorsmile@gmail.com>
Closes #10335 from gatorsmile/rangeOperators.
|
Author: Reynold Xin <rxin@databricks.com>
Closes #10395 from rxin/SPARK-11808.
|
Author: Reynold Xin <rxin@databricks.com>
Closes #10387 from rxin/version-bump.
|
with Mesos cluster mode."
This reverts commit ad8c1f0b840284d05da737fb2cc5ebf8848f4490.
|
REST server"
This reverts commit 8184568810e8a2e7d5371db2c6a0366ef4841f70.
|
This reverts commit 2bebaa39d9da33bc93ef682959cd42c1968a6a3e.
|
It is usually an invalid location on the remote machine executing the job.
It is picked up by the Mesos support in cluster mode, and most of the time causes
the job to fail.
Fixes SPARK-12345
Author: Luc Bourlier <luc.bourlier@typesafe.com>
Closes #10329 from skyluc/issue/SPARK_HOME.
|
new connections
Added `channelActive` to `RpcHandler` so that `NettyRpcHandler` doesn't need `clients` any more.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10301 from zsxwing/network-events.
|
interval
Previously, the rpc timeout was the default network timeout, which is the same value
the driver uses to determine dead executors. This means if there is a network issue,
the executor is determined dead after one heartbeat attempt. There is a separate config
for the heartbeat interval which is a better value to use for the heartbeat RPC. With
this change, the executor will make multiple heartbeat attempts even with RPC issues.
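A small illustration of the configuration relationship described above, using standard Spark settings; the assertion only expresses the intent that several heartbeat attempts fit inside one network timeout:
```
import org.apache.spark.SparkConf

val conf = new SparkConf()
val heartbeatIntervalMs = conf.getTimeAsMs("spark.executor.heartbeatInterval", "10s")
val networkTimeoutMs    = conf.getTimeAsMs("spark.network.timeout", "120s")

// The heartbeat RPC now times out on the (much shorter) heartbeat interval, so a
// single slow reply no longer uses up the only attempt before the driver gives up.
assert(heartbeatIntervalMs <= networkTimeoutMs)
```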
Author: Nong Li <nong@databricks.com>
Closes #10365 from nongli/spark-12411.
|
In the discussion of SPARK-9552, we proposed a force kill in `killExecutors`. But if there is nothing to kill, it returns true (acknowledgement), which causes executors that are not eligible to be killed to be added to the pendingToRemove list for further action.
In this patch, we'd like to change the return semantics: if there is nothing to kill, we return false, and therefore those non-eligible executors won't be added to the pendingToRemove list.
vanzin andrewor14 As a follow-up to PR #7888, please let me know your comments.
Author: Grace <jie.huang@intel.com>
Author: Jie Huang <hjie@fosun.com>
Author: Andrew Or <andrew@databricks.com>
Closes #9796 from GraceH/emptyPendingToRemove.
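A minimal sketch of the changed return semantics; the method shape is illustrative, not the scheduler backend's real signature:
```
// Acknowledge a kill request only when at least one of the requested executors
// is actually eligible to be killed; otherwise return false so callers do not
// add ineligible executors to their pending-to-remove bookkeeping.
def killExecutors(requested: Seq[String], eligible: Set[String]): Boolean = {
  val toKill = requested.filter(eligible.contains)
  if (toKill.isEmpty) {
    false
  } else {
    // ... issue the actual kill requests for `toKill` ...
    true
  }
}
```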
|
If a client requests a non-existent stream, just send a failure message
back, without logging any error on the server side (since it's not a
server error).
On the executor side, avoid error logs by translating any errors during
transfer to a `ClassNotFoundException`, so that loading the class is
retried on the parent class loader. This can mask IO errors during
transmission, but the most common cause is that the class is not
served by the remote end.
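A sketch of the executor-side idea: a class loader that fetches class bytes remotely and converts any fetch failure into `ClassNotFoundException`. The fetching function is a hypothetical stand-in, not Spark's actual loader:
```
// Any failure while streaming the class bytes becomes ClassNotFoundException,
// so class loading falls through to the parent loader instead of logging errors.
class RemoteClassLoader(parent: ClassLoader, fetchBytes: String => Array[Byte])
    extends ClassLoader(parent) {
  override def findClass(name: String): Class[_] = {
    try {
      val bytes = fetchBytes(name)          // e.g. stream the class file from the driver
      defineClass(name, bytes, 0, bytes.length)
    } catch {
      case e: Exception => throw new ClassNotFoundException(name, e)
    }
  }
}
```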
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10337 from vanzin/SPARK-12350.
|
I believe this fixes SPARK-12413. I'm currently running an integration test to verify.
Author: Michael Gummelt <mgummelt@mesosphere.io>
Closes #10366 from mgummelt/fix-zk-mesos.
|
No JIRA was created.
The original test passed because the class cast is lazy (it only fails when the object's method is invoked).
Author: Jeff Zhang <zjffdu@apache.org>
Closes #10371 from zjffdu/minor_fix.
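An illustrative snippet of why a lazy cast can let a test pass: with type erasure, `asInstanceOf` on a generic container checks nothing at the cast site, and the `ClassCastException` only surfaces when the element is actually used:
```
val xs: Seq[Any] = Seq("not an int")
val ys = xs.asInstanceOf[Seq[Int]]   // no error here: the cast is unchecked/lazy
// ys.head + 1                       // only this use would throw ClassCastException
```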
|
Fix a problem with #10332; this one should fix cluster mode on Mesos.
Author: Iulian Dragos <jaguarul@gmail.com>
Closes #10359 from dragos/issue/fix-spark-12345-one-more-time.
|
characters
This PR encodes and decodes the file name to fix the issue.
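An illustrative round trip of the encode/decode idea for file names with special characters, using the standard `java.net` helpers; the PR applies the same principle when building and parsing the URIs involved:
```
import java.net.{URLDecoder, URLEncoder}

val original = "my file #1.txt"
val encoded  = URLEncoder.encode(original, "UTF-8")   // "my+file+%231.txt"
val decoded  = URLDecoder.decode(encoded, "UTF-8")
assert(decoded == original)
```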
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10208 from zsxwing/uri.
|
This reverts commit 5a514b61bbfb609c505d8d65f2483068a56f1f70.
|
This commit resolves SPARK-12396.
Author: echo2mei <534384876@qq.com>
Closes #10354 from echoTomei/master.
|
No change in functionality is intended. This only changes internal API.
Author: Andrew Or <andrew@databricks.com>
Closes #10343 from andrewor14/clean-bm-serializer.
|
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #10339 from vanzin/SPARK-12386.
|
string when redirecting.
Author: Rohit Agarwal <rohita@qubole.com>
Closes #10180 from mindprince/SPARK-12186.
|
Runtime.getRuntime.addShutdownHook() is called
SPARK-9886 fixed ExternalBlockStore.scala.
This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook().
Author: tedyu <yuzhihong@gmail.com>
Closes #10325 from ted-yu/master.
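An illustrative contrast between a raw JVM shutdown hook and Spark's managed hooks. `ShutdownHookManager` is `private[spark]`, so the second call only compiles inside Spark itself, and `cleanup` is a hypothetical placeholder:
```
def cleanup(): Unit = { /* release resources */ }

// Raw JVM hook: the pattern being removed.
Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
  override def run(): Unit = cleanup()
}))

// Spark-managed hook: the replacement, which runs with a defined priority
// and returns a handle that can later be removed.
val hookRef = org.apache.spark.util.ShutdownHookManager.addShutdownHook { () => cleanup() }
```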
|
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.
This was suggested by mateiz on https://github.com/apache/spark/pull/7699. It may have already turned up an issue in "zero split job".
Author: Imran Rashid <irashid@cloudera.com>
Closes #8466 from squito/SPARK-10248.
|
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
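The literal `${timeout.duration}` in the message points at a missing `s` string interpolator; an illustrative reproduction, with a hypothetical `Timeout` standing in for the real `RpcTimeout`:
```
import scala.concurrent.duration._

case class Timeout(duration: FiniteDuration)
val timeout = Timeout(120.seconds)

val buggy = "Cannot receive any reply in ${timeout.duration}"   // placeholder printed verbatim
val fixed = s"Cannot receive any reply in ${timeout.duration}"  // prints "... in 120 seconds"
```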
Author: Andrew Or <andrew@databricks.com>
Closes #10334 from andrewor14/rpc-typo.
|
cluster mode.
SPARK_HOME is now causing problems with Mesos cluster mode, since the spark-submit script was recently changed so that spark-class gives precedence to SPARK_HOME when it is defined.
We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.
Author: Timothy Chen <tnachen@gmail.com>
Closes #10332 from tnachen/scheduler_ui.
|
This change builds the event history of completed apps asynchronously, so the RPC thread is not blocked and new workers can register or be removed even if the event log history is very large and takes a long time to rebuild.
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
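A minimal sketch of the pattern (hypothetical names, not the Master's actual members): push the expensive replay onto a dedicated single-thread pool so the RPC message loop stays responsive:
```
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object AsyncRebuildSketch {
  private val rebuildPool = Executors.newSingleThreadExecutor()
  private implicit val rebuildEc: ExecutionContext =
    ExecutionContext.fromExecutorService(rebuildPool)

  def asyncRebuildUI(appId: String): Future[Unit] = Future {
    // replay the application's event logs and attach the rebuilt UI;
    // slow for large logs, but no longer running on the RPC thread
  }
}
```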
|
ExternalBlockStore.scala
Author: Naveen <naveenminchu@gmail.com>
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
|
Please help to review, thanks a lot.
Author: jerryshao <sshao@hortonworks.com>
Closes #10195 from jerryshao/SPARK-10123.
|
AsyncRDDActions to support non-blocking operation
These changes rework the implementations of `SimpleFutureAction`, `ComplexFutureAction`, `JobWaiter`, and `AsyncRDDActions` such that asynchronous callbacks on the generated `Futures` NEVER block waiting for a job to complete. A small amount of mutex synchronization is necessary to protect the internal fields that manage cancellation, but these locks are only held very briefly and in practice should almost never cause any blocking to occur. The existing blocking APIs of these classes are retained, but they simply delegate to the underlying non-blocking API and `Await` the results with indefinite timeouts.
Associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-9026
Also fixes: https://issues.apache.org/jira/browse/SPARK-4514
This pull request contains all my own original work, which I release to the Spark project under its open source license.
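An illustrative use of the async RDD API this rework preserves: register a callback on the returned `FutureAction` rather than blocking the calling thread (`sc` is an existing SparkContext, e.g. from spark-shell):
```
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

val future = sc.parallelize(1 to 1000).countAsync()
future.onComplete {
  case Success(n)  => println(s"counted $n elements")
  case Failure(ex) => println(s"job failed: $ex")
}
// The blocking variants remain available and simply await this future internally.
```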
Author: Richard W. Eggert II <richard.eggert@gmail.com>
Closes #9264 from reggert/fix-futureaction.
|
https://issues.apache.org/jira/browse/SPARK-9516
- [x] new look of Thread Dump Page
- [x] click column title to sort
- [x] grep
- [x] search as you type
squito JoshRosen It's ready for review now.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #7910 from CodingCat/SPARK-9516.
|
ExternalShuffleBlockResolver
Replacing shuffleManagerClassName with shortShuffleMgrName reduces the time spent on string comparison, and the sort comparison is moved to the front. cc JoshRosen andrewor14
Author: Lianhui Wang <lianhuiwang09@gmail.com>
Closes #10131 from lianhuiwang/spark-12130.
|
Fix a minor typo (unbalanced bracket) in ResetSystemProperties.
Author: Holden Karau <holden@us.ibm.com>
Closes #10303 from holdenk/SPARK-12332-trivial-typo-in-ResetSystemProperties-comment.
|
shutdown hook
1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.
This should fix the potential exceptions when exiting a local cluster
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
at scala.Predef$.assert(Predef.scala:179)
at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10269 from zsxwing/executor-state.
|
disconnection message
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #10261 from zsxwing/SPARK-12267.
|
**Problem.** In unified memory management, acquiring execution memory may lead to eviction of storage memory. However, the space freed from evicting cached blocks is distributed among all active tasks. Thus, an incorrect upper bound on the execution memory per task can cause the acquisition to fail, leading to OOM's and premature spills.
**Example.** Suppose total memory is 1000B, cached blocks occupy 900B, `spark.memory.storageFraction` is 0.4, and there are two active tasks. In this case, the cap on task execution memory is 100B / 2 = 50B. If task A tries to acquire 200B, it will evict 100B of storage but can only acquire 50B because of the incorrect cap. For another example, see this [regression test](https://github.com/andrewor14/spark/blob/fix-oom/core/src/test/scala/org/apache/spark/memory/UnifiedMemoryManagerSuite.scala#L233) that I stole from JoshRosen.
**Solution.** Fix the cap on task execution memory. It should take into account the space that could have been freed by storage in addition to the current amount of memory available to execution. In the example above, the correct cap should have been 600B / 2 = 300B.
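A worked version of the example above as a small sketch (numbers from the text; the function is illustrative, not the memory manager's actual code):
```
// Cap on execution memory per task = (memory execution could ever claim) / numTasks,
// where storage can be evicted down to (but not below) the storage region size.
def maxExecutionPerTask(total: Long, storageUsed: Long,
                        storageFraction: Double, numTasks: Int): Long = {
  val storageRegion = (total * storageFraction).toLong         // 1000 * 0.4 = 400
  val evictable     = math.max(storageUsed - storageRegion, 0) // 900 - 400 = 500
  val executionMax  = (total - storageUsed) + evictable        // 100 + 500 = 600
  executionMax / numTasks                                      // 600 / 2  = 300
}

assert(maxExecutionPerTask(1000L, 900L, 0.4, 2) == 300L)
// The old, incorrect cap ignored `evictable`: (1000 - 900) / 2 = 50.
```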
This patch also guards against the race condition (SPARK-12253):
(1) Existing tasks collectively occupy all execution memory
(2) New task comes in and blocks while existing tasks spill
(3) After tasks finish spilling, another task jumps in and puts in a large block, stealing the freed memory
(4) New task still cannot acquire memory and goes back to sleep
Author: Andrew Or <andrew@databricks.com>
Closes #10240 from andrewor14/fix-oom.
|
This patch adds documentation for Spark configurations that affect off-heap memory and makes some naming and validation improvements for those configs.
- Change `spark.memory.offHeapSize` to `spark.memory.offHeap.size`. This is fine because this configuration has not shipped in any Spark release yet (it's new in Spark 1.6).
- Deprecated `spark.unsafe.offHeap` in favor of a new `spark.memory.offHeap.enabled` configuration. The motivation behind this change is to gather all memory-related configurations under the same prefix.
- Add a check which prevents users from setting `spark.memory.offHeap.enabled=true` when `spark.memory.offHeap.size == 0`. After SPARK-11389 (#9344), which was committed in Spark 1.6, Spark enforces a hard limit on the amount of off-heap memory that it will allocate to tasks. As a result, enabling off-heap execution memory without setting `spark.memory.offHeap.size` will lead to immediate OOMs. The new configuration validation makes this scenario easier to diagnose, helping to avoid user confusion (see the sketch after this list).
- Document these configurations on the configuration page.
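A sketch of the renamed configuration in use (illustrative values; the validation described in the third bullet would reject the commented-out variant):
```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "1g")        // must be > 0 when enabled

// Rejected by the new validation: enabling off-heap memory with size == 0
// would otherwise lead to immediate OOMs.
// new SparkConf().set("spark.memory.offHeap.enabled", "true")
//                .set("spark.memory.offHeap.size", "0")
```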
Author: Josh Rosen <joshrosen@databricks.com>
Closes #10237 from JoshRosen/SPARK-12251.
|