path: root/core/src
Commit message | Author | Age | Files | Lines
* [SPARK-12340][SQL] Fix Int overflow in SparkPlan.executeTake, RDD.take, and AsyncRDDActions.takeAsync
  QiangCai | 2016-01-06 | 2 files | -10/+10
  I have closed pull request https://github.com/apache/spark/pull/10487 and created this one to resolve the problem. Spark JIRA: https://issues.apache.org/jira/browse/SPARK-12340
  Author: QiangCai <david.caiq@gmail.com>
  Closes #10562 from QiangCai/bugfix.
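  The overflow comes from tracking the scanned-partition count as an Int while the retry heuristic multiplies it each round. A minimal sketch of the pattern, with illustrative names rather than the exact Spark source:

  ```scala
  object TakeGrowthSketch {
    // Estimate how many partitions take() should scan on the next round.
    // Keeping the count as a Long means partsScanned * 4 cannot wrap negative.
    def nextPartsToTry(partsScanned: Long, resultsSoFar: Int, num: Int): Long =
      if (resultsSoFar == 0) {
        partsScanned * 4 // grow aggressively while no rows have been found
      } else {
        // Interpolate how many more partitions are likely needed.
        val estimated = (1.5 * num * partsScanned / resultsSoFar).toLong - partsScanned
        math.min(math.max(estimated, 1L), partsScanned * 4)
      }

    def main(args: Array[String]): Unit = {
      val scanned = 1 << 30                          // 2^30 partitions scanned so far
      println(scanned * 4)                           // Int arithmetic wraps: prints 0
      println(nextPartsToTry(scanned.toLong, 0, 10)) // Long arithmetic: 4294967296
    }
  }
  ```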
* [SPARK-3873][TESTS] Import ordering fixes.
  Marcelo Vanzin | 2016-01-05 | 82 files | -196/+176
  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #10582 from vanzin/SPARK-3873-tests.
* [SPARK-3873][CORE] Import ordering fixes.
  Marcelo Vanzin | 2016-01-05 | 158 files | -250/+246
  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #10578 from vanzin/SPARK-3873-core.
* [SPARK-12659] Fix NPE in UnsafeExternalSorter (used by cartesian product)
  Davies Liu | 2016-01-05 | 3 files | -11/+44
  Cartesian product uses UnsafeExternalSorter without a comparator to do spilling, so it throws an NPE if spilling happens. This bug is also hit by #10605. cc JoshRosen
  Author: Davies Liu <davies@databricks.com>
  Closes #10606 from davies/fix_spilling.
* [SPARK-12615] Remove some deprecated APIs in RDD/SparkContext
  Reynold Xin | 2016-01-05 | 17 files | -624/+3
  I looked at each case individually and it looks like they can all be removed. The only one I had to think twice about was toArray (I even thought about un-deprecating it, until I realized it was a problem in Java to have toArray returning java.util.List).
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10569 from rxin/SPARK-12615.
* [SPARK-12641] Remove unused code related to Hadoop 0.23
  Kousuke Saruta | 2016-01-05 | 1 file | -10/+3
  Currently we don't support Hadoop 0.23, but there is still some code related to it, so let's clean it up.
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #10590 from sarutak/SPARK-12641.
* [SPARK-12486] Worker should kill the executors more forcefully if possible.
  Nong Li | 2016-01-04 | 3 files | -12/+112
  This patch updates the ExecutorRunner's terminate path to use the new Java 8 API to terminate processes more forcefully if possible. Previously, if the executor was unhealthy, it would ignore the destroy() call. Presumably, the new Java API was added to handle cases like this. We could update the termination path in the future to use OS-specific commands for older Java versions.
  Author: Nong Li <nong@databricks.com>
  Closes #10438 from nongli/spark-12486-executors.
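  A minimal sketch of the escalation described above, using the Java 8 Process APIs (the grace period and names are illustrative, not the actual ExecutorRunner code):

  ```scala
  import java.util.concurrent.TimeUnit

  object ForcefulKill {
    // Ask the process to exit, then escalate with destroyForcibly() (Java 8+)
    // if it is still alive after the grace period.
    def kill(process: Process, gracePeriodMs: Long = 10000L): Unit = {
      process.destroy()
      if (!process.waitFor(gracePeriodMs, TimeUnit.MILLISECONDS)) {
        process.destroyForcibly()
        process.waitFor() // block until the OS reaps the process
      }
    }
  }
  ```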
* [SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs and reflection that supported 1.x
  Sean Owen | 2016-01-02 | 24 files | -260/+78
  Remove use of deprecated Hadoop APIs now that 2.2+ is required.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #10446 from srowen/SPARK-12481.
* [SPARK-7995][SPARK-6280][CORE] Remove AkkaRpcEnv and remove systemName from setupEndpointRef
  Shixiong Zhu | 2015-12-31 | 27 files | -1113/+74

  ### Remove AkkaRpcEnv
  Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring Streaming's actorStream API.

  ### Remove systemName
  There are two places using `systemName`:
  * `RpcEnvConfig.name`. Although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name in the log output `Successfully started service *** on port ***`. Since the service name in the log is useful, I kept `RpcEnvConfig.name`.
  * `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`; Akka requires it in its URIs and will refuse a connection if the `systemName` does not match. However, `NettyRpcEnv` doesn't use it, so we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`.

  ### Remove RpcEnv.uriOf
  `uriOf` exists because Akka uses different URI formats with and without authentication, e.g., `akka.ssl.tcp://...` and `akka.tcp://...`, but `NettyRpcEnv` uses the same format in both cases. So it is not necessary after removing `AkkaRpcEnv`.

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10459 from zsxwing/remove-akka-rpc-env.
* [SPARK-12561] Remove JobLogger in Spark 2.0.
  Reynold Xin | 2015-12-30 | 1 file | -277/+0
  It was research code and has been deprecated since 1.0.0. No one really uses it since they can just use event logging.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10530 from rxin/SPARK-12561.
* [SPARK-12588] Remove HttpBroadcast in Spark 2.0.
  Reynold Xin | 2015-12-30 | 8 files | -457/+13
  We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10531 from rxin/SPARK-12588.
* [SPARK-12399] Display correct error message when accessing REST API with an unknown app Id
  Carson Wang | 2015-12-30 | 1 file | -2/+14
  I got an exception when accessing the REST API below with an unknown application Id: `http://<server-url>:18080/api/v1/applications/xxx/jobs`. Instead of an exception, I expect an error message "no such app: xxx", similar to the error message shown when I access `/api/v1/applications/xxx`.

  ```
  org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx
      at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
      at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
      at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
      at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
      at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
      at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
      at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
      at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
  ```

  Author: Carson Wang <carson.wang@intel.com>
  Closes #10352 from carsonwang/unknownAppFix.
* [SPARK-12263][DOCS] IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit
  Neelesh Srinivas Salian | 2015-12-30 | 1 file | -1/+1
  Updated the worker's IllegalStateException message to say that values below 1MB are not allowed, rather than just 0, to help resolve this. Requesting review.
  Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
  Closes #10483 from nssalian/SPARK-12263.
* [SPARK-12490][CORE] Limit the CSS style scope to fix the Streaming UI
  Shixiong Zhu | 2015-12-29 | 3 files | -3/+5
  #10441 broke the Streaming UI because of the new CSS style. (Screenshot: https://cloud.githubusercontent.com/assets/1000778/12044763/1efce0fe-ae4c-11e5-9f8b-39df08426bf8.png) This PR just adds a class for the new style and applies it only to the paged tables.
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10517 from zsxwing/fix-streaming-ui.
* [SPARK-12490] Don't use Javascript for web UI's paginated table controls
  Josh Rosen | 2015-12-28 | 5 files | -97/+178
  The web UI's paginated table uses Javascript to implement certain navigation controls, such as table sorting and the "go to page" form. This is unnecessary and should be simplified to use plain HTML form controls and links. /cc zsxwing, who wrote the original code, and yhuai.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #10441 from JoshRosen/simplify-paginated-table-sorting.
* [SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs
  Shixiong Zhu | 2015-12-28 | 1 file | -2/+1
  Includes the following changes:
  1. Close `java.sql.Statement`
  2. Fix incorrect `asInstanceOf`
  3. Remove unnecessary `synchronized` and `ReentrantLock`
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10440 from zsxwing/findbugs.
* [SPARK-12222][CORE] Deserializing a RoaringBitmap with the Kryo serializer throws a Buffer underflow exception
  Daoyuan Wang | 2015-12-29 | 1 file | -6/+1
  Since we only need to implement `def skipBytes(n: Int)`, the code in #10213 can be simplified. davies scwf
  Author: Daoyuan Wang <daoyuan.wang@intel.com>
  Closes #10253 from adrian-wang/kryo.
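  A sketch of the simplification described above, assuming a DataInput-style bridge over a Kryo Input (the wrapper class and exact signatures in Spark may differ): skipBytes can just delegate to Input.skip, which advances across Kryo's internal buffer instead of underflowing it.

  ```scala
  import com.esotericsoftware.kryo.io.Input

  object KryoSkipSketch {
    // Implement DataInput's skipBytes contract on top of Kryo's Input.skip.
    def skipBytes(input: Input, n: Int): Int = {
      var remaining: Long = n
      while (remaining > 0) {
        val toSkip = math.min(Int.MaxValue.toLong, remaining)
        input.skip(toSkip) // advances the stream without materializing the bytes
        remaining -= toSkip
      }
      n
    }
  }
  ```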
* [SPARK-12517] Add default RDD name for one created via sc.textFile
  Yaron Weinsberg | 2015-12-29 | 2 files | -2/+27
  The feature was first added in commit 7b877b27053bfb7092e250e01a3b887e1b50a109 but was later removed (probably by mistake) in commit fc8b58195afa67fbb75b4c8303e022f703cbf007. This change sets the default name of RDDs created via sc.textFile(...) to the path argument. Here is the symptom:

  Using spark-1.5.2-bin-hadoop2.6:
  ```
  scala> sc.textFile("/home/root/.bashrc").name
  res5: String = null
  scala> sc.binaryFiles("/home/root/.bashrc").name
  res6: String = /home/root/.bashrc
  ```

  Using Spark 1.3.1:
  ```
  scala> sc.textFile("/home/root/.bashrc").name
  res0: String = /home/root/.bashrc
  scala> sc.binaryFiles("/home/root/.bashrc").name
  res1: String = /home/root/.bashrc
  ```

  Author: Yaron Weinsberg <wyaron@gmail.com>
  Author: yaron <yaron@il.ibm.com>
  Closes #10456 from wyaron/master.
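  The shape of the change is a trailing setName on the RDD that textFile builds. A simplified sketch (the real SparkContext method also takes a minPartitions argument, and this standalone helper form is illustrative):

  ```scala
  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapred.TextInputFormat
  import org.apache.spark.SparkContext
  import org.apache.spark.rdd.RDD

  object TextFileNameSketch {
    def textFile(sc: SparkContext, path: String): RDD[String] =
      sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
        .map(pair => pair._2.toString)
        .setName(path) // the fix: default the RDD name to its source path
  }
  ```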
* [SPARK-12396][CORE] Modify the function scheduleAtFixedRate to schedule.
  echo2mei | 2015-12-25 | 1 file | -2/+2
  Instead of just cancelling the registrationRetryTimer to keep the driver from retrying the connection to the master, change the function to schedule. There is no need to register with the master repeatedly.
  Author: echo2mei <534384876@qq.com>
  Closes #10447 from echoTomei/master.
* [SPARK-12440][CORE] Avoid setCheckpoint warning when directory is not local
  pierre-borckmans | 2015-12-24 | 1 file | -2/+3
  In the SparkContext method `setCheckpointDir`, a warning is issued when the Spark master is not local and the passed directory for the checkpoint dir appears to be local. In practice, when relying on the HDFS configuration file and using a relative path for the checkpoint directory (i.e., an incomplete URI without the HDFS scheme), this warning should not be issued and can be confusing. In fact, in this case, the checkpoint directory is successfully created and the checkpointing mechanism works as expected.
  This PR uses the `FileSystem` instance created with the given directory and checks whether it is local or not. (The rationale is that this same `FileSystem` instance is used to create the checkpoint dir anyway, so it can be reliably used to determine locality.) The warning is only issued if the directory is not local, on top of the existing conditions.
  Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
  Closes #10392 from pierre-borckmans/SPARK-12440_CheckpointDir_Warning_NonLocal.
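  A minimal sketch of that check using the Hadoop FileSystem API (names and the exact warning condition are illustrative, not the actual patch):

  ```scala
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{LocalFileSystem, Path}

  object CheckpointDirCheck {
    // Resolve the directory against the FileSystem that will actually host it
    // (honoring fs.defaultFS from the Hadoop config), and warn only when that
    // FileSystem is genuinely local while the master is not.
    def shouldWarn(directory: String, hadoopConf: Configuration, masterIsLocal: Boolean): Boolean = {
      val fs = new Path(directory).getFileSystem(hadoopConf)
      !masterIsLocal && fs.isInstanceOf[LocalFileSystem]
    }
  }
  ```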
* [SPARK-12311][CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property
  Kazuaki Ishizaki | 2015-12-24 | 30 files | -81/+183
  Restore the original value of the os.arch property after each test. Since some tests force a specific value for the os.arch property, we need to restore the original value afterwards.
  Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
  Closes #10289 from kiszk/SPARK-12311.
* [SPARK-12500][CORE] Fix Tachyon deprecations; pull Tachyon dependency into one class
  Sean Owen | 2015-12-23 | 3 files | -84/+104
  Fix Tachyon deprecations; pull the Tachyon dependency into `TachyonBlockManager` only. CC calvinjia, as I probably need a double-check that the usage of the new API is correct.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #10449 from srowen/SPARK-12500.
* [SPARK-12471][CORE] Spark daemons will log their pid on start up.
  Nong Li | 2015-12-22 | 8 files | -20/+34
  Author: Nong Li <nong@databricks.com>
  Closes #10422 from nongli/12471-pids.
* Minor corrections, i.e. typo fixes and follow deprecated
  Jacek Laskowski | 2015-12-22 | 5 files | -6/+6
  Author: Jacek Laskowski <jacek@japila.pl>
  Closes #10432 from jaceklaskowski/minor-corrections.
* [SPARK-11807] Remove support for Hadoop < 2.2
  Reynold Xin | 2015-12-21 | 2 files | -24/+3
  I.e. Hadoop 1 and Hadoop 2.0.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10404 from rxin/SPARK-11807.
* [SPARK-12388] Change default compression to lz4
  Davies Liu | 2015-12-21 | 3 files | -11/+272
  According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy. After changing the compressor to LZ4, I saw a 20% improvement in end-to-end time for a TPC-DS query (Q4). cc rxin
  [1] https://github.com/ning/jvm-compressor-benchmark/wiki
  Author: Davies Liu <davies@databricks.com>
  Closes #10342 from davies/lz4.
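  The switch is controlled by the existing spark.io.compression.codec setting; a minimal sketch of pinning it explicitly, which after this change is only needed to opt back into the old default:

  ```scala
  import org.apache.spark.SparkConf

  object CodecConfig {
    val conf = new SparkConf()
      .set("spark.io.compression.codec", "lz4")      // the new default
      // .set("spark.io.compression.codec", "snappy") // restores the previous default
  }
  ```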
* [SPARK-12466] Fix harmless NPE in tests
  Andrew Or | 2015-12-21 | 1 file | -1/+5

  ```
  [info] ReplayListenerSuite:
  [info] - Simple replay (58 milliseconds)
  java.lang.NullPointerException
      at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
      at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
  ```

  https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
  This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (and doesn't actually fail the tests). Tested locally to verify that the NPE is gone.
  Author: Andrew Or <andrew@databricks.com>
  Closes #10417 from andrewor14/fix-harmless-npe.
* [SPARK-2331] SparkContext.emptyRDD should return RDD[T] not EmptyRDD[T]
  Reynold Xin | 2015-12-21 | 1 file | -1/+1
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10394 from rxin/SPARK-2331.
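  The change is to the declared return type only. A sketch of the method as it lives in SparkContext (EmptyRDD is package-private, so this only compiles inside Spark itself):

  ```scala
  import scala.reflect.ClassTag
  import org.apache.spark.rdd.{EmptyRDD, RDD}

  // Declaring the result as RDD[T] keeps EmptyRDD an internal detail and
  // lets the implementation change without breaking the public API.
  def emptyRDD[T: ClassTag]: RDD[T] = new EmptyRDD[T](this)
  ```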
* [SPARK-12392][CORE] Optimize the location order of broadcast blocks by considering preferred local hosts
  Takeshi YAMAMURO | 2015-12-21 | 2 files | -2/+29
  When multiple workers exist on a host, we can bypass unnecessary remote access for broadcasts: block managers fetch broadcast blocks from the same host instead of remote hosts.
  Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
  Closes #10346 from maropu/OptimizeBlockLocationOrder.
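  A self-contained sketch of the ordering idea (names are illustrative, not the BlockManager internals):

  ```scala
  object BroadcastLocality {
    final case class BlockLocation(host: String, executorId: String)

    // Try same-host block managers before remote ones; shuffling within each
    // group spreads the fetch load across candidates.
    def sortLocations(locations: Seq[BlockLocation], localHost: String): Seq[BlockLocation] = {
      val (sameHost, remote) = locations.partition(_.host == localHost)
      scala.util.Random.shuffle(sameHost) ++ scala.util.Random.shuffle(remote)
    }
  }
  ```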
* [SPARK-12374][SPARK-12150][SQL] Adding logical/physical operators for Range
  gatorsmile | 2015-12-21 | 1 file | -1/+1
  Based on the suggestions from marmbrus, added logical/physical operators for Range to improve performance. Also added another API for resolving SPARK-12150. Could you take a look at my implementation, marmbrus? If it's not good, I can rework it. :) Thank you very much!
  Author: gatorsmile <gatorsmile@gmail.com>
  Closes #10335 from gatorsmile/rangeOperators.
* [SPARK-11808] Remove Bagel.
  Reynold Xin | 2015-12-19 | 2 files | -2/+2
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10395 from rxin/SPARK-11808.
* Bump master version to 2.0.0-SNAPSHOT.
  Reynold Xin | 2015-12-19 | 1 file | -1/+1
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10387 from rxin/version-bump.
* Revert "[SPARK-12345][MESOS] Filter SPARK_HOME when submitting Spark jobs ↵Andrew Or2015-12-182-7/+2
| | | | | | with Mesos cluster mode." This reverts commit ad8c1f0b840284d05da737fb2cc5ebf8848f4490.
* Revert "[SPARK-12345][MESOS] Properly filter out SPARK_HOME in the Mesos ↵Andrew Or2015-12-181-1/+1
| | | | | | REST server" This reverts commit 8184568810e8a2e7d5371db2c6a0366ef4841f70.
* Revert "[SPARK-12413] Fix Mesos ZK persistence"Andrew Or2015-12-181-5/+1
| | | | This reverts commit 2bebaa39d9da33bc93ef682959cd42c1968a6a3e.
* [SPARK-12345][CORE] Do not send SPARK_HOME through Spark submit REST interface
  Luc Bourlier | 2015-12-18 | 1 file | -2/+4
  It is usually an invalid location on the remote machine executing the job. It is picked up by the Mesos support in cluster mode and, most of the time, causes the job to fail. Fixes SPARK-12345.
  Author: Luc Bourlier <luc.bourlier@typesafe.com>
  Closes #10329 from skyluc/issue/SPARK_HOME.
* [SPARK-11097][CORE] Add channelActive callback to RpcHandler to monitor the new connections
  Shixiong Zhu | 2015-12-18 | 4 files | -77/+96
  Added `channelActive` to `RpcHandler` so that `NettyRpcHandler` doesn't need `clients` any more.
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10301 from zsxwing/network-events.
* [SPARK-12411][CORE] Decrease executor heartbeat timeout to match heartbeat interval
  Nong Li | 2015-12-18 | 1 file | -1/+3
  Previously, the RPC timeout was the default network timeout, which is the same value the driver uses to determine dead executors. This means that if there is a network issue, the executor is declared dead after one heartbeat attempt. There is a separate config for the heartbeat interval, which is a better value to use for the heartbeat RPC. With this change, the executor will make multiple heartbeat attempts even with RPC issues.
  Author: Nong Li <nong@databricks.com>
  Closes #10365 from nongli/spark-12411.
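  The relationship between the two settings, as a minimal configuration sketch (both keys are real Spark settings; the values are illustrative):

  ```scala
  import org.apache.spark.SparkConf

  object HeartbeatConfig {
    val conf = new SparkConf()
      .set("spark.executor.heartbeatInterval", "10s") // per-attempt RPC timeout after this change
      .set("spark.network.timeout", "120s")           // when the driver declares an executor dead
    // With these values an executor gets roughly 12 heartbeat attempts before
    // being declared dead, instead of a single attempt under the old behavior.
  }
  ```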
* [SPARK-9552] Return "false" when there is nothing to kill in killExecutors
  Grace | 2015-12-18 | 3 files | -17/+24
  In the discussion (SPARK-9552), we proposed a force kill in `killExecutors`. But if there is nothing to kill, it returns true (acknowledgement), which then causes executors that are not eligible to be killed to be added to the pendingToRemove list for further actions. In this patch, we'd like to change the return semantics: if there is nothing to kill, we return "false", so none of those non-eligible executors are added to the pendingToRemove list. vanzin andrewor14 As the follow-up of PR #7888, please let me know your comments.
  Author: Grace <jie.huang@intel.com>
  Author: Jie Huang <hjie@fosun.com>
  Author: Andrew Or <andrew@databricks.com>
  Closes #9796 from GraceH/emptyPendingToRemove.
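  A simplified sketch of the new return semantics (illustrative names, not the actual scheduler backend code):

  ```scala
  import scala.collection.mutable

  object KillExecutorsSketch {
    // Return false when no requested executor is actually eligible, so
    // ineligible ids are never queued in pendingToRemove.
    def killExecutors(requested: Seq[String], known: Set[String],
                      pendingToRemove: mutable.Set[String]): Boolean = {
      val eligible = requested.filter(known.contains)
      if (eligible.isEmpty) {
        false // nothing to kill: do not acknowledge
      } else {
        pendingToRemove ++= eligible // only eligible executors are queued
        true
      }
    }
  }
  ```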
* [SPARK-12350][CORE] Don't log errors when requested stream is not found.
  Marcelo Vanzin | 2015-12-18 | 2 files | -11/+13
  If a client requests a non-existent stream, just send a failure message back without logging any error on the server side (since it's not a server error). On the executor side, avoid error logs by translating any errors during transfer to a `ClassNotFoundException`, so that loading the class is retried on the parent class loader. This can mask IO errors during transmission, but the most common cause is that the class is not served by the remote end.
  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #10337 from vanzin/SPARK-12350.
* [SPARK-12413] Fix Mesos ZK persistence
  Michael Gummelt | 2015-12-18 | 1 file | -1/+5
  I believe this fixes SPARK-12413. I'm currently running an integration test to verify.
  Author: Michael Gummelt <mgummelt@mesosphere.io>
  Closes #10366 from mgummelt/fix-zk-mesos.
* [CORE][TESTS] Minor fix of JavaSerializerSuite
  Jeff Zhang | 2015-12-18 | 1 file | -2/+7
  No JIRA was created. The original test passed because the class cast is lazy (it only happens when the object's method is invoked).
  Author: Jeff Zhang <zjffdu@apache.org>
  Closes #10371 from zjffdu/minor_fix.
* [SPARK-12345][MESOS] Properly filter out SPARK_HOME in the Mesos REST server
  Iulian Dragos | 2015-12-18 | 1 file | -1/+1
  Fixes a problem with #10332; this one should fix cluster mode on Mesos.
  Author: Iulian Dragos <jaguarul@gmail.com>
  Closes #10359 from dragos/issue/fix-spark-12345-one-more-time.
* [SPARK-12220][CORE] Make Utils.fetchFile support files that contain special characters
  Shixiong Zhu | 2015-12-17 | 5 files | -6/+46
  This PR encodes and decodes the file name to fix the issue.
  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10208 from zsxwing/uri.
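  A sketch of the encode/decode round trip using the standard library (the actual helper names in Spark's Utils may differ):

  ```scala
  import java.net.{URLDecoder, URLEncoder}

  object FileNameUri {
    // Percent-encode the file name when building the URI; URLEncoder emits
    // '+' for spaces (form encoding), so swap in the URI-safe %20.
    def encode(name: String): String =
      URLEncoder.encode(name, "UTF-8").replace("+", "%20")

    // URLDecoder maps both %20 and '+' back to a space, so the round trip holds.
    def decode(encoded: String): String =
      URLDecoder.decode(encoded, "UTF-8")
  }
  ```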
* Revert "Once driver register successfully, stop it to connect to master."Davies Liu2015-12-171-1/+0
| | | | This reverts commit 5a514b61bbfb609c505d8d65f2483068a56f1f70.
* Once the driver registers successfully, stop it from connecting to the master.
  echo2mei | 2015-12-17 | 1 file | -0/+1
  This commit resolves SPARK-12396.
  Author: echo2mei <534384876@qq.com>
  Closes #10354 from echoTomei/master.
* [SPARK-12390] Clean up unused serializer parameter in BlockManager
  Andrew Or | 2015-12-16 | 2 files | -28/+11
  No change in functionality is intended. This only changes internal API.
  Author: Andrew Or <andrew@databricks.com>
  Closes #10343 from andrewor14/clean-bm-serializer.
* [SPARK-12386][CORE] Fix NPE when spark.executor.port is set.
  Marcelo Vanzin | 2015-12-16 | 1 file | -1/+6
  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #10339 from vanzin/SPARK-12386.
* [SPARK-12186][WEB UI] Send the complete request URI including the query string when redirecting.
  Rohit Agarwal | 2015-12-16 | 1 file | -1/+3
  Author: Rohit Agarwal <rohita@qubole.com>
  Closes #10180 from mindprince/SPARK-12186.
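  A sketch of the redirect-target construction, assuming the servlet API used by Spark's web UI (getRequestURI drops the query string, so it has to be appended back when present):

  ```scala
  import javax.servlet.http.HttpServletRequest

  object RedirectTarget {
    def fullUri(request: HttpServletRequest): String =
      request.getRequestURI + Option(request.getQueryString).map("?" + _).getOrElse("")
  }
  ```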
* [SPARK-12365][CORE] Use ShutdownHookManager where Runtime.getRuntime.addShutdownHook() is called
  tedyu | 2015-12-16 | 3 files | -20/+15
  SPARK-9886 fixed ExternalBlockStore.scala. This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook().
  Author: tedyu <yuzhihong@gmail.com>
  Closes #10325 from ted-yu/master.
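  A before/after sketch (ShutdownHookManager is Spark-internal, so this only compiles inside Spark itself; the signature is summarized from this change and may differ):

  ```scala
  // Before: a raw JVM hook, with no ordering relative to Spark's own shutdown work.
  //   Runtime.getRuntime.addShutdownHook(new Thread { override def run(): Unit = cleanup() })
  //
  // After: register through Spark's manager, which runs hooks in priority order.
  import org.apache.spark.util.ShutdownHookManager

  object HookSketch {
    def install(cleanup: () => Unit): AnyRef =
      ShutdownHookManager.addShutdownHook(() => cleanup())
  }
  ```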