path: root/core
Commit message | Author | Age | Files | Lines
...
* [SPARK-12701][CORE] FileAppender should use join to ensure writing thread completion (Bryan Cutler, 2016-01-08; 1 file changed, -10/+1)
  Changed Logging FileAppender to use join in `awaitTermination` to ensure that the thread has properly finished before returning. Author: Bryan Cutler <cutlerb@gmail.com> Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701.
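  The pattern behind the fix is plain JVM threading: `Thread.join()` blocks until the target thread's `run()` has returned. A minimal sketch of the idea, with simplified names (not the actual Spark source):

  ```scala
  // Sketch: join-based shutdown for a single writer thread.
  class SimpleFileAppender(lines: Iterator[String]) {
    @volatile private var markedForStop = false

    private val writingThread = new Thread("File appender") {
      override def run(): Unit = {
        while (!markedForStop && lines.hasNext) {
          val line = lines.next()
          // append `line` to the underlying file here (elided in this sketch)
        }
        // flush and close the output stream here
      }
    }
    writingThread.start()

    def stop(): Unit = { markedForStop = true }

    // join() guarantees run() has fully completed (including the flush/close)
    // before awaitTermination returns, the property a wait/notify scheme loses.
    def awaitTermination(): Unit = writingThread.join()
  }
  ```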
* [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 edition (Sean Owen, 2016-01-08; 1 file changed, -0/+1)
  Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs. Author: Sean Owen <sowen@cloudera.com> Closes #10570 from srowen/SPARK-12618.
* [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (Shixiong Zhu, 2016-01-07; 2 files changed, -7/+37)
  The default serializer in Kryo is FieldSerializer and it ignores transient fields and never calls `writeObject` or `readObject`. So we should register OpenHashMapBasedStateMap using `DefaultSerializer` to make it work with Kryo. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10609 from zsxwing/SPARK-12591.
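  To illustrate the Kryo behavior being worked around, here is a hedged sketch with a hypothetical class (not the actual OpenHashMapBasedStateMap registration): registering a class with Kryo's `JavaSerializer` makes Kryo delegate to Java serialization, so `writeObject`/`readObject` run and transient fields get the treatment the class expects.

  ```scala
  import com.esotericsoftware.kryo.Kryo
  import com.esotericsoftware.kryo.serializers.JavaSerializer

  // Hypothetical stand-in for a state map with custom Java serialization hooks.
  class StateMapLike extends Serializable {
    @transient var rebuildableCache: Map[String, Int] = Map.empty

    private def writeObject(out: java.io.ObjectOutputStream): Unit =
      out.defaultWriteObject() // custom write logic would go here

    private def readObject(in: java.io.ObjectInputStream): Unit = {
      in.defaultReadObject()
      rebuildableCache = Map.empty // rebuild transient state on deserialization
    }
  }

  val kryo = new Kryo()
  // With the default FieldSerializer, Kryo would skip the transient field and
  // never invoke the hooks above; JavaSerializer delegates to them instead.
  kryo.register(classOf[StateMapLike], new JavaSerializer())
  ```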
* [SPARK-12604][CORE] Addendum - use casting vs mapValues for countBy{Key,Value} (Sean Owen, 2016-01-07; 2 files changed, -2/+2)
  Per rxin, let's use the casting for countByKey and countByValue as well. Let's see if this passes. Author: Sean Owen <sowen@cloudera.com> Closes #10641 from srowen/SPARK-12604.2.
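  The casting trick works because `scala.Long` values stored in a Map are boxed as `java.lang.Long` at runtime, so an unchecked cast is safe in practice and avoids building a new wrapper the way `mapValues` would (whose lazy view also had serialization problems in older Scala versions, see SI-7005). A hedged sketch of the idea:

  ```scala
  import java.{lang => jl}

  val scalaCounts: Map[String, Long] = Map("a" -> 1L, "b" -> 2L)

  // Erasure means the values are already java.lang.Long at runtime, so the
  // cast is a zero-cost view rather than a copy of the map.
  val javaCounts = scalaCounts.asInstanceOf[Map[String, jl.Long]]
  ```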
* [SPARK-12598][CORE] bug in setMinPartitions (Darek Blasiak, 2016-01-07; 1 file changed, -3/+2)
  There is a bug in the calculation of `maxSplitSize`. The `totalLen` should be divided by `minPartitions` and not by `files.size`. Author: Darek Blasiak <darek.blasiak@640labs.com> Closes #10546 from datafarmer/setminpartitionsbug.
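  In other words, the goal split size should target the requested partition count, not the number of input files. A sketch of the corrected computation (names approximate, not the exact Spark source):

  ```scala
  // Hedged sketch of the corrected split sizing: divide the combined length of
  // all files by the requested minimum partition count, not by the file count.
  def maxSplitSize(fileLengths: Seq[Long], minPartitions: Int): Long = {
    val totalLen = fileLengths.sum
    math.ceil(totalLen * 1.0 / math.max(minPartitions, 1)).toLong
  }
  ```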
* [STREAMING][MINOR] More contextual information in logs + minor code improvements (Jacek Laskowski, 2016-01-07; 3 files changed, -4/+4)
  Please review and merge at your convenience. Thanks! Author: Jacek Laskowski <jacek@japila.pl> Closes #10595 from jaceklaskowski/streaming-minor-fixes.
* [SPARK-12295][SQL] external spilling for window functions (Davies Liu, 2016-01-06; 5 files changed, -8/+48)
  This PR manages the memory used by window functions (buffered rows) and also enables external spilling. After this PR, we can run window functions on a partition with hundreds of millions of rows with only 1GB of memory. Author: Davies Liu <davies@databricks.com> Closes #10605 from davies/unsafe_window.
* [SPARK-12678][CORE] MapPartitionsRDD clearDependencies (Guillaume Poulin, 2016-01-06; 1 file changed, -1/+6)
  MapPartitionsRDD was keeping a reference to `prev` after a call to `clearDependencies`, which could lead to a memory leak. Author: Guillaume Poulin <poulin.guillaume@gmail.com> Closes #10623 from gpoulin/map_partition_deps.
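  The fix pattern is to hold the parent RDD in a var and drop it when dependencies are cleared (after checkpointing), so the lineage can be garbage-collected. A hedged sketch of the shape of such a class (simplified; the real MapPartitionsRDD differs in details):

  ```scala
  import scala.reflect.ClassTag
  import org.apache.spark.{Partition, TaskContext}
  import org.apache.spark.rdd.RDD

  // Sketch only: a one-parent RDD that releases its parent reference when
  // clearDependencies() runs, instead of pinning the whole lineage in memory.
  class MapPartitionsLikeRDD[U: ClassTag, T: ClassTag](
      var prev: RDD[T],
      f: Iterator[T] => Iterator[U])
    extends RDD[U](prev) {

    override protected def getPartitions: Array[Partition] = firstParent[T].partitions

    override def compute(split: Partition, context: TaskContext): Iterator[U] =
      f(firstParent[T].iterator(split, context))

    override protected def clearDependencies(): Unit = {
      super.clearDependencies()
      prev = null // the line the fix adds: let the parent RDD be collected
    }
  }
  ```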
* [SPARK-12673][UI] Add missing URI prepending for job description (jerryshao, 2016-01-06; 1 file changed, -3/+3)
  Otherwise the URL will fail to proxy to the right one in YARN mode. Here is the screenshot: ![screen shot 2016-01-06 at 5 28 26 pm](https://cloud.githubusercontent.com/assets/850797/12139632/bbe78ecc-b49c-11e5-8932-94e8b3622a09.png) Author: jerryshao <sshao@hortonworks.com> Closes #10618 from jerryshao/SPARK-12673.
* [SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0 (Josh Rosen, 2016-01-06; 8 files changed, -554/+36)
  This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code. Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data which is still being used may be evicted / deleted, leading to hard-to-diagnose bugs. For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits of doing this include marginally reduced memory usage, since we no longer need to store timestamps in hashmaps, and a handful fewer threads. Author: Josh Rosen <joshrosen@databricks.com> Closes #10534 from JoshRosen/remove-ttl-based-cleaning.
* [SPARK-12640][SQL] Add simple benchmarking utility class and add Parquet scan benchmarks (Nong Li, 2016-01-06; 1 file changed, -0/+120)
  We've run benchmarks ad hoc to measure the scanner performance. We will continue to invest in this and it makes sense to get these benchmarks into code. This adds a simple benchmarking utility to do this. Author: Nong Li <nong@databricks.com> Author: Nong <nongli@gmail.com> Closes #10589 from nongli/spark-12640.
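  For flavor, a minimal measure-and-report helper in the same spirit (a sketch, not the utility class the PR adds):

  ```scala
  // Sketch: run a body `iters` times and print the average wall-clock time.
  object MiniBenchmark {
    def run(name: String, iters: Int)(body: => Unit): Unit = {
      val start = System.nanoTime()
      var i = 0
      while (i < iters) { body; i += 1 }
      val msPerIter = (System.nanoTime() - start) / 1e6 / iters
      println(f"$name%-30s $msPerIter%10.3f ms/iter")
    }
  }

  // Usage: MiniBenchmark.run("sum 1M ints", 10) { (1 to 1000000).sum }
  ```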
* [SPARK-12604][CORE] Java count(ApproxDistinct)ByKey methods return Scala Long not Java (Sean Owen, 2016-01-06; 3 files changed, -30/+36)
  Change Java countByKey, countApproxDistinctByKey return types to use Java Long, not Scala; update similar methods for consistency on java.lang.Long.valueOf with no API change. Author: Sean Owen <sowen@cloudera.com> Closes #10554 from srowen/SPARK-12604.
* [SPARK-12665][CORE][GRAPHX] Remove Vector, VectorSuite and GraphKryoRegistrator which are deprecated and no longer used (Kousuke Saruta, 2016-01-06; 2 files changed, -204/+0)
  The whole code of Vector.scala, VectorSuite.scala and GraphKryoRegistrator.scala is no longer used, so it's time to remove them in Spark 2.0. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10613 from sarutak/SPARK-12665.
* [SPARK-12340][SQL] fix Int overflow in SparkPlan.executeTake, RDD.take and AsyncRDDActions.takeAsync (QiangCai, 2016-01-06; 2 files changed, -10/+10)
  I have closed pull request https://github.com/apache/spark/pull/10487 and created this pull request to resolve the problem. Spark JIRA: https://issues.apache.org/jira/browse/SPARK-12340 Author: QiangCai <david.caiq@gmail.com> Closes #10562 from QiangCai/bugfix.
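  The take() implementations grow the number of partitions scanned geometrically, and with Int arithmetic the 4x growth can wrap around. A hedged, self-contained sketch of the overflow-safe loop shape (the real heuristic also adapts to how many rows the first pass returned):

  ```scala
  // Sketch: scan partitions in geometrically growing batches using Long
  // counters so the 4x growth cannot overflow Int.
  def takeUpTo[T](num: Int, totalParts: Int, scanPartition: Int => Seq[T]): Seq[T] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[T]
    var partsScanned = 0L
    while (buf.size < num && partsScanned < totalParts) {
      val numPartsToTry = math.max(1L, math.min(partsScanned * 4, totalParts - partsScanned))
      val upTo = (partsScanned + numPartsToTry).toInt // safe: bounded by totalParts
      (partsScanned.toInt until upTo).foreach(p => buf ++= scanPartition(p))
      partsScanned = upTo
    }
    buf.take(num).toList
  }
  ```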
* [SPARK-3873][TESTS] Import ordering fixes (Marcelo Vanzin, 2016-01-05; 82 files changed, -196/+176)
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10582 from vanzin/SPARK-3873-tests.
* [SPARK-3873][CORE] Import ordering fixes (Marcelo Vanzin, 2016-01-05; 158 files changed, -250/+246)
  Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10578 from vanzin/SPARK-3873-core.
* [SPARK-12659] fix NPE in UnsafeExternalSorter (used by cartesian product) (Davies Liu, 2016-01-05; 3 files changed, -11/+44)
  Cartesian product uses UnsafeExternalSorter without a comparator to do spilling, and it will NPE if spilling happens. This bug was also hit by #10605. cc JoshRosen Author: Davies Liu <davies@databricks.com> Closes #10606 from davies/fix_spilling.
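  The shape of such a fix is to tolerate the missing comparator: when the sorter is only used for spilling, records can be written out in insertion order. A self-contained Scala sketch of the idea (the real class is Java and far more involved):

  ```scala
  // Hedged sketch: spill records sorted only when a comparator was supplied;
  // a null comparator means "spill in insertion order" rather than an NPE.
  def recordsToSpill[T](records: Seq[T], comparator: Ordering[T]): Iterator[T] =
    if (comparator == null) records.iterator
    else records.sorted(comparator).iterator
  ```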
* [SPARK-12615] Remove some deprecated APIs in RDD/SparkContext (Reynold Xin, 2016-01-05; 17 files changed, -624/+3)
  I looked at each case individually and it looks like they can all be removed. The only one that I had to think twice about was toArray (I even thought about un-deprecating it, until I realized it was a problem in Java to have toArray returning java.util.List). Author: Reynold Xin <rxin@databricks.com> Closes #10569 from rxin/SPARK-12615.
* [SPARK-12641] Remove unused code related to Hadoop 0.23 (Kousuke Saruta, 2016-01-05; 1 file changed, -10/+3)
  Currently we don't support Hadoop 0.23, but there is still some code related to it, so let's clean it up. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10590 from sarutak/SPARK-12641.
* [SPARK-12486] Worker should kill the executors more forcefully if possible (Nong Li, 2016-01-04; 3 files changed, -12/+112)
  This patch updates the ExecutorRunner's terminate path to use the new Java 8 API to terminate processes more forcefully if possible. If the executor is unhealthy, it would previously ignore the destroy() call. Presumably, the new Java API was added to handle cases like this. We could update the termination path in the future to use OS-specific commands for older Java versions. Author: Nong Li <nong@databricks.com> Closes #10438 from nongli/spark-12486-executors.
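  The Java 8 API in question is `Process.destroyForcibly()` together with the timed `waitFor`. A hedged sketch of an escalating kill (the helper and its parameter names are mine, not the PR's code):

  ```scala
  import java.util.concurrent.TimeUnit

  // Sketch: ask the process to exit, then force-kill it if it ignores the
  // request within the grace period. destroyForcibly() is Java 8+.
  def terminate(process: Process, graceMillis: Long = 10000L): Int = {
    process.destroy() // polite request; an unhealthy process may ignore it
    if (!process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
      process.destroyForcibly() // cannot be ignored (SIGKILL on POSIX systems)
    }
    process.waitFor() // returns the exit code
  }
  ```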
* [SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs and reflection that supported 1.x (Sean Owen, 2016-01-02; 24 files changed, -260/+78)
  Remove use of deprecated Hadoop APIs now that 2.2+ is required. Author: Sean Owen <sowen@cloudera.com> Closes #10446 from srowen/SPARK-12481.
* [SPARK-7995][SPARK-6280][CORE] Remove AkkaRpcEnv and remove systemName from setupEndpointRef (Shixiong Zhu, 2015-12-31; 27 files changed, -1113/+74)
  Remove AkkaRpcEnv: keep `SparkEnv.actorSystem` because Streaming still uses it; it and AkkaUtils will be removed after refactoring the Streaming actorStream API.
  Remove systemName: there are two places using `systemName`. First, `RpcEnvConfig.name`: although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name in the log output "Successfully started service *** on port ***"; since the service name in the log is useful, `RpcEnvConfig.name` is kept. Second, `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`: each `ActorSystem` has a `systemName`, and Akka requires `systemName` in its URI and will refuse a connection if `systemName` does not match. However, `NettyRpcEnv` doesn't use it, so `systemName` can be removed from `setupEndpointRef` now that `AkkaRpcEnv` is being removed.
  Remove RpcEnv.uriOf: `uriOf` exists because Akka uses different URI formats with and without authentication, e.g., `akka.ssl.tcp...` and `akka.tcp://...`, but `NettyRpcEnv` uses the same format, so it's not necessary after removing `AkkaRpcEnv`.
  Author: Shixiong Zhu <shixiong@databricks.com> Closes #10459 from zsxwing/remove-akka-rpc-env.
* [SPARK-12561] Remove JobLogger in Spark 2.0 (Reynold Xin, 2015-12-30; 1 file changed, -277/+0)
  It was research code and has been deprecated since 1.0.0. No one really uses it since they can just use event logging. Author: Reynold Xin <rxin@databricks.com> Closes #10530 from rxin/SPARK-12561.
* [SPARK-12588] Remove HttpBroadcast in Spark 2.0 (Reynold Xin, 2015-12-30; 8 files changed, -457/+13)
  We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0. Author: Reynold Xin <rxin@databricks.com> Closes #10531 from rxin/SPARK-12588.
* [SPARK-12399] Display correct error message when accessing REST API with an unknown app Id (Carson Wang, 2015-12-30; 1 file changed, -2/+14)
  I got an exception when accessing the below REST API with an unknown application Id: `http://<server-url>:18080/api/v1/applications/xxx/jobs`. Instead of an exception, I expect an error message "no such app: xxx", similar to the error message I get when I access `/api/v1/applications/xxx`.
  ```
  org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx
      at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
      at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
      at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
      at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
      at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
      at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
      at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
      at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
  ```
  Author: Carson Wang <carson.wang@intel.com> Closes #10352 from carsonwang/unknownAppFix.
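  The general shape of such a fix is to intercept the failed lookup and produce a readable error instead of letting the cache's exception escape. A hedged, heavily simplified sketch (types reduced to strings and an Option-based lookup of my own invention; in the real API the error surfaces as an HTTP 404):

  ```scala
  // Sketch: turn a missing-app lookup into a clear message rather than an
  // UncheckedExecutionException bubbling out of the UI cache.
  def withApp[A](appId: String, lookup: String => Option[A])(f: A => String): String =
    lookup(appId) match {
      case Some(app) => f(app)
      case None      => s"no such app: $appId" // rendered as the 404 body
    }
  ```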
* [SPARK-12263][DOCS] IllegalStateException: Memory can't be 0 for SPARK_WORKER_MEMORY without unit (Neelesh Srinivas Salian, 2015-12-30; 1 file changed, -1/+1)
  Updated the Worker IllegalStateException message to state that values must be at least 1MB, rather than simply non-zero, to help resolve this. Requesting review. Author: Neelesh Srinivas Salian <nsalian@cloudera.com> Closes #10483 from nssalian/SPARK-12263.
* [SPARK-12490][CORE] Limit the CSS style scope to fix the Streaming UI (Shixiong Zhu, 2015-12-29; 3 files changed, -3/+5)
  #10441 broke the Streaming UI because of the new CSS style (screenshot: https://cloud.githubusercontent.com/assets/1000778/12044763/1efce0fe-ae4c-11e5-9f8b-39df08426bf8.png). This PR just added a class for the new style and only applied it to the paged tables. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10517 from zsxwing/fix-streaming-ui.
* [SPARK-12490] Don't use Javascript for web UI's paginated table controls (Josh Rosen, 2015-12-28; 5 files changed, -97/+178)
  The web UI's paginated table uses Javascript to implement certain navigation controls, such as table sorting and the "go to page" form. This is unnecessary and should be simplified to use plain HTML form controls and links. /cc zsxwing, who wrote this original code, and yhuai. Author: Josh Rosen <joshrosen@databricks.com> Closes #10441 from JoshRosen/simplify-paginated-table-sorting.
* [SPARK-12489][CORE][SQL][MLLIB] Fix minor issues found by FindBugs (Shixiong Zhu, 2015-12-28; 1 file changed, -2/+1)
  Include the following changes: 1. Close `java.sql.Statement`. 2. Fix incorrect `asInstanceOf`. 3. Remove unnecessary `synchronized` and `ReentrantLock`. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10440 from zsxwing/findbugs.
* [SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throws Buffer underflow exception (Daoyuan Wang, 2015-12-29; 1 file changed, -6/+1)
  Since we only need to implement `def skipBytes(n: Int)`, the code in #10213 could be simplified. davies scwf Author: Daoyuan Wang <daoyuan.wang@intel.com> Closes #10253 from adrian-wang/kryo.
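  RoaringBitmap deserializes through `java.io.DataInput`, so the bridge over Kryo's `Input` must honor `skipBytes` by actually consuming bytes from the Kryo stream. A hedged sketch of just that method (class name is mine; the real bridge implements the full DataInput interface):

  ```scala
  import com.esotericsoftware.kryo.io.Input

  // Sketch: delegate skipping to Kryo's Input, which refills its internal
  // buffer as needed, so no "Buffer underflow" occurs on large skips.
  class KryoDataInputBridge(input: Input) {
    def skipBytes(n: Int): Int = {
      input.skip(n) // Kryo handles buffer refills internally
      n             // DataInput contract: return the number of bytes skipped
    }
  }
  ```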
* [SPARK-12517] add default RDD name for one created via sc.textFile (Yaron Weinsberg, 2015-12-29; 2 files changed, -2/+27)
  The feature was first added at commit 7b877b27053bfb7092e250e01a3b887e1b50a109 but was later removed (probably by mistake) at commit fc8b58195afa67fbb75b4c8303e022f703cbf007. This change sets the default name of RDDs created via sc.textFile(...) to the path argument. Here is the symptom, using spark-1.5.2-bin-hadoop2.6:
    scala> sc.textFile("/home/root/.bashrc").name
    res5: String = null
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res6: String = /home/root/.bashrc
  while using Spark 1.3.1:
    scala> sc.textFile("/home/root/.bashrc").name
    res0: String = /home/root/.bashrc
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res1: String = /home/root/.bashrc
  Author: Yaron Weinsberg <wyaron@gmail.com> Author: yaron <yaron@il.ibm.com> Closes #10456 from wyaron/master.
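  Since `RDD.setName` returns the RDD itself, the fix amounts to chaining a `setName(path)` call onto the RDD that `textFile` builds, matching what `binaryFiles` already did. A hedged illustration of the resulting behavior (to be run in spark-shell, where `sc` is predefined):

  ```scala
  // After this change, textFile-created RDDs carry their source path as name.
  val lines = sc.textFile("/home/root/.bashrc")
  assert(lines.name == "/home/root/.bashrc")
  ```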
* [SPARK-12396][CORE] Modify the function scheduleAtFixedRate to schedule (echo2mei, 2015-12-25; 1 file changed, -2/+2)
  Instead of just cancelling the registrationRetryTimer to keep the driver from retrying the connection to the master, change the function to schedule. There is no need to register with the master repeatedly. Author: echo2mei <534384876@qq.com> Closes #10447 from echoTomei/master.
* [SPARK-12440][CORE] Avoid setCheckpoint warning when directory is not local (pierre-borckmans, 2015-12-24; 1 file changed, -2/+3)
  In the SparkContext method `setCheckpointDir`, a warning is issued when the Spark master is not local and the passed directory for the checkpoint dir appears to be local. In practice, when relying on the HDFS configuration file and using a relative path for the checkpoint directory (an incomplete URI without the HDFS scheme, ...), this warning should not be issued and can be confusing. In fact, in this case the checkpoint directory is successfully created, and the checkpointing mechanism works as expected. This PR uses the `FileSystem` instance created with the given directory and checks whether it is local or not. (The rationale is that since this same `FileSystem` instance is used to create the checkpoint dir anyway, it can be reliably used to determine whether the directory is local.) The warning is only issued if the directory is effectively local, on top of the existing conditions. Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com> Closes #10392 from pierre-borckmans/SPARK-12440_CheckpointDir_Warning_NonLocal.
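  A hedged sketch of the check (simplified, and the helper name is mine): resolve the path with Hadoop's `FileSystem`, the same object later used to create the directory, and warn only when it is genuinely local while the master is not.

  ```scala
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{LocalFileSystem, Path}

  // Sketch: rely on the FileSystem resolved from the directory itself, so a
  // relative HDFS path (no scheme) is not mistaken for a local path.
  def shouldWarn(directory: String, hadoopConf: Configuration, masterIsLocal: Boolean): Boolean = {
    val fs = new Path(directory).getFileSystem(hadoopConf)
    !masterIsLocal && fs.isInstanceOf[LocalFileSystem]
  }
  ```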
* [SPARK-12311][CORE] Restore previous value of "os.arch" property in test suites after forcing a specific value for the "os.arch" property (Kazuaki Ishizaki, 2015-12-24; 30 files changed, -81/+183)
  Restore the original value of the os.arch property after each test. Since some tests force a specific value for the os.arch property, we need to restore the original value afterwards. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #10289 from kiszk/SPARK-12311.
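  The save-and-restore pattern generalizes nicely into a small helper; a hedged sketch (the helper name is mine, not the test utility the PR adds):

  ```scala
  // Sketch: set a system property for the duration of a block, then restore
  // (or clear) the original value even if the body throws.
  def withSystemProperty[T](key: String, value: String)(body: => T): T = {
    val original = System.getProperty(key)
    System.setProperty(key, value)
    try body
    finally {
      if (original == null) System.clearProperty(key)
      else System.setProperty(key, original)
    }
  }

  // Usage (runArchSpecificTest is hypothetical):
  // withSystemProperty("os.arch", "ppc64le") { runArchSpecificTest() }
  ```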
* [SPARK-12500][CORE] Fix Tachyon deprecations; pull Tachyon dependency into one class (Sean Owen, 2015-12-23; 3 files changed, -84/+104)
  Fix Tachyon deprecations; pull the Tachyon dependency into `TachyonBlockManager` only. CC calvinjia as I probably need a double-check that the usage of the new API is correct. Author: Sean Owen <sowen@cloudera.com> Closes #10449 from srowen/SPARK-12500.
* [SPARK-12471][CORE] Spark daemons will log their pid on start up (Nong Li, 2015-12-22; 8 files changed, -20/+34)
  Author: Nong Li <nong@databricks.com> Closes #10422 from nongli/12471-pids.
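  Before Java 9's ProcessHandle, the usual JVM idiom for finding your own pid is parsing the runtime MXBean name. A hedged sketch of what such startup logging can rely on (not necessarily the exact code in the PR):

  ```scala
  import java.lang.management.ManagementFactory

  // Sketch: the runtime MXBean name is conventionally "<pid>@<hostname>" on
  // common JVMs, so the pid is the part before the '@'.
  def currentPid(): String =
    ManagementFactory.getRuntimeMXBean.getName.split("@")(0)

  // e.g. at daemon startup: println(s"Started daemon, pid: ${currentPid()}")
  ```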
* Minor corrections, i.e. typo fixes and follow deprecated (Jacek Laskowski, 2015-12-22; 5 files changed, -6/+6)
  Author: Jacek Laskowski <jacek@japila.pl> Closes #10432 from jaceklaskowski/minor-corrections.
* [SPARK-11807] Remove support for Hadoop < 2.2 (Reynold Xin, 2015-12-21; 2 files changed, -24/+3)
  i.e. Hadoop 1 and Hadoop 2.0. Author: Reynold Xin <rxin@databricks.com> Closes #10404 from rxin/SPARK-11807.
* [SPARK-12388] change default compression to lz4 (Davies Liu, 2015-12-21; 3 files changed, -11/+272)
  According to the benchmark [1], LZ4-java could be 80% (or 30%) faster than Snappy. After changing the compressor to LZ4, I saw a 20% improvement on end-to-end time for a TPC-DS query (Q4). [1] https://github.com/ning/jvm-compressor-benchmark/wiki cc rxin Author: Davies Liu <davies@databricks.com> Closes #10342 from davies/lz4.
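  The codec stays user-configurable via `spark.io.compression.codec`; pinning it explicitly documents the choice regardless of the default (after this change, `lz4` is also what you get without setting it):

  ```scala
  import org.apache.spark.SparkConf

  // Explicitly selecting the codec; valid values include "lz4", "snappy",
  // and "lzf".
  val conf = new SparkConf()
    .setAppName("compression-example")
    .set("spark.io.compression.codec", "lz4")
  ```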
* [SPARK-12466] Fix harmless NPE in tests (Andrew Or, 2015-12-21; 1 file changed, -1/+5)
  ```
  [info] ReplayListenerSuite:
  [info] - Simple replay (58 milliseconds)
  java.lang.NullPointerException
      at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
      at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
  ```
  https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
  This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (and doesn't actually fail the tests). Tested locally to verify that the NPE is gone. Author: Andrew Or <andrew@databricks.com> Closes #10417 from andrewor14/fix-harmless-npe.
* [SPARK-2331] SparkContext.emptyRDD should return RDD[T] not EmptyRDD[T] (Reynold Xin, 2015-12-21; 1 file changed, -1/+1)
  Author: Reynold Xin <rxin@databricks.com> Closes #10394 from rxin/SPARK-2331.
* [SPARK-12392][CORE] Optimize a location order of broadcast blocks by considering preferred local hosts (Takeshi YAMAMURO, 2015-12-21; 2 files changed, -2/+29)
  When multiple workers exist in a host, we can bypass unnecessary remote access for broadcasts; block managers fetch broadcast blocks from the same host instead of remote hosts. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10346 from maropu/OptimizeBlockLocationOrder.
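  The ordering idea is simple to sketch: partition the candidate locations into same-host and remote, and try the local ones first, shuffling the remote ones to spread load. A hedged, simplified version using plain hostnames (the real code works with BlockManagerId):

  ```scala
  import scala.util.Random

  // Sketch: prefer block locations on this executor's host; shuffle the rest
  // so remote fetch load is spread across hosts.
  def sortLocations(locations: Seq[String], localHost: String): Seq[String] = {
    val (sameHost, remote) = locations.partition(_ == localHost)
    sameHost ++ Random.shuffle(remote)
  }
  ```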
* [SPARK-12374][SPARK-12150][SQL] Adding logical/physical operators for Range (gatorsmile, 2015-12-21; 1 file changed, -1/+1)
  Based on the suggestions from marmbrus, added logical/physical operators for Range for improving the performance. Also added another API for resolving SPARK-12150. Could you take a look at my implementation, marmbrus? If not good, I can rework it. :) Thank you very much! Author: gatorsmile <gatorsmile@gmail.com> Closes #10335 from gatorsmile/rangeOperators.
* [SPARK-11808] Remove Bagel (Reynold Xin, 2015-12-19; 2 files changed, -2/+2)
  Author: Reynold Xin <rxin@databricks.com> Closes #10395 from rxin/SPARK-11808.
* Bump master version to 2.0.0-SNAPSHOT (Reynold Xin, 2015-12-19; 2 files changed, -2/+2)
  Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.
* Revert "[SPARK-12345][MESOS] Filter SPARK_HOME when submitting Spark jobs with Mesos cluster mode." (Andrew Or, 2015-12-18; 2 files changed, -7/+2)
  This reverts commit ad8c1f0b840284d05da737fb2cc5ebf8848f4490.
* Revert "[SPARK-12345][MESOS] Properly filter out SPARK_HOME in the Mesos REST server" (Andrew Or, 2015-12-18; 1 file changed, -1/+1)
  This reverts commit 8184568810e8a2e7d5371db2c6a0366ef4841f70.
* Revert "[SPARK-12413] Fix Mesos ZK persistence" (Andrew Or, 2015-12-18; 1 file changed, -5/+1)
  This reverts commit 2bebaa39d9da33bc93ef682959cd42c1968a6a3e.
* [SPARK-12345][CORE] Do not send SPARK_HOME through Spark submit REST interface (Luc Bourlier, 2015-12-18; 1 file changed, -2/+4)
  It is usually an invalid location on the remote machine executing the job. It is picked up by the Mesos support in cluster mode and, most of the time, causes the job to fail. Fixes SPARK-12345. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #10329 from skyluc/issue/SPARK_HOME.
* [SPARK-11097][CORE] Add channelActive callback to RpcHandler to monitor the new connections (Shixiong Zhu, 2015-12-18; 4 files changed, -77/+96)
  Added `channelActive` to `RpcHandler` so that `NettyRpcHandler` doesn't need `clients` any more. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10301 from zsxwing/network-events.
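  The shape of such a handler API, sketched with simplified types of my own (the real RpcHandler works with TransportClient and lives in the network-common module): a connection-established hook lets the handler react to new clients instead of tracking them in its own map.

  ```scala
  // Hedged sketch of the callback surface; all names simplified.
  abstract class RpcHandlerLike {
    def receive(clientAddress: String, message: Array[Byte]): Unit

    // The new hook: invoked once when a client's channel becomes active,
    // mirroring the existing channelInactive/exceptionCaught callbacks.
    def channelActive(clientAddress: String): Unit = ()

    def channelInactive(clientAddress: String): Unit = ()
    def exceptionCaught(cause: Throwable, clientAddress: String): Unit = ()
  }
  ```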