This change aims at speeding up the dev cycle a little bit, by making
sure that all tests behave the same w.r.t. where the code to be tested
is loaded from. Namely, tests no longer rely on the assembly; instead
they load all needed classes from the build directories.
The main change is to make sure all build directories (classes and test-classes)
are added to the classpath of child processes when running tests.
YarnClusterSuite required some custom code since the executors are run
differently (i.e. not through the launcher library, like standalone and
Mesos do).
I also found a couple of tests that could leak a SparkContext on failure,
and added code to handle those.
With this patch, it's possible to run the following command from a clean
source directory and have all tests pass:
mvn -Pyarn -Phadoop-2.4 -Phive-thriftserver install
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7629 from vanzin/SPARK-9284.
are cached
Remove obsolete warning about dynamic allocation not working with cached RDDs
See discussion in https://issues.apache.org/jira/browse/SPARK-10295
Author: Sean Owen <sowen@cloudera.com>
Closes #8489 from srowen/SPARK-10295.
…ion by default
Author: Ram Sriharsha <rsriharsha@hw11853.local>
Closes #8465 from harsha2010/SPARK-10251.
This PR:
1. supports transferring arbitrary nested arrays from the JVM to the R side in SerDe;
2. based on 1, improves the collect() implementation. It can now collect data of complex types
from a DataFrame.
Author: Sun Rui <rui.sun@intel.com>
Closes #8276 from sun-rui/SPARK-10048.
to JavaConverters
Replace `JavaConversions` implicits with `JavaConverters`
Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet.
Author: Sean Owen <sowen@cloudera.com>
Closes #8033 from srowen/SPARK-9613.
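For readers unfamiliar with the difference, a minimal sketch of the explicit `JavaConverters` style (the `asScala`/`asJava` decorators come from the standard library; the collections below are made up):

```scala
// Explicit JavaConverters decorators make every JVM <-> Scala collection
// conversion visible at the call site, unlike the implicit JavaConversions.
import scala.collection.JavaConverters._

val javaList = new java.util.ArrayList[String]()
javaList.add("a")
javaList.add("b")

val scalaBuffer = javaList.asScala   // explicit Java -> Scala
val backToJava  = scalaBuffer.asJava // explicit Scala -> Java
```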
Author: ehnalis <zoltan.zvara@gmail.com>
Closes #8308 from ehnalis/master.
Author: Zhang, Liye <liye.zhang@intel.com>
Closes #8412 from liyezhang556520/minorDoc.
The peak execution memory metric was introduced in SPARK-8735. That was before Tungsten was enabled by default, so it assumed that `spark.sql.unsafe.enabled` must be explicitly set to true. The result is that the memory is not displayed by default.
Author: Andrew Or <andrew@databricks.com>
Closes #8345 from andrewor14/show-memory-default.
https://issues.apache.org/jira/browse/SPARK-9439
In general, Yarn apps should be robust to NodeManager restarts. However, if you run spark with the external shuffle service on, after a NM restart all shuffles fail, b/c the shuffle service has lost some state with info on each executor. (Note the shuffle data is perfectly fine on disk across a NM restart, the problem is we've lost the small bit of state that lets us *find* those files.)
The solution proposed here is that the external shuffle service can write out its state to leveldb (backed by a local file) every time an executor is added. When running with yarn, that file is in the NM's local dir. Whenever the service is started, it looks for that file, and if it exists, it reads the file and re-registers all executors there.
Nothing is changed in non-yarn modes with this patch. The service is not given a place to save the state to, so it operates the same as before. This should make it easy to update other cluster managers as well, by just supplying the right file & the equivalent of yarn's `initializeApplication` -- I'm not familiar enough with those modes to know how to do that.
Author: Imran Rashid <irashid@cloudera.com>
Closes #7943 from squito/leveldb_external_shuffle_service_NM_restart and squashes the following commits:
0d285d3 [Imran Rashid] review feedback
70951d6 [Imran Rashid] Merge branch 'master' into leveldb_external_shuffle_service_NM_restart
5c71c8c [Imran Rashid] save executor to db before registering; style
2499c8c [Imran Rashid] explicit dependency on jackson-annotations
795d28f [Imran Rashid] review feedback
81f80e2 [Imran Rashid] Merge branch 'master' into leveldb_external_shuffle_service_NM_restart
594d520 [Imran Rashid] use json to serialize application executor info
1a7980b [Imran Rashid] version
8267d2a [Imran Rashid] style
e9f99e8 [Imran Rashid] cleanup the handling of bad dbs a little
9378ba3 [Imran Rashid] fail gracefully on corrupt leveldb files
acedb62 [Imran Rashid] switch to writing out one record per executor
79922b7 [Imran Rashid] rely on yarn to call stopApplication; assorted cleanup
12b6a35 [Imran Rashid] save registered executors when apps are removed; add tests
c878fbe [Imran Rashid] better explanation of shuffle service port handling
694934c [Imran Rashid] only open leveldb connection once per service
d596410 [Imran Rashid] store executor data in leveldb
59800b7 [Imran Rashid] Files.move in case renaming is unsupported
32fe5ae [Imran Rashid] Merge branch 'master' into external_shuffle_service_NM_restart
d7450f0 [Imran Rashid] style
f729e2b [Imran Rashid] debugging
4492835 [Imran Rashid] lol, dont use a PrintWriter b/c of scalastyle checks
0a39b98 [Imran Rashid] Merge branch 'master' into external_shuffle_service_NM_restart
55f49fc [Imran Rashid] make sure the service doesnt die if the registered executor file is corrupt; add tests
245db19 [Imran Rashid] style
62586a6 [Imran Rashid] just serialize the whole executors map
bdbbf0d [Imran Rashid] comments, remove some unnecessary changes
857331a [Imran Rashid] better tests & comments
bb9d1e6 [Imran Rashid] formatting
bdc4b32 [Imran Rashid] rename
86e0cb9 [Imran Rashid] for tests, shuffle service finds an open port
23994ff [Imran Rashid] style
7504de8 [Imran Rashid] style
a36729c [Imran Rashid] cleanup
efb6195 [Imran Rashid] proper unit test, and no longer leak if apps stop during NM restart
dd93dc0 [Imran Rashid] test for shuffle service w/ NM restarts
d596969 [Imran Rashid] cleanup imports
0e9d69b [Imran Rashid] better names
9eae119 [Imran Rashid] cleanup lots of duplication
1136f44 [Imran Rashid] test needs to have an actual shuffle
0b588bd [Imran Rashid] more fixes ...
ad122ef [Imran Rashid] more fixes
5e5a7c3 [Imran Rashid] fix build
c69f46b [Imran Rashid] maybe working version, needs tests & cleanup ...
bb3ba49 [Imran Rashid] minor cleanup
36127d3 [Imran Rashid] wip
b9d2ced [Imran Rashid] incomplete setup for external shuffle service tests
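The actual patch persists state to LevelDB in the NM's local dir; the sketch below is illustrative only (a plain text file stands in for LevelDB, and the helper names are made up), showing the save-on-register / reload-on-restart idea:

```scala
import java.io.{File, FileWriter, PrintWriter}
import scala.io.Source

// Illustrative registry file standing in for the LevelDB store.
val stateFile = File.createTempFile("registeredExecutors", ".txt")

// Called every time an executor registers with the shuffle service.
def saveExecutor(appId: String, execId: String, localDir: String): Unit = {
  val out = new PrintWriter(new FileWriter(stateFile, true)) // append
  try out.println(s"$appId\t$execId\t$localDir") finally out.close()
}

// Called on service start: if the file survived the NM restart,
// re-register every executor recorded in it.
def reloadExecutors(): Seq[(String, String, String)] =
  Source.fromFile(stateFile).getLines().map { line =>
    val Array(app, exec, dir) = line.split("\t")
    (app, exec, dir)
  }.toSeq

saveExecutor("app-1", "exec-1", "/tmp/blockmgr-0")
val recovered = reloadExecutors()
```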
so constructor parameters and public fields can be annotated. rxin MechCoder
Author: Xiangrui Meng <meng@databricks.com>
Closes #8344 from mengxr/SPARK-10140.2.
Author: Alex Shkurenko <ashkurenko@enova.com>
Closes #8239 from ashkurenko/master.
Currently Spark applications can be queued to the Mesos cluster dispatcher, but when multiple jobs are in the queue we don't handle removing jobs from the buffer correctly while iterating, which causes a NullPointerException.
This patch copies the buffer before iterating over it, so exceptions aren't thrown when jobs are removed.
Author: Timothy Chen <tnachen@gmail.com>
Closes #8322 from tnachen/fix_cluster_mode.
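The iterate-over-a-copy fix can be sketched like this (the queue contents and names are made up):

```scala
import scala.collection.mutable.ArrayBuffer

val queuedDrivers = ArrayBuffer("driver-1", "driver-2", "driver-3")
val launched = ArrayBuffer[String]()

// Removing from a buffer while iterating over it directly can skip
// elements or throw; iterate over a snapshot and mutate the original.
for (d <- queuedDrivers.toList) { // toList takes the snapshot
  queuedDrivers -= d
  launched += d
}
```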
explicitly disabled.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8316 from vanzin/SPARK-10119.
Fix for OOM for graph creation
Author: Joshi <rekhajoshm@gmail.com>
Author: Rekha Joshi <rekhajoshm@gmail.com>
Closes #7602 from rekhajoshm/SPARK-8889.
complicated
I added lots of Column functions to SparkR. I also added `rand(seed: Int)` and `randn(seed: Int)` in Scala, since we need such APIs for the R integer type.
### JIRA
[[SPARK-9856] Add expression functions into SparkR whose params are complicated - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9856)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8264 from yu-iskw/SPARK-9856-3.
Add warnings according to SPARK-8949 in `SparkContext`
- warnings in scaladoc
- log warnings when preferred locations feature is used through `SparkContext`'s constructor
However, I didn't find any documentation reference to this feature. Please point me to one if you know of any.
Author: Han JU <ju.han.felix@gmail.com>
Closes #7874 from darkjh/SPARK-8949.
spark.streaming.backpressure.{enable-->enabled} and fixed deprecated annotations
Small changes:
- Renamed the conf spark.streaming.backpressure.{enable --> enabled}
- Changed Java Deprecated annotations to Scala deprecated annotations with more information.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #8299 from tdas/SPARK-9967.
accesses cacheLocs
In Scala, `Seq.fill` always seems to return a List. Accessing a list by index is an O(N) operation. Thus, the following code will be really slow (~10 seconds on my machine):
```scala
val numItems = 100000
val s = Seq.fill(numItems)(1)
for (i <- 0 until numItems) s(i)
```
It turns out that we had a loop like this in the DAGScheduler code, although it's a little tricky to spot. In `getPreferredLocsInternal`, there's a call to `getCacheLocs(rdd)(partition)`. The `getCacheLocs` call returns a Seq. If this Seq is a List and the RDD contains many partitions, then indexing into this list will cost O(partitions). Thus, when we loop over our tasks to compute their individual preferred locations we implicitly perform an O(N^2) loop, reducing scheduling throughput.
This patch fixes this by replacing `Seq` with `Array`.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8178 from JoshRosen/dagscheduler-perf.
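To make the cost concrete, a small sketch (with the item count shrunk from the description above) contrasting List and Array indexing:

```scala
val numItems = 2000

// Seq.fill builds a List; list(i) walks i nodes, so the loop below is O(N^2).
val asList: Seq[Int] = Seq.fill(numItems)(1)

// Array indexing is O(1), so the same loop over an Array is O(N).
val asArray: Array[Int] = Array.fill(numItems)(1)

var listSum = 0
for (i <- 0 until numItems) listSum += asList(i)

var arraySum = 0
for (i <- 0 until numItems) arraySum += asArray(i)
```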
The fix for SPARK-7736 introduced a race where a port value of "-1"
could be passed down to the pyspark process, causing it to fail to
connect back to the JVM. This change adds code to fix that race.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8258 from vanzin/SPARK-7736.
It might be a typo introduced at the very beginning, or a leftover after some renaming: the method accessing the index file is now called `getBlockData` (not `getBlockLocation` as indicated in the comments).
Author: CodingCat <zhunansjtu@gmail.com>
Closes #8238 from CodingCat/minor_1.
The YARN backend doesn't like when user code calls `System.exit`,
since it cannot know the exit status and thus cannot set an
appropriate final status for the application.
So, for pyspark, avoid that call and instead throw an exception with
the exit code. SparkSubmit handles that exception and exits with
the given exit code, while YARN uses the exit code as the failure
code for the Spark app.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #7751 from vanzin/SPARK-9416.
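The pattern can be sketched as follows (the exception name mirrors the patch, but the surrounding code is illustrative):

```scala
// Instead of System.exit(code) in user code, which YARN cannot interpret,
// throw an exception carrying the exit code; the submitter catches it,
// exits with that code, and YARN records it as the failure code.
case class SparkUserAppException(exitCode: Int)
  extends RuntimeException(s"User application exited with $exitCode")

def runUserApp(shouldFail: Boolean): Unit =
  if (shouldFail) throw SparkUserAppException(42)

// What the submitter would do: catch and translate to a process exit code.
val observedExitCode =
  try { runUserApp(shouldFail = true); 0 }
  catch { case SparkUserAppException(code) => code }
```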
already running.
Author: Rohit Agarwal <rohita@qubole.com>
Closes #8153 from mindprince/SPARK-9924.
Updates the tachyon-client version to the latest release.
The main difference between 0.7.0 and 0.7.1 on the client side is to support running Tachyon on local file system by default.
No new non-Tachyon dependencies are added, and no code changes are required since the client API has not changed.
Author: Calvin Jia <jia.calvin@gmail.com>
Closes #8235 from calvinjia/spark-9199-master.
The shuffle locality patch made the DAGScheduler aware of shuffle data,
but for RDDs that have both narrow and shuffle dependencies, it can
cause them to place tasks based on the shuffle dependency instead of the
narrow one. This case is common in iterative join-based algorithms like
PageRank and ALS, where one RDD is hash-partitioned and one isn't.
Author: Matei Zaharia <matei@databricks.com>
Closes #8220 from mateiz/shuffle-loc-fix.
Tiny modification to a few comments to make ```sbt publishLocal``` work again.
Author: Herman van Hovell <hvanhovell@questtec.nl>
Closes #8209 from hvanhovell/SPARK-9980.
Author: Davies Liu <davies@databricks.com>
Closes #8219 from davies/fix_typo.
Deprecate NIO ConnectionManager in Spark 1.5.0, before removing it in Spark 1.6.0.
Author: Reynold Xin <rxin@databricks.com>
Closes #8162 from rxin/SPARK-9934.
A detailed exception log can be seen in [SPARK-9877](https://issues.apache.org/jira/browse/SPARK-9877); the problem is that when `StandaloneRestServer` is created, `self` (`masterEndpoint`) is null. The fix is to create `StandaloneRestServer` only when `self` is available.
Author: jerryshao <sshao@hortonworks.com>
Closes #8127 from jerryshao/SPARK-9877.
In these tests, we use a custom listener and we assert on fields in the stage / task completion events. However, these events are posted in a separate thread so they're not guaranteed to be posted in time. This commit fixes this flakiness through a job end registration callback.
Author: Andrew Or <andrew@databricks.com>
Closes #8176 from andrewor14/fix-accumulator-suite.
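The race and its fix can be sketched without Spark (illustrative names; a latch stands in for the job-end registration callback):

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

val jobEnded = new CountDownLatch(1)
@volatile var peakMemorySeen = 0L

// Listener events arrive on a separate bus thread...
val listenerBusThread = new Thread(() => {
  peakMemorySeen = 1024L // "task completion" event updates listener state
  jobEnded.countDown()   // "job end" event fires last
})
listenerBusThread.start()

// ...so the test blocks on the job-end callback before asserting,
// instead of racing the bus thread.
val sawJobEnd = jobEnded.await(10, TimeUnit.SECONDS)
```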
initialized
When a stage failed and another stage was resubmitted with only part of its partitions to compute, all the tasks failed with the error message `java.util.NoSuchElementException: key not found: peakExecutionMemory`.
This is because the internal accumulators are not properly initialized for this stage while other codes assume the internal accumulators always exist.
Author: Carson Wang <carson.wang@intel.com>
Closes #8090 from carsonwang/SPARK-9809.
instead of Long
Modified type of ShuffleMapStage.numAvailableOutputs from Long to Int
Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
Closes #8183 from nssalian/SPARK-9923.
Currently, the pageSize of TungstenSort is calculated from driver.memory; it should use executor.memory instead.
Also, in the worst case the safeFactor could be 4 (because of rounding), so increase it to 16.
cc rxin
Author: Davies Liu <davies@databricks.com>
Closes #8175 from davies/page_size.
This particular test did not load the default configurations so
it continued to start the REST server, which causes port bind
exceptions.
This patch adds a thread-safe lookup to BytesToBytesMap, and uses it in the broadcast HashedRelation.
Author: Davies Liu <davies@databricks.com>
Closes #8151 from davies/safeLookup.
I think that we should pass additional configuration flags to disable the driver UI and Master REST server in SparkSubmitSuite and HiveSparkSubmitSuite. This might cut down on port-contention-related flakiness in Jenkins.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8124 from JoshRosen/disable-ui-in-sparksubmitsuite.
Author: Rohit Agarwal <rohita@qubole.com>
Closes #8014 from mindprince/SPARK-9724 and squashes the following commits:
a7af5ff [Rohit Agarwal] [SPARK-9724] [WEB UI] Inline attachPrefix and attachPrefixForRedirect. Fix logic of attachPrefix
8a977cd [Rohit Agarwal] [SPARK-9724] [WEB UI] Address review comments: Remove unneeded code, update scaladoc.
b257844 [Rohit Agarwal] [SPARK-9724] [WEB UI] Avoid unnecessary redirects in the Spark Web UI.
Refactor Utils class and create ShutdownHookManager.
NOTE: Wasn't able to run /dev/run-tests on windows machine.
Manual tests were conducted locally using custom log4j.properties file with Redis appender and logstash formatter (bundled in the fat-jar submitted to spark)
ex:
```
log4j.rootCategory=WARN,console,redis
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.graphx.Pregel=INFO
log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender
log4j.appender.redis.endpoints=hostname:port
log4j.appender.redis.key=mykey
log4j.appender.redis.alwaysBatch=false
log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
```
Author: michellemay <mlemay@gmail.com>
Closes #8109 from michellemay/SPARK-9826.
… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, the executorAllocationManager is not initialized in the SparkContext.
Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>
Closes #7657 from neurons/SPARK-9092.
Add `Since` as a Scala annotation. The benefit is that we can use it without having explicit JavaDoc, which is useful for inherited methods. The limitation is that it doesn't show up in the generated Java API documentation. This might be fixed by modifying genjavadoc; I think we could leave it as a TODO.
This is how the generated Scala doc looks:
`since` JavaDoc tag:
![screen shot 2015-08-11 at 10 00 37 pm](https://cloud.githubusercontent.com/assets/829644/9230761/fa72865c-40d8-11e5-807e-0f3c815c5acd.png)
`Since` annotation:
![screen shot 2015-08-11 at 10 00 28 pm](https://cloud.githubusercontent.com/assets/829644/9230764/0041d7f4-40d9-11e5-8124-c3f3e5d5b31f.png)
rxin
Author: Xiangrui Meng <meng@databricks.com>
Closes #8131 from mengxr/SPARK-8967.
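A minimal sketch of declaring and using such an annotation (the model class names are made up; only `Since` mirrors the patch):

```scala
import scala.annotation.StaticAnnotation

// A Scala annotation needs no explicit JavaDoc, so it also works on
// inherited methods where there is no doc comment to attach a tag to.
class Since(version: String) extends StaticAnnotation

abstract class Model {
  def predict(x: Double): Double
}

class LinearModel extends Model {
  @Since("1.5.0") // annotates the overriding method directly
  override def predict(x: Double): Double = 2.0 * x
}

val m = new LinearModel
```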
This is the sister patch to #8011, but for aggregation.
In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.
rxin yhuai
Author: Andrew Or <andrew@databricks.com>
Closes #8038 from andrewor14/unsafe-starve-memory-agg.
executor twice
This is based on KaiXinXiaoLei's changes in #7716.
The issue is that when someone calls `sc.killExecutor("1")` on the same executor twice quickly, then the executor target will be adjusted downwards by 2 instead of 1 even though we're only actually killing one executor. In certain cases where we don't adjust the target back upwards quickly, we'll end up with jobs hanging.
This is a common danger because there are many places where this is called:
- `HeartbeatReceiver` kills an executor that has not been sending heartbeats
- `ExecutorAllocationManager` kills an executor that has been idle
- The user code might call this, which may interfere with the previous callers
While it's not clear whether this fixes SPARK-9745, fixing this potential race condition seems like a strict improvement. I've added a regression test to illustrate the issue.
Author: Andrew Or <andrew@databricks.com>
Closes #8078 from andrewor14/da-double-kill.
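The dedup idea can be sketched like this (field names are illustrative):

```scala
import scala.collection.mutable

var targetNumExecutors = 5
val executorsPendingToRemove = mutable.Set[String]()

// Only adjust the target for executors not already pending removal, so a
// repeated kill("1") cannot decrement the target twice.
def killExecutor(id: String): Unit = {
  if (executorsPendingToRemove.add(id)) { // add returns false on duplicates
    targetNumExecutors -= 1
  }
}

killExecutor("1")
killExecutor("1") // duplicate request: ignored
```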
This allows clients to retrieve the original exception from the
cause field of the SparkException that is thrown by the driver.
If the original exception is not in fact Serializable then it will
not be returned, but the message and stacktrace will be. (All Java
Throwables implement the Serializable interface, but this is no
guarantee that a particular implementation can actually be
serialized.)
Author: Tom White <tom@cloudera.com>
Closes #7014 from tomwhite/propagate-user-exceptions.
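The serializability check can be sketched as a round-trip attempt (the helper and class names are made up):

```scala
import java.io._

// Return the original exception if it survives a Java serialization
// round-trip; otherwise fall back to a plain Exception that keeps only
// the message and stack trace.
def exceptionForDriver(e: Throwable): Throwable =
  try {
    val buf = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(buf)
    oos.writeObject(e)
    oos.flush()
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
      .readObject().asInstanceOf[Throwable]
  } catch {
    case _: NotSerializableException =>
      val fallback = new Exception(e.toString)
      fallback.setStackTrace(e.getStackTrace)
      fallback
  }

// All Throwables implement Serializable, but a field can still break it.
class HoldsHandle(msg: String) extends Exception(msg) {
  val notSerializable = new Object
}

val roundTripped = exceptionForDriver(new IllegalStateException("boom"))
val fellBack     = exceptionForDriver(new HoldsHandle("boom"))
```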
Some users like to download additional files in their sandbox that they can refer to from their spark program, or even later mount these files to another directory.
Author: Timothy Chen <tnachen@gmail.com>
Closes #7195 from tnachen/mesos_files.
To reproduce the issue, go to the stage page and click DAG Visualization once, then go to the job page to show the job DAG visualization. You will only see the first stage of the job.
Root cause: the JavaScript uses local storage to remember your selection. Once you click the stage DAG visualization, local storage sets `expand-dag-viz-arrow-stage` to true. When you go to the job page, the JS checks `expand-dag-viz-arrow-stage` in local storage first and will try to show the stage DAG visualization on the job page.
To fix this, I set an id on the DAG span to distinguish the job page from the stage page. In the JS code, we check the id and local storage together to make sure we show the correct DAG visualization.
Author: Carson Wang <carson.wang@intel.com>
Closes #8104 from carsonwang/SPARK-9426.
The peak execution memory is not correct because it shows the sum of finished tasks' values when a task finishes.
This PR fixes it by using the update value rather than the accumulator value.
Author: zsxwing <zsxwing@gmail.com>
Closes #8121 from zsxwing/SPARK-9829.
applications
Author: Rohit Agarwal <rohita@qubole.com>
Closes #8088 from mindprince/SPARK-9806.
Author: xutingjun <xutingjun@huawei.com>
Author: meiyoula <1039320815@qq.com>
Closes #6817 from XuTingjun/SPARK-8366.
`InternalAccumulator.create` doesn't call `registerAccumulatorForCleanup` to register itself with ContextCleaner, so `WeakReference`s for these accumulators in `Accumulators.originals` won't be removed.
This PR added `registerAccumulatorForCleanup` for internal accumulators to avoid the memory leak.
Author: zsxwing <zsxwing@gmail.com>
Closes #8108 from zsxwing/internal-accumulators-leak.
API is updated but its doc comment is not updated.
Author: Jeff Zhang <zjffdu@apache.org>
Closes #8097 from zjffdu/dev.
PlatformDependent.UNSAFE is way too verbose.
Author: Reynold Xin <rxin@databricks.com>
Closes #8094 from rxin/SPARK-9815 and squashes the following commits:
229b603 [Reynold Xin] [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform.