| Commit message | Author | Age | Files | Lines |
This patch adds better balancing when performing a repartition of an
RDD. Previously the elements in the RDD were hash partitioned, meaning
that if the RDD was skewed, certain partitions would end up being very
large. This commit adds load balancing of elements across the
repartitioned RDD splits. The load balancing is not perfect: a given
output partition can have up to N more elements than the average if
there are N input partitions. However, some randomization is used to
minimize the probability that this happens.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #727 from pwendell/load-balance and squashes the following commits:
f9da752 [Patrick Wendell] Response to Matei's feedback
acfa46a [Patrick Wendell] SPARK-1770: Load balance elements when repartitioning.
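The balancing trick described above (round-robin distribution from a random start offset) can be modeled outside of Spark. The following is an illustrative Python sketch, not Spark's actual implementation; it demonstrates the "at most N more elements than the average" bound:

```python
import random

def repartition_balanced(partitions, num_output):
    """Round-robin each input partition into the outputs, starting at a
    random offset so no output partition is systematically favored.
    Any output can end up with at most N more elements than another,
    where N is the number of input partitions."""
    out = [[] for _ in range(num_output)]
    for part in partitions:
        pos = random.randrange(num_output)  # random start offset per input
        for item in part:
            out[pos % num_output].append(item)
            pos += 1
    return out
```

Because each input partition spreads its elements as evenly as round-robin allows (sizes per input differ by at most one), the worst-case imbalance across outputs is bounded by the number of input partitions, regardless of input skew.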
Author: witgo <witgo@qq.com>
Closes #728 from witgo/scala_home and squashes the following commits:
cdfd8be [witgo] Merge branch 'master' of https://github.com/apache/spark into scala_home
fac094a [witgo] remove outdated runtime Information scala home
SparkSubmit ignores `--jars` for YARN client. This is a bug.
This PR also automatically adds the application jar to `spark.jar`. Previously, when running as yarn-client, you had to specify the jar additionally through `--files` (because `--jars` didn't work). Now you don't have to specify it explicitly through either.
Tested on a YARN cluster.
Author: Andrew Or <andrewor14@gmail.com>
Closes #710 from andrewor14/yarn-jars and squashes the following commits:
35d1928 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
c27bf6c [Andrew Or] For yarn-cluster and python, do not add primaryResource to spark.jar
c92c5bf [Andrew Or] Minor cleanups
269f9f3 [Andrew Or] Fix format
013d840 [Andrew Or] Fix tests
1407474 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-jars
3bb75e8 [Andrew Or] Allow SparkSubmit --jars to take effect in yarn-client mode
failure
TL;DR: there is a bit of JAR-hell trouble with Netty that can be mostly resolved, and resolving it fixes a test failure.
I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamingSuite, and had been hitting it for a short while (is it just me?)
velvia notes:
"I have found a workaround. If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in newer version of Jetty."
There are at least 3 versions of Netty in play in the build:
- the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
- the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
- but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.
The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.
But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Downgrading to 3.6.6.Final across the board made some Spark code not compile.
If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.
So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
- Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
- Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
- Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
- Update SBT build accordingly
A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible.
Author: Sean Owen <sowen@cloudera.com>
Closes #723 from srowen/SPARK-1789 and squashes the following commits:
43661b7 [Sean Owen] Update and add Netty excludes to prevent some JAR conflicts that cause test issues
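The exclusion change described above — swapping `org.jboss.netty:netty` excludes for `io.netty:netty` ones — looks roughly like this in a Maven POM. The artifact coordinates below are illustrative of the pattern, not a copy of the actual POM edits:

```xml
<dependency>
  <groupId>org.apache.flume</groupId>
  <artifactId>flume-ng-sdk</artifactId>
  <version>1.4.0</version>
  <exclusions>
    <!-- Exclude the old Netty 3.x artifact (io.netty:netty), which
         conflicts with the netty-all 4.x that Spark core uses -->
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```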
Tolerate empty strings in PythonRDD
Author: Kan Zhang <kzhang@apache.org>
Closes #644 from kanzhang/SPARK-1690 and squashes the following commits:
c62ad33 [Kan Zhang] Adding Python doctest
473ec4b [Kan Zhang] [SPARK-1690] Tolerating empty elements when saving Python RDD to text files
This fixes https://issues.apache.org/jira/browse/SPARK-1731 by adding the Python includes to the PYTHONPATH before depickling the broadcast values
@airhorns
Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
Closes #656 from bouk/python-includes-before-broadcast and squashes the following commits:
7b0dfe4 [Bouke van der Bijl] Add Python includes to path before depickling broadcast values
This pull request contains a rebased patch from @heathermiller (https://github.com/heathermiller/spark/pull/1) to add ClassTags on Serializer and types that depend on it (Broadcast and AccumulableCollection). Putting these in the public API signatures now will allow us to use Scala Pickling for serialization down the line without breaking binary compatibility.
One question remaining is whether we also want them on Accumulator -- Accumulator is passed as part of a bigger Task or TaskResult object via the closure serializer so it doesn't seem super useful to add the ClassTag there. Broadcast and AccumulableCollection in contrast were being serialized directly.
CC @rxin, @pwendell, @heathermiller
Author: Matei Zaharia <matei@databricks.com>
Closes #700 from mateiz/spark-1708 and squashes the following commits:
1a3d8b0 [Matei Zaharia] Use fake ClassTag in Java
3b449ed [Matei Zaharia] test fix
2209a27 [Matei Zaharia] Code style fixes
9d48830 [Matei Zaharia] Add a ClassTag on Serializer and things that depend on it
https://issues.apache.org/jira/browse/SPARK-1686
moved from original JIRA (by @markhamstra):
In deploy.master.Master, the completeRecovery method is the last thing to be called when a standalone Master is recovering from failure. It is responsible for resetting some state, relaunching drivers, and eventually resuming its scheduling duties.
There are currently four places in Master.scala where completeRecovery is called. Three of them are from within the actor's receive method, and aren't problems. The last starts from within receive when the ElectedLeader message is received, but the actual completeRecovery() call is made from the Akka scheduler. That means that it will execute on a different scheduler thread, and Master itself will end up running (i.e., schedule()) from that Akka scheduler thread.
In this PR, I added a new master message TriggerSchedule to trigger the "local" call of schedule() in the scheduler thread.
Author: CodingCat <zhunansjtu@gmail.com>
Closes #639 from CodingCat/SPARK-1686 and squashes the following commits:
81bb4ca [CodingCat] rename variable
69e0a2a [CodingCat] style fix
36a2ac0 [CodingCat] address Aaron's comments
ec9b7bb [CodingCat] address the comments
02b37ca [CodingCat] keep schedule() calling in the main thread
Looks like this change was accidentally committed here: https://github.com/apache/spark/commit/06b15baab25951d124bbe6b64906f4139e037deb
but the change does not show up in the PR itself (#704).
Aside from not being intended to go in with that PR, the change also broke the test JavaAPISuite.repartition.
Author: Aaron Davidson <aaron@databricks.com>
Closes #716 from aarondav/shufflerand and squashes the following commits:
b1cf70b [Aaron Davidson] SPARK-1770: Revert accidental(?) fix
Removing a block through the BlockManager produced scary warning messages in the driver.
```
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: true
```
This is because the [BlockManagerSlaveActor](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveActor.scala#L44) would send back an acknowledgement ("true"). But the BlockManagerMasterActor would have sent the RemoveBlock message as a send (tell), not as ask(), so it would reject the received "true" as an unknown message.
@pwendell
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #708 from tdas/bm-fix and squashes the following commits:
ed4ef15 [Tathagata Das] Converted bang to ask to avoid scary warning when a block is removed.
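The tell-vs-ask mismatch above can be modeled in a few lines. This is a toy Python sketch, not Akka: the slave acknowledges RemoveBlock with `True`, but because the master sent the request fire-and-forget, the stray reply lands in the master's ordinary inbox and is logged as unknown:

```python
class MasterActor:
    """Toy actor: any message it does not expect is logged as unknown,
    mirroring BlockManagerMasterActor's warning."""
    def __init__(self):
        self.warnings = []

    def receive(self, msg):
        if isinstance(msg, tuple) and msg[0] == "RemoveBlock":
            pass  # expected message type: handled normally
        else:
            self.warnings.append(f"Got unknown message: {msg}")

master = MasterActor()
master.receive(("RemoveBlock", "block_1"))  # the original request: fine
master.receive(True)                        # the stray acknowledgement
```

With ask(), the reply would instead complete a pending future on the sender's side and never reach the ordinary receive path, which is why converting the bang (`!`) to ask silences the warning.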
Meant to do this when patching up the last merge.
This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables.
Author: Sandeep <sandeep@techaddict.me>
Closes #707 from techaddict/SPARK-1775 and squashes the following commits:
18d8ebf [Sandeep] SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark when task descriptions are large, e.g. due to large lineage graphs or local variables.
Gives a nicely formatted message to the user when `run-example` is run to
tell them to use `spark-submit`.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #704 from pwendell/examples and squashes the following commits:
1996ee8 [Patrick Wendell] Feedback form Andrew
3eb7803 [Patrick Wendell] Suggestions from TD
2474668 [Patrick Wendell] SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.
Right now, SparkSubmit ignores the `--name` flag for both yarn-client and yarn-cluster. This is a bug.
In client mode, SparkSubmit treats `--name` as a [cluster config](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L170) and does not propagate this to SparkContext.
In cluster mode, SparkSubmit passes this flag to `org.apache.spark.deploy.yarn.Client`, which only uses it for the [YARN ResourceManager](https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L80), but does not propagate this to SparkContext.
This PR ensures that `spark.app.name` is always set if SparkSubmit receives the `--name` flag, which is what the usage promises. This makes it possible for applications to start a SparkContext with an empty conf `val sc = new SparkContext(new SparkConf)`, and inherit the app name from SparkSubmit.
Tested both modes on a YARN cluster.
Author: Andrew Or <andrewor14@gmail.com>
Closes #699 from andrewor14/yarn-app-name and squashes the following commits:
98f6a79 [Andrew Or] Fix tests
dea932f [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-app-name
c86d9ca [Andrew Or] Respect SparkSubmit --name on YARN
It makes little sense to start a TaskContext that is interrupted. Indeed, I searched for all use cases of it and didn't find a single instance in which `interrupted` is true on construction.
This was inspired by reviewing #640, which adds an additional `@volatile var completed` that is similar. These are not the most urgent changes, but I wanted to push them out before I forget.
Author: Andrew Or <andrewor14@gmail.com>
Closes #675 from andrewor14/task-context and squashes the following commits:
9575e02 [Andrew Or] Add space
69455d1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into task-context
c471490 [Andrew Or] Oops, removed one flag too many. Adding it back.
85311f8 [Andrew Or] Move interrupted flag from TaskContext constructor
Commit for initial feedback. Basically I am curious whether we should prompt the user to provide args, especially when they are mandatory, and whether we can skip them when they are not.
Also a few other things did not work, like
`bin/spark-submit examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop1.0.4.jar --class org.apache.spark.examples.SparkALS --arg 100 500 10 5 2`
Not all the args get passed properly; maybe I have messed something up and will hopefully sort it out.
Author: Prashant Sharma <prashant.s@imaginea.com>
Closes #552 from ScrapCodes/SPARK-1565/update-examples and squashes the following commits:
669dd23 [Prashant Sharma] Review comments
2727e70 [Prashant Sharma] SPARK-1565, update examples to be used with spark-submit script.
When at least one of the following conditions is true, PySpark cannot be loaded:
1. PYTHONPATH is not set
2. PYTHONPATH does not contain the python directory (or jar, in the case of YARN)
3. The jar does not contain pyspark files (YARN)
4. The jar does not contain py4j files (YARN)
However, we currently throw the same random `java.io.EOFException` for all of the above cases, when trying to read from the python daemon's output. This message is super unhelpful.
This PR includes the python stderr and the PYTHONPATH in the exception propagated to the driver. Now, the exception message looks something like:
```
Error from python worker:
: No module named pyspark
PYTHONPATH was:
/path/to/spark/python:/path/to/some/jar
java.io.EOFException
<stack trace>
```
whereas before it was just
```
java.io.EOFException
<stack trace>
```
Author: Andrew Or <andrewor14@gmail.com>
Closes #603 from andrewor14/pyspark-exception and squashes the following commits:
10d65d3 [Andrew Or] Throwable -> Exception, worker -> daemon
862d1d7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
a5ed798 [Andrew Or] Use block string and interpolation instead of var (minor)
cc09c45 [Andrew Or] Account for the fact that the python daemon may not have terminated yet
444f019 [Andrew Or] Use the new RedirectThread + include system PYTHONPATH
aab00ae [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
0cc2402 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
783efe2 [Andrew Or] Make python daemon stderr indentation consistent
9524172 [Andrew Or] Avoid potential NPE / error stream contention + Move things around
29f9688 [Andrew Or] Add back original exception type
e92d36b [Andrew Or] Include python worker stderr in the exception propagated to the driver
7c69360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-exception
cdbc185 [Andrew Or] Fix python attribute not found exception when PYTHONPATH is not set
dcc0353 [Andrew Or] Check both python and system environment variables for PYTHONPATH
6c09c21 [Andrew Or] Validate PYTHONPATH and PySpark modules before starting python workers
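The shape of the improved error above — wrapping the opaque failure with the worker's stderr and the current PYTHONPATH — can be sketched as follows. The function name and structure here are illustrative, not Spark's actual API:

```python
import os

def raise_with_context(original_exc, worker_stderr):
    """Wrap an opaque python-worker failure with the worker's stderr
    and the current PYTHONPATH, mimicking the message format above."""
    pythonpath = os.environ.get("PYTHONPATH", "<not set>")
    message = (
        "Error from python worker:\n"
        f"  {worker_stderr}\n"
        "PYTHONPATH was:\n"
        f"  {pythonpath}"
    )
    # Chain the original exception so the stack trace is preserved
    raise RuntimeError(message) from original_exc
```

The key point is that the original exception is chained rather than replaced, so the driver still sees the EOFException's stack trace along with the diagnostic context.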
Happy to open a jira ticket if you'd like to track one there.
Author: Andrew Ash <andrew@andrewash.com>
Closes #678 from ash211/SecurityManagerLogging and squashes the following commits:
2aa0b7a [Andrew Ash] Nicer logging for SecurityManager startup
This patch includes several cleanups to PythonRDD, focused around fixing [SPARK-1579](https://issues.apache.org/jira/browse/SPARK-1579) cleanly. Listed in order of approximate importance:
- The Python daemon waits for Spark to close the socket before exiting,
in order to avoid causing spurious IOExceptions in Spark's
`PythonRDD::WriterThread`.
- Removes the Python Monitor Thread, which polled for task cancellations
in order to kill the Python worker. Instead, we do this in the
onCompleteCallback, since this is guaranteed to be called during
cancellation.
- Adds a "completed" variable to TaskContext to avoid the issue noted in
[SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), where onCompleteCallbacks may be execution-order dependent.
Along with this, I removed the "context.interrupted = true" flag in
the onCompleteCallback.
- Extracts PythonRDD::WriterThread to its own class.
Since this patch provides an alternative solution to [SPARK-1019](https://issues.apache.org/jira/browse/SPARK-1019), I did test it with
```
sc.textFile("latlon.tsv").take(5)
```
many times without error.
Additionally, in order to test the unswallowed exceptions, I performed
```
sc.textFile("s3n://<big file>").count()
```
and cut my internet during execution. Prior to this patch, we got the "stdin writer exited early" message, which was unhelpful. Now, we get the SocketExceptions propagated through Spark to the user and get proper (though unsuccessful) task retries.
Author: Aaron Davidson <aaron@databricks.com>
Closes #640 from aarondav/pyspark-io and squashes the following commits:
b391ff8 [Aaron Davidson] Detect "clean socket shutdowns" and stop waiting on the socket
c0c49da [Aaron Davidson] SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions
... that do not change schema
Author: Kan Zhang <kzhang@apache.org>
Closes #448 from kanzhang/SPARK-1460 and squashes the following commits:
111e388 [Kan Zhang] silence MiMa errors in EdgeRDD and VertexRDD
91dc787 [Kan Zhang] Taking into account newly added Ordering param
79ed52a [Kan Zhang] [SPARK-1460] Returning SchemaRDD on Set operations that do not change schema
compatibility
Author: Patrick Wendell <pwendell@gmail.com>
Closes #676 from pwendell/worker-opts and squashes the following commits:
54456c4 [Patrick Wendell] SPARK-1746: Support setting SPARK_JAVA_OPTS on executors for backwards compatibility
This PR updates spark-submit to allow submitting Python scripts (currently only with deploy-mode=client, but that's all that was supported before) and updates the PySpark code to properly find various paths, etc. One significant change is that we assume we can always find the Python files either from the Spark assembly JAR (which will happen with the Maven assembly build in make-distribution.sh) or from SPARK_HOME (which will exist in local mode even if you use sbt assembly, and should be enough for testing). This means we no longer need a weird hack to modify the environment for YARN.
This patch also updates the Python worker manager to run python with -u, which means unbuffered output (send it to our logs right away instead of waiting a while after stuff was written); this should simplify debugging.
In addition, it fixes https://issues.apache.org/jira/browse/SPARK-1709, setting the main class from a JAR's Main-Class attribute if not specified by the user, and fixes a few help strings and style issues in spark-submit.
In the future we may want to make the `pyspark` shell use spark-submit as well, but it seems unnecessary for 1.0.
Author: Matei Zaharia <matei@databricks.com>
Closes #664 from mateiz/py-submit and squashes the following commits:
15e9669 [Matei Zaharia] Fix some uses of path.separator property
051278c [Matei Zaharia] Small style fixes
0afe886 [Matei Zaharia] Add license headers
4650412 [Matei Zaharia] Add pyFiles to PYTHONPATH in executors, remove old YARN stuff, add tests
15f8e1e [Matei Zaharia] Set PYTHONPATH in PythonWorkerFactory in case it wasn't set from outside
47c0655 [Matei Zaharia] More work to make spark-submit work with Python:
d4375bd [Matei Zaharia] Clean up description of spark-submit args a bit and add Python ones
... java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory
Author: witgo <witgo@qq.com>
Closes #665 from witgo/SPARK-1734 and squashes the following commits:
cacf238 [witgo] SPARK-1734: spark-submit throws an exception: Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.broadcast.TorrentBroadcastFactory
See https://issues.apache.org/jira/browse/SPARK-1685 for a more complete description, but in essence: If the Worker or AppClient actor restarts before successfully registering with Master, multiple retryTimers will be running, which will lead to less than the full number of registration retries being attempted before the new actor is forced to give up.
Author: Mark Hamstra <markhamstra@gmail.com>
Closes #602 from markhamstra/SPARK-1685 and squashes the following commits:
11cc088 [Mark Hamstra] retryTimer -> registrationRetryTimer
69c348c [Mark Hamstra] Cancel retryTimer on restart of Worker or AppClient
Fix an incorrect comment on the function addWithoutResize.
Author: ArcherShao <ArcherShao@users.noreply.github.com>
Closes #667 from ArcherShao/patch-3 and squashes the following commits:
a607358 [ArcherShao] Update OpenHashSet.scala
Hopefully this can go into 1.0, as a few people on the user list have asked for this.
Author: Andrew Or <andrewor14@gmail.com>
Closes #648 from andrewor14/expose-listeners and squashes the following commits:
e45e1ef [Andrew Or] Add missing colons (minor)
350d643 [Andrew Or] Expose SparkListeners and relevant classes as DeveloperApi
Author: Sandy Ryza <sandy@cloudera.com>
Closes #657 from sryza/sandy-spark-1728 and squashes the following commits:
4751443 [Sandy Ryza] SPARK-1728. JavaRDDLike.mapPartitionsWithIndex requires ClassTag
This is because Mesos calls it with a different environment or something, the result is that the Spark jar is missing and it can't load classes.
This fixes http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html
I have no idea whether this is the right fix, I can only confirm that it fixes the issue for us.
The `registered` method is called from mesos (https://github.com/apache/mesos/blob/765ff9bc2ac5a12d4362f8235b572a37d646390a/src/java/jni/org_apache_mesos_MesosExecutorDriver.cpp)
I am unsure which commit caused this regression
Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
Closes #620 from bouk/mesos-classloader-fix and squashes the following commits:
c13eae0 [Bouke van der Bijl] Use getContextOrSparkClassLoader in SparkEnv and CompressionCodec
"InvocationTargetException"
Catching the InvocationTargetException, printing getTargetException.
Author: Sandeep <sandeep@techaddict.me>
Closes #630 from techaddict/SPARK-1710 and squashes the following commits:
834d79b [Sandeep] changes from srowen suggestions
109d604 [Sandeep] SPARK-1710: spark-submit should print better errors than "InvocationTargetException"
it's used in ReplSuite, and return to use lang3 utility in Utils.scala
For consideration. This was proposed in related discussion: https://github.com/apache/spark/pull/569
Author: Sean Owen <sowen@cloudera.com>
Closes #635 from srowen/SPARK-1629.2 and squashes the following commits:
a442b98 [Sean Owen] Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to use lang3 utility in Utils.scala
Previously, we indicated disconnected(), which keeps the application in a limbo state where it has no executors but thinks it will get them soon.
This is a bug fix that hopefully can be included in 1.0.
Author: Aaron Davidson <aaron@databricks.com>
Closes #605 from aarondav/appremoved and squashes the following commits:
bea02a2 [Aaron Davidson] SPARK-1689 AppClient should indicate app is dead() when removed
Should lookup `shutdownDeleteTachyonPaths` instead of `shutdownDeletePaths`. Together with a minor style clean up: `find {...}.isDefined` to `exists {...}`.
Author: Cheng Lian <lian.cs.zju@gmail.com>
Closes #575 from liancheng/tachyonFix and squashes the following commits:
deb8f31 [Cheng Lian] Fixed logical error in when cleanup Tachyon files and minor style cleanup
Move the doAs in Executor higher up so that we only have 1 ugi and aren't leaking filesystems.
Fix spark on yarn to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to hdfs as the user.
Note this hasn't been fully tested yet. Need to test in standalone mode.
Putting this up for people to look at and possibly test. I don't have access to a mesos cluster.
This is alternative to https://github.com/apache/spark/pull/607
Author: Thomas Graves <tgraves@apache.org>
Closes #621 from tgravescs/SPARK-1676 and squashes the following commits:
244d55a [Thomas Graves] fix line length
44163d4 [Thomas Graves] Rework
9398853 [Thomas Graves] change to have doAs in executor higher up.
This will ensure that sockets do not build up over the course of a job, and that cancellation successfully cleans up sockets.
Tested in standalone mode. More file descriptors are spawned than expected (around 1000 rather than the expected 8 or so), but they do not pile up between runs, or get as high as before (where they went up to around 5k).
Author: Aaron Davidson <aaron@databricks.com>
Closes #623 from aarondav/pyspark2 and squashes the following commits:
0ca13bb [Aaron Davidson] SPARK-1700: Close socket file descriptors on task completion
Author: wangfei <wangfei_hello@126.com>
Closes #613 from scwf/masterIndex and squashes the following commits:
1463056 [wangfei] delete no use var: masterIndex
Modifications to Spark core are limited to exposing functionality to test files + minor style fixes.
(728 / 769 lines are from tests)
Author: Andrew Or <andrewor14@gmail.com>
Closes #591 from andrewor14/event-log-tests and squashes the following commits:
2883837 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
c3afcea [Andrew Or] Compromise
2d5daf8 [Andrew Or] Use temp directory provided by the OS rather than /tmp
2b52151 [Andrew Or] Remove unnecessary file delete + add a comment
62010fd [Andrew Or] More cleanup (renaming variables, updating comments etc)
ad2beff [Andrew Or] Clean up EventLoggingListenerSuite + modify a few comments
862e752 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
e0ba2f8 [Andrew Or] Fix test failures caused by race condition in processing/mutating events
b990453 [Andrew Or] ReplayListenerBus suite - tests do not all pass yet
ab66a84 [Andrew Or] Tests for FileLogger + delete file after tests
187bb25 [Andrew Or] Formatting and renaming variables
769336f [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-tests
5d38ffe [Andrew Or] Clean up EventLoggingListenerSuite + add comments
e12f4b1 [Andrew Or] Preliminary tests for EventLoggingListener (need major cleanup)
Author: witgo <witgo@qq.com>
Closes #581 from witgo/SPARK-1659 and squashes the following commits:
0b2cf98 [witgo] Delete spark-submit obsolete usage: "--arg ARG"
Author: wangfei <wangfei_hello@126.com>
Closes #614 from scwf/pxcw and squashes the following commits:
d1016ba [wangfei] fix spelling mistake
...OS_WINDOWS`
Author: witgo <witgo@qq.com>
Closes #569 from witgo/SPARK-1629 and squashes the following commits:
31520eb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1629
fcaafd7 [witgo] merge mastet
49e248e [witgo] Fix SPARK-1629: Spark should inline use of commons-lang `SystemUtils.IS_OS_WINDOWS`
This reopens https://github.com/apache/incubator-spark/pull/640 against the new repo
Author: Sandy Ryza <sandy@cloudera.com>
Closes #30 from sryza/sandy-spark-1004 and squashes the following commits:
89889d4 [Sandy Ryza] Move unzipping py4j to the generate-resources phase so that it gets included in the jar the first time
5165a02 [Sandy Ryza] Fix docs
fd0df79 [Sandy Ryza] PySpark on YARN
In XORShiftRandom.scala, use val "million" instead of constant "1e6.toInt".
Delete vals that are never used in other files.
Author: WangTao <barneystinson@aliyun.com>
Closes #565 from WangTaoTheTonic/master and squashes the following commits:
17cacfc [WangTao] Handle the unused assignment, method parameters and symbol inspected by Intellij IDEA
37b4090 [WangTao] Handle the vals that never used
Args for worker rather than master
Author: Chen Chao <crazyjvm@gmail.com>
Closes #587 from CrazyJvm/patch-6 and squashes the following commits:
b54b89f [Chen Chao] Args for worker rather than master
Author: witgo <witgo@qq.com>
Closes #423 from witgo/zipWithIndex and squashes the following commits:
039ec04 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
24d74c9 [witgo] review commit
763a5e4 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
59747d1 [witgo] review commit
7bf4d06 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
daa8f84 [witgo] review commit
4070613 [witgo] Merge branch 'master' of https://github.com/apache/spark into zipWithIndex
18e6c97 [witgo] java api zipWithIndex test
11e2e7f [witgo] add zipWithIndex zipWithUniqueId methods to java api
This adds minimal setting of event log directory/files permissions. To have a secure environment the user must manually create the top level event log directory and set permissions up. We can add logic to do that automatically later if we want.
Author: Thomas Graves <tgraves@apache.org>
Closes #538 from tgravescs/SPARK-1557 and squashes the following commits:
e471d8e [Thomas Graves] rework
d8b6620 [Thomas Graves] update use of octal
3ca9b79 [Thomas Graves] Updated based on comments
5a09709 [Thomas Graves] add in missing import
3150ed6 [Thomas Graves] SPARK-1557 Set permissions on event log files/directories
This is a straightforward fix.
Author: Patrick Wendell <pwendell@gmail.com>
This patch had conflicts when merged, resolved by
Committer: Patrick Wendell <pwendell@gmail.com>
Closes #578 from pwendell/spark-submit-yarn and squashes the following commits:
96027c7 [Patrick Wendell] Test fixes
b5be173 [Patrick Wendell] Review feedback
4ac9cac [Patrick Wendell] SPARK-1652: spark-submit for yarn prints warnings even though calling as expected
Deals with two issues:
1. Spark shell didn't correctly pass quoted arguments to spark-submit.
```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
which triggered warnings. Now we use new, more narrowly scoped,
variables.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #576 from pwendell/spark-submit and squashes the following commits:
67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
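The quoting problem in issue 1 above is the classic `$@` vs `"$@"` distinction. This illustrative shell demo (not the actual spark-shell script) shows how forwarding arguments unquoted re-splits a quoted value:

```shell
# A wrapper that forwards with unquoted $@ re-splits on whitespace,
# while "$@" preserves each argument exactly as the caller passed it.
count_args() { echo "$#"; }

set -- --driver-java-options "-Dfoo=f -Dbar=b"

quoted=$(count_args "$@")    # value stays one argument
unquoted=$(count_args $@)    # value split into two arguments
echo "$quoted $unquoted"
```

Here the quoted form sees 2 arguments while the unquoted form sees 3, which is how `-Dfoo=f -Dbar=b` gets corrupted on its way to spark-submit.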
Author: Patrick Wendell <pwendell@gmail.com>
Closes #579 from pwendell/spark-submit-yarn-2 and squashes the following commits:
05e1b11 [Patrick Wendell] Small fix
d2a40ad [Patrick Wendell] SPARK-1652: Spark submit should fail gracefully if YARN support not enabled
contains multiple Java options
Author: witgo <witgo@qq.com>
Closes #547 from witgo/SPARK-1609 and squashes the following commits:
deb6a4c [witgo] review commit
91da0bb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
0640852 [witgo] review commit
8f90b22 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
bcf36cb [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1609
1185605 [witgo] fix extraJavaOptions split
f7c0ab7 [witgo] bugfix
86fc4bb [witgo] bugfix
8a265b7 [witgo] Fix SPARK-1609: Executor fails to start when use spark-submit
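The core of the bug — a flag string holding multiple Java options, some with quoted values, being split naively on whitespace — is easy to reproduce. This is an illustrative Python demonstration of the splitting problem, not the code path Spark changed:

```python
import shlex

# An extraJavaOptions-style string with several options, one quoted
opts = '-XX:+UseG1GC -Dkey="some value"'

naive = opts.split()        # whitespace split breaks the quoted value
proper = shlex.split(opts)  # shell-style tokenization keeps it intact
```

A naive split yields three tokens with the quoted value torn in half; shell-style tokenization yields the intended two options.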
failures
This includes some minor code clean-up as well. The main change is that small files are not memory mapped. There is a nicer way to write that code block using Scala's `Try` but to make it easy to back port and as simple as possible, I opted for the more explicit but less pretty format.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #43 from pwendell/block-iter-logging and squashes the following commits:
1cff512 [Patrick Wendell] Small issue from merge.
49f6c269 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into block-iter-logging
4943351 [Patrick Wendell] Added a test and feedback on mateis review
a637a18 [Patrick Wendell] Review feedback and adding rewind() when reading byte buffers.
b76b95f [Patrick Wendell] Review feedback
4e1514e [Patrick Wendell] Don't memory map for small files
d238b88 [Patrick Wendell] Some logging and clean-up
This modifies spark-submit to do something more like the Hadoop `jar`
command. Now we have the following syntax:
./bin/spark-submit [options] user.jar [user options]
Author: Patrick Wendell <pwendell@gmail.com>
Closes #563 from pwendell/spark-submit and squashes the following commits:
32241fc [Patrick Wendell] Review feedback
3adfb69 [Patrick Wendell] Small fix
bc48139 [Patrick Wendell] SPARK-1606: Infer user application arguments instead of requiring --arg.
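The `hadoop jar`-style syntax above means the parser stops consuming launcher options at the first bare token (the user jar) and passes everything after it through untouched. A minimal Python sketch of that inference (it assumes, for illustration, that every launcher flag takes a value):

```python
def split_submit_args(argv):
    """Split a spark-submit-style command line: options before the
    first bare token belong to the launcher, the first bare token is
    the user jar, and everything after it goes to the application
    verbatim. Illustrative sketch, not SparkSubmit's parser."""
    launcher_opts = []
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok.startswith("--"):
            # assumption: every launcher flag is followed by a value
            launcher_opts.append((tok, argv[i + 1]))
            i += 2
        else:
            return launcher_opts, tok, argv[i + 1:]
    return launcher_opts, None, []
```

Because parsing stops at the jar, user options like `--foo` are never mistaken for launcher options, which is what removes the need for an explicit `--arg` flag.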