| Commit message (Collapse) | Author | Age | Files | Lines |
|
#511 and #863 got left out of branch-1.0 since we were really close to the release. Now that they have been tested a little I see no reason to leave them out.
Author: Michael Armbrust <michael@databricks.com>
Author: witgo <witgo@qq.com>
Closes #1078 from marmbrus/branch-1.0 and squashes the following commits:
22be674 [witgo] [SPARK-1841]: update scalatest to version 2.1.5
fc8fc79 [Michael Armbrust] Include #1071 as well.
c5d0adf [Michael Armbrust] Update SparkSQL in branch-1.0 to match master.
|
Author: witgo <witgo@qq.com>
Closes #786 from witgo/maven_plugin and squashes the following commits:
5de86a2 [witgo] Merge branch 'master' of https://github.com/apache/spark into maven_plugin
c35ef73 [witgo] Improve maven plugin configuration
Conflicts:
pom.xml
|
This reverts commit 2f1dc868e5714882cf40d2633fb66772baf34789.
|
This reverts commit 832dc594e7666f1d402334f8015ce29917d9c888.
|
This reverts commit d807023479ce10aec28ef3c1ab646ddefc2e663c.
|
This reverts commit 67dd53d2556f03ce292e6889128cf441f1aa48f8.
|
If I run the following on a YARN cluster
```
bin/spark-submit sheep.py --master yarn-client
```
it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```
However, this also fails, this time because Python does not understand URI schemes.
This PR fixes this by automatically resolving all paths passed as command-line arguments to `spark-submit`. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For Python, we strip the URI scheme before we actually try to run the file.
Much of the code was originally written by @mengxr. Tested on a YARN cluster. More tests pending.
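The resolution step described above can be sketched in Python. This is only an illustration of the idea, not the code in the patch: `resolve_uri`, `resolve_uris`, and `strip_scheme_for_python` are hypothetical names, and the sketch assumes POSIX-style paths (a Windows drive letter would be mistaken for a URI scheme here).

```python
import os
from urllib.parse import urlparse


def resolve_uri(path):
    """Return `path` unchanged if it already has a scheme, else as a file: URI."""
    if urlparse(path).scheme:
        return path
    # No scheme given: assume a local file and make the path absolute.
    return "file:" + os.path.abspath(path)


def resolve_uris(paths):
    """Resolve a comma-separated list of paths, skipping empty entries."""
    return ",".join(resolve_uri(p.strip()) for p in paths.split(",") if p.strip())


def strip_scheme_for_python(uri):
    """Python cannot open `file:/x.py`, so hand it the bare local path."""
    parsed = urlparse(uri)
    # Assumption: only schemes that point at the local filesystem are runnable.
    if parsed.scheme not in ("", "file", "local"):
        raise ValueError("Python cannot run a non-local file: " + uri)
    return parsed.path
```

With these helpers, `strip_scheme_for_python("file:/path/to/sheep.py")` yields `/path/to/sheep.py`, while a path that already names a remote scheme such as `hdfs://` passes through `resolve_uri` untouched.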
Author: Andrew Or <andrewor14@gmail.com>
Closes #853 from andrewor14/submit-paths and squashes the following commits:
0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
(cherry picked from commit 5081a0a9d47ca31900ea4de570de2cbb0e063105)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
The hierarchy for configuring the Spark master in the shell is as follows:
```
MASTER > --master > spark.master (spark-defaults.conf)
```
This is inconsistent with the way we run normal applications, which is:
```
--master > spark.master (spark-defaults.conf) > MASTER
```
I was trying to run a shell locally on a standalone cluster launched through the ec2 scripts, which automatically set `MASTER` in spark-env.sh. It was surprising to me that `--master` didn't take effect, considering that this is the way we tell users to set their masters [here](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark).
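The intended precedence can be sketched as a small lookup. This is a hedged illustration rather than the REPL's actual code: `resolve_master` is a hypothetical helper, and the `local[*]` fallback is an assumption about what happens when nothing is set.

```python
def resolve_master(cli_master=None, conf=None, env=None, default="local[*]"):
    """Pick the master using the order: --master > spark.master > MASTER."""
    conf = conf or {}
    env = env or {}
    candidates = (
        cli_master,                # value of --master, if given
        conf.get("spark.master"),  # e.g. from spark-defaults.conf
        env.get("MASTER"),         # e.g. exported by spark-env.sh
    )
    for candidate in candidates:
        if candidate:
            return candidate
    # Assumption: fall back to a local master when nothing is configured.
    return default
```

In the EC2 scenario above, `resolve_master("local[2]", env={"MASTER": "spark://ec2-host:7077"})` returns `local[2]`, so the command-line flag wins over the environment variable.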
Author: Andrew Or <andrewor14@gmail.com>
Closes #846 from andrewor14/shell-master and squashes the following commits:
2cb81c9 [Andrew Or] Respect spark.master before MASTER in REPL
(cherry picked from commit cce77457e00aa5f1f4db3d50454cf257efb156ed)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jars` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell users to add jars this way.
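One way to respect both sources is sketched below. It is an assumption-laden illustration, not the patch itself: `added_jars` is a hypothetical helper, it assumes `--jars` has already been copied into `spark.jars`, and it assumes `spark.jars` should win over `ADD_JARS` whenever it is non-empty.

```python
def added_jars(conf=None, env=None):
    """Return the jars to add to the shell as a list of paths."""
    conf = conf or {}
    env = env or {}
    # An empty spark.jars ("") is treated as unset, so ADD_JARS still applies.
    raw = conf.get("spark.jars") or env.get("ADD_JARS") or ""
    return [jar for jar in raw.split(",") if jar]
```

Under these assumptions, `added_jars({"spark.jars": "a.jar,b.jar"}, {"ADD_JARS": "c.jar"})` gives `["a.jar", "b.jar"]`, and `ADD_JARS` is used only when `spark.jars` is missing or empty.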
Author: Andrew Or <andrewor14@gmail.com>
Closes #849 from andrewor14/shell-jars and squashes the following commits:
928a7e6 [Andrew Or] ',' -> "," (minor)
afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit
c6da113 [Andrew Or] Do not set spark.jars to ""
d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell
(cherry picked from commit 8edbee7d1b4afc192d97ba192a5526affc464205)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75.
|
This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d.
|
This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.
|
This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.
|
This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464.
|
This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6.
|
This reverts commit 54133abdce0246f6643a1112a5204afb2c4caa82.
|
This reverts commit e480bcfbd269ae1d7a6a92cfb50466cf192fe1fb.
|
This reverts commit 18f062303303824139998e8fc8f4158217b0dbc3.
|
This reverts commit d08e9604fc9958b7c768e91715c8152db2ed6fd0.
|
This reverts commit 3d0a44833ab50360bf9feccc861cb5e8c44a4866.
|
This reverts commit 9772d85c6f3893d42044f4bab0e16f8b6287613a.
|
Three issues related to temp files that tests generate – these should be touched up for hygiene but are not urgent.
Modules have a log4j.properties which directs the unit-test.log output file to a directory like `[module]/target/unit-test.log`. But this ends up creating `[module]/[module]/target/unit-test.log` instead of the former.
The `work/` directory is not deleted by "mvn clean", in the parent and in modules. Neither is the `checkpoint/` directory created under the various external modules.
Many tests create a temp directory, which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and trying to call `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method.
_If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._
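As a rough analogue of the proposed trait, here is a Python `unittest` sketch in which a shared base class owns the temp directory and guarantees its removal. It is not Spark code: `TempDirTestCase` and `ExampleSuite` are hypothetical names, and `addCleanup` with `shutil.rmtree` stands in for `deleteOnExit()` plus `Utils.deleteRecursively`.

```python
import os
import shutil
import tempfile
import unittest


class TempDirTestCase(unittest.TestCase):
    """Base class that gives each test a fresh temp dir and always removes it."""

    def setUp(self):
        self.temp_dir = tempfile.mkdtemp(prefix="spark-test-")
        # Registered cleanups run even if the test body fails.
        self.addCleanup(shutil.rmtree, self.temp_dir, ignore_errors=True)


class ExampleSuite(TempDirTestCase):
    def test_writes_checkpoint(self):
        path = os.path.join(self.temp_dir, "checkpoint")
        os.mkdir(path)
        self.assertTrue(os.path.isdir(path))
```

Subclasses only use `self.temp_dir`; they never need their own teardown logic, which is the point of centralising the management in one place.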
Author: Sean Owen <sowen@cloudera.com>
Closes #732 from srowen/SPARK-1798 and squashes the following commits:
5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
(cherry picked from commit 7120a2979d0a9f0f54a88b2416be7ca10e74f409)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
|
This PR updates spark-submit to allow submitting Python scripts (currently only with deploy-mode=client, but that's all that was supported before) and updates the PySpark code to properly find various paths, etc. One significant change is that we assume we can always find the Python files either from the Spark assembly JAR (which will happen with the Maven assembly build in make-distribution.sh) or from SPARK_HOME (which will exist in local mode even if you use sbt assembly, and should be enough for testing). This means we no longer need a weird hack to modify the environment for YARN.
This patch also updates the Python worker manager to run python with -u, which means unbuffered output (send it to our logs right away instead of waiting a while after stuff was written); this should simplify debugging.
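The effect of `-u` can be illustrated with a small launcher. This is a sketch only, not the worker manager's code: `launch_unbuffered` is a hypothetical helper that starts a child Python interpreter with unbuffered stdout and stderr.

```python
import subprocess
import sys


def launch_unbuffered(script_args):
    """Start a child Python with -u so its output reaches the pipe immediately."""
    return subprocess.Popen(
        [sys.executable, "-u"] + list(script_args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so one log stream sees everything
    )


proc = launch_unbuffered(["-c", "print('worker started')"])
output, _ = proc.communicate()
```

Without `-u`, a child writing to a pipe typically block-buffers its output, so log lines can show up long after they were written.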
In addition, it fixes https://issues.apache.org/jira/browse/SPARK-1709, setting the main class from a JAR's Main-Class attribute if not specified by the user, and fixes a few help strings and style issues in spark-submit.
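The Main-Class fallback can be sketched as follows. This is an illustration under stated assumptions, not spark-submit's implementation: `main_class_from_jar` and `choose_main_class` are hypothetical names, and only the main section of `META-INF/MANIFEST.MF` is consulted.

```python
import zipfile


def main_class_from_jar(jar):
    """Return the Main-Class manifest attribute of a JAR, or None if absent."""
    with zipfile.ZipFile(jar) as zf:
        try:
            manifest = zf.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:  # the JAR has no manifest
            return None
    # Manifest values may be wrapped; a continuation line starts with one space.
    unfolded = []
    for line in manifest.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)
    for line in unfolded:
        if not line:  # a blank line ends the main section
            break
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    return None


def choose_main_class(user_class, jar):
    """Prefer the class the user specified; otherwise fall back to the manifest."""
    return user_class or main_class_from_jar(jar)
```

`choose_main_class(None, "app.jar")` therefore returns the manifest's entry point, while an explicitly supplied class name always takes priority.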
In the future we may want to make the `pyspark` shell use spark-submit as well, but it seems unnecessary for 1.0.
Author: Matei Zaharia <matei@databricks.com>
Closes #664 from mateiz/py-submit and squashes the following commits:
15e9669 [Matei Zaharia] Fix some uses of path.separator property
051278c [Matei Zaharia] Small style fixes
0afe886 [Matei Zaharia] Add license headers
4650412 [Matei Zaharia] Add pyFiles to PYTHONPATH in executors, remove old YARN stuff, add tests
15f8e1e [Matei Zaharia] Set PYTHONPATH in PythonWorkerFactory in case it wasn't set from outside
47c0655 [Matei Zaharia] More work to make spark-submit work with Python:
d4375bd [Matei Zaharia] Clean up description of spark-submit args a bit and add Python ones
(cherry picked from commit 951a5d939863b42da83ac2569d5e9d7ed680e119)
Signed-off-by: Matei Zaharia <matei@databricks.com>
|
1. Fix SPARK-1441: compiling Spark core fails with Hadoop 0.23.x
2. Fix SPARK-1491: Maven hadoop-provided profile fails to build
3. Fix inconsistent dependency versions for org.scala-lang:* and org.apache.avro:*
4. Reformat sql/catalyst/pom.xml, sql/hive/pom.xml, and sql/core/pom.xml (four-space indentation changed to two spaces)
Author: witgo <witgo@qq.com>
Closes #480 from witgo/format_pom and squashes the following commits:
03f652f [witgo] review commit
b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
0da4bc3 [witgo] merge master
d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
e345919 [witgo] add avro dependency to yarn-alpha
77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
934f24d [witgo] review commit
cf46edc [witgo] exclude jruby
06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
99464d2 [witgo] fix maven hadoop-provided profile fails to build
0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
(cherry picked from commit 030f2c2126d5075576cd6d83a1ee7462c48b953b)
Conflicts:
sql/catalyst/pom.xml
sql/core/pom.xml
sql/hive/pom.xml
|
This simplifies the shell a bunch and passes all arguments through to spark-submit.
There is a tiny incompatibility with 0.9.1: you can no longer pass either `-c` or `--cores`, only `--cores`. However, spark-submit gives a good error message in this case, I don't think many people used `-c`, and it's a trivial change for users.
Author: Patrick Wendell <pwendell@gmail.com>
Closes #542 from pwendell/spark-shell and squashes the following commits:
9eb3e6f [Patrick Wendell] Updating Spark docs
b552459 [Patrick Wendell] Andrew's feedback
97720fa [Patrick Wendell] Review feedback
aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit
(cherry picked from commit dc3b640a0ab3501b678b591be3e99fbcf3badbec)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
|
Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.
Author: Mridul Muralidharan <mridulm80@apache.org>
This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>
Closes #505 from mridulm/windows_fixes and squashes the following commits:
ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy apparently
cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
3267f4b [Mridul Muralidharan] Fix build failures
35b277a [Mridul Muralidharan] Fix Scalastyle failures
bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
1337abd [Mridul Muralidharan] fix classpath while running in windows
(cherry picked from commit 968c0187a12f5ae4a696c02c1ff088e998ed7edd)
Signed-off-by: Matei Zaharia <matei@databricks.com>
|