author    Jeff Zhang <zjffdu@apache.org>  2016-09-23 11:37:43 -0700
committer Felix Cheung <felixcheung@apache.org>  2016-09-23 11:37:43 -0700
commit    f62ddc5983a08d4d54c0a9a8210dd6cbec555671
[SPARK-17210][SPARKR] sparkr.zip is not distributed to executors when running sparkr in RStudio
## What changes were proposed in this pull request?
Spark adds `sparkr.zip` to the archives only in YARN mode (SparkSubmit.scala):
```
if (args.isR && clusterManager == YARN) {
  val sparkRPackagePath = RUtils.localSparkRPackagePath
  if (sparkRPackagePath.isEmpty) {
    printErrorAndExit("SPARK_HOME does not exist for R application in YARN mode.")
  }
  val sparkRPackageFile = new File(sparkRPackagePath.get, SPARKR_PACKAGE_ARCHIVE)
  if (!sparkRPackageFile.exists()) {
    printErrorAndExit(s"$SPARKR_PACKAGE_ARCHIVE does not exist for R application in YARN mode.")
  }
  val sparkRPackageURI = Utils.resolveURI(sparkRPackageFile.getAbsolutePath).toString
  // Distribute the SparkR package.
  // Assigns a symbol link name "sparkr" to the shipped package.
  args.archives = mergeFileLists(args.archives, sparkRPackageURI + "#sparkr")
  // Distribute the R package archive containing all the built R packages.
  if (!RUtils.rPackages.isEmpty) {
    val rPackageFile =
      RPackageUtils.zipRLibraries(new File(RUtils.rPackages.get), R_PACKAGE_ARCHIVE)
    if (!rPackageFile.exists()) {
      printErrorAndExit("Failed to zip all the built R packages.")
    }
    val rPackageURI = Utils.resolveURI(rPackageFile.getAbsolutePath).toString
    // Assigns a symbol link name "rpkg" to the shipped package.
    args.archives = mergeFileLists(args.archives, rPackageURI + "#rpkg")
  }
}
```
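In the `--archives` entries above, the fragment after `#` (as in `sparkRPackageURI + "#sparkr"`) is the link name under which YARN exposes the unpacked archive in each container's working directory. A minimal Python sketch of that `uri#linkname` convention (the helper `parse_archive_spec` is illustrative, not Spark code):

```python
def parse_archive_spec(spec):
    """Split an --archives entry of the form 'uri#linkname'.

    If no '#' fragment is present, the archive's own file name
    is used as the link name, mirroring YARN's behavior.
    """
    uri, sep, link = spec.partition("#")
    if not sep:
        link = uri.rsplit("/", 1)[-1]
    return uri, link

print(parse_archive_spec("hdfs:///libs/sparkr.zip#sparkr"))
# ('hdfs:///libs/sparkr.zip', 'sparkr')
```

With no fragment, e.g. `parse_archive_spec("/tmp/rpkg.zip")`, the link name falls back to `rpkg.zip`.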
So it is necessary to pass `spark.master` from the R process to the JVM; otherwise `sparkr.zip` won't be distributed to the executors. Besides that, I also pass `spark.yarn.keytab` and `spark.yarn.principal` to the Spark side, because the JVM process needs them to access a secured cluster.
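On the R side, the fix extends the `sparkConfToSubmitOps` table that maps `sparkConfig` entries to `spark-submit` flags. A rough Python sketch of that translation (the dict subset and the function `build_submit_args` are illustrative, not SparkR's actual code):

```python
# Illustrative subset of the property -> spark-submit flag table;
# includes the three entries this patch adds plus one pre-existing entry.
CONF_TO_SUBMIT_OPS = {
    "spark.master": "--master",
    "spark.yarn.keytab": "--keytab",
    "spark.yarn.principal": "--principal",
    "spark.driver.memory": "--driver-memory",
}

def build_submit_args(spark_config):
    """Turn a sparkConfig-style dict into a list of spark-submit arguments.

    Properties without a mapped flag are simply skipped in this sketch.
    """
    args = []
    for prop, value in spark_config.items():
        flag = CONF_TO_SUBMIT_OPS.get(prop)
        if flag is not None:
            args += [flag, value]
    return args

print(build_submit_args({"spark.master": "yarn-client"}))
# ['--master', 'yarn-client']
```

With this mapping in place, setting `spark.master` in `sparkConfig` reaches the JVM as `--master`, which is what triggers the YARN-only branch in SparkSubmit.scala above.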
## How was this patch tested?
Verified manually in RStudio using the following code:
```
Sys.setenv(SPARK_HOME="/Users/jzhang/github/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sparkR.session(master="yarn-client", sparkConfig = list(spark.executor.instances="1"))
df <- as.DataFrame(mtcars)
head(df)
```
Author: Jeff Zhang <zjffdu@apache.org>
Closes #14784 from zjffdu/SPARK-17210.
 R/pkg/R/sparkR.R |  4 ++++
 docs/sparkr.md   | 15 +++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)
```
diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R
index 06015362e6..cc6d591bb2 100644
--- a/R/pkg/R/sparkR.R
+++ b/R/pkg/R/sparkR.R
@@ -491,6 +491,10 @@ sparkConfToSubmitOps[["spark.driver.memory"]] <- "--driver-memory"
 sparkConfToSubmitOps[["spark.driver.extraClassPath"]] <- "--driver-class-path"
 sparkConfToSubmitOps[["spark.driver.extraJavaOptions"]] <- "--driver-java-options"
 sparkConfToSubmitOps[["spark.driver.extraLibraryPath"]] <- "--driver-library-path"
+sparkConfToSubmitOps[["spark.master"]] <- "--master"
+sparkConfToSubmitOps[["spark.yarn.keytab"]] <- "--keytab"
+sparkConfToSubmitOps[["spark.yarn.principal"]] <- "--principal"
+
 # Utility function that returns Spark Submit arguments as a string
 #
diff --git a/docs/sparkr.md b/docs/sparkr.md
index b881119731..340e7f7cb1 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -63,6 +63,21 @@ The following Spark driver properties can be set in `sparkConfig` with `sparkR.s
 <table class="table">
 <tr><th>Property Name</th><th>Property group</th><th><code>spark-submit</code> equivalent</th></tr>
 <tr>
+  <td><code>spark.master</code></td>
+  <td>Application Properties</td>
+  <td><code>--master</code></td>
+</tr>
+<tr>
+  <td><code>spark.yarn.keytab</code></td>
+  <td>Application Properties</td>
+  <td><code>--keytab</code></td>
+</tr>
+<tr>
+  <td><code>spark.yarn.principal</code></td>
+  <td>Application Properties</td>
+  <td><code>--principal</code></td>
+</tr>
+<tr>
   <td><code>spark.driver.memory</code></td>
   <td>Application Properties</td>
   <td><code>--driver-memory</code></td>
```