author | Jeff Zhang <zjffdu@apache.org> | 2016-09-23 11:37:43 -0700
committer | Felix Cheung <felixcheung@apache.org> | 2016-09-23 11:37:43 -0700
commit | f62ddc5983a08d4d54c0a9a8210dd6cbec555671 (patch)
tree | 5da554890f9ddf38d58decdf4c68247e4bbba24b /repl/scala-2.10/src/main/scala
parent | f89808b0fdbc04e1bdff1489a6ec4c84ddb2adc4 (diff)
[SPARK-17210][SPARKR] sparkr.zip is not distributed to executors when running sparkr in RStudio
## What changes were proposed in this pull request?
Spark adds sparkr.zip to the archives only when running in YARN mode (see SparkSubmit.scala):
```
if (args.isR && clusterManager == YARN) {
  val sparkRPackagePath = RUtils.localSparkRPackagePath
  if (sparkRPackagePath.isEmpty) {
    printErrorAndExit("SPARK_HOME does not exist for R application in YARN mode.")
  }
  val sparkRPackageFile = new File(sparkRPackagePath.get, SPARKR_PACKAGE_ARCHIVE)
  if (!sparkRPackageFile.exists()) {
    printErrorAndExit(s"$SPARKR_PACKAGE_ARCHIVE does not exist for R application in YARN mode.")
  }
  val sparkRPackageURI = Utils.resolveURI(sparkRPackageFile.getAbsolutePath).toString
  // Distribute the SparkR package.
  // Assigns a symbol link name "sparkr" to the shipped package.
  args.archives = mergeFileLists(args.archives, sparkRPackageURI + "#sparkr")
  // Distribute the R package archive containing all the built R packages.
  if (!RUtils.rPackages.isEmpty) {
    val rPackageFile =
      RPackageUtils.zipRLibraries(new File(RUtils.rPackages.get), R_PACKAGE_ARCHIVE)
    if (!rPackageFile.exists()) {
      printErrorAndExit("Failed to zip all the built R packages.")
    }
    val rPackageURI = Utils.resolveURI(rPackageFile.getAbsolutePath).toString
    // Assigns a symbol link name "rpkg" to the shipped package.
    args.archives = mergeFileLists(args.archives, rPackageURI + "#rpkg")
  }
}
```
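The `#sparkr` suffix in the archive URI names the link the archive is unpacked under on executors, and `mergeFileLists` joins comma-separated file lists. A minimal sketch of that merge-and-alias behavior (hypothetical helper names, not Spark's actual implementation) in Python:

```python
def merge_file_lists(*lists):
    # Join comma-separated file lists, skipping empty entries,
    # roughly mirroring SparkSubmit's mergeFileLists helper.
    entries = [e for lst in lists if lst for e in lst.split(",") if e]
    return ",".join(entries)

def link_name(archive_uri):
    # The fragment after '#' is the alias the archive is unpacked
    # under on executors; without one, the file name itself is used.
    uri, _, fragment = archive_uri.partition("#")
    return fragment or uri.rsplit("/", 1)[-1]

archives = merge_file_lists("hdfs:///user/app/data.zip",
                            "file:/opt/spark/R/lib/sparkr.zip#sparkr")
print(archives)
# → hdfs:///user/app/data.zip,file:/opt/spark/R/lib/sparkr.zip#sparkr
print(link_name("file:/opt/spark/R/lib/sparkr.zip#sparkr"))
# → sparkr
```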
So it is necessary to pass spark.master from the R process to the JVM; otherwise sparkr.zip won't be distributed to the executors. Besides that, I also pass spark.yarn.keytab/spark.yarn.principal to the Spark side, because the JVM process needs them to access a secured cluster.
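The gist of the fix can be sketched as follows: when the R process launches the JVM backend, the master and Kerberos properties from sparkConfig must be forwarded on the launch command line rather than applied only after the JVM is up. This is a hypothetical illustration (function and key names are assumptions, not SparkR's actual code):

```python
# Properties that must reach the JVM at launch time, since
# SparkSubmit decides archive distribution based on them.
FORWARDED_KEYS = ("spark.master", "spark.yarn.keytab", "spark.yarn.principal")

def backend_submit_args(spark_config):
    # Build the extra spark-submit arguments for the JVM backend
    # from the R-side config dict.
    args = []
    for key in FORWARDED_KEYS:
        if key in spark_config:
            if key == "spark.master":
                args += ["--master", spark_config[key]]
            else:
                args += ["--conf", f"{key}={spark_config[key]}"]
    return args

print(backend_submit_args({"spark.master": "yarn-client",
                           "spark.yarn.keytab": "/tmp/user.keytab"}))
# → ['--master', 'yarn-client', '--conf', 'spark.yarn.keytab=/tmp/user.keytab']
```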
## How was this patch tested?
Verified manually in RStudio using the following code:
```
Sys.setenv(SPARK_HOME="/Users/jzhang/github/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sparkR.session(master="yarn-client", sparkConfig = list(spark.executor.instances="1"))
df <- as.DataFrame(mtcars)
head(df)
```
…
Author: Jeff Zhang <zjffdu@apache.org>
Closes #14784 from zjffdu/SPARK-17210.
Diffstat (limited to 'repl/scala-2.10/src/main/scala')
0 files changed, 0 insertions, 0 deletions