From 2462dbcce89d657bca17ae311c99c2a4bee4a5fa Mon Sep 17 00:00:00 2001
From: Sun Rui
Date: Fri, 23 Oct 2015 21:38:04 -0700
Subject: [SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.

Add a new spark conf option "spark.sparkr.r.driver.command" to specify the
executable for an R script in client modes. The existing spark conf option
"spark.sparkr.r.command" is used to specify the executable for an R script in
cluster modes for both driver and workers. See also [launch R worker
script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).

BTW, [environment variable
"SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275)
is used to locate the R shell on the local host.

For your information, PySpark has two environment variables serving a similar
purpose:
PYSPARK_PYTHON          Python binary executable to use for PySpark in both driver and workers (default is `python`).
PYSPARK_DRIVER_PYTHON   Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON).

PySpark uses the code
[here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41)
to determine the Python executable for a Python script.

Author: Sun Rui

Closes #9179 from sun-rui/SPARK-10971.
---
 docs/configuration.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

(limited to 'docs/configuration.md')

diff --git a/docs/configuration.md b/docs/configuration.md
index be9c36bdfe..682384d424 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1589,6 +1589,20 @@ Apart from these, the following properties are also available, and may be useful
     Number of threads used by RBackend to handle RPC calls from SparkR package.
   </td>
 </tr>
+<tr>
+  <td><code>spark.r.command</code></td>
+  <td>Rscript</td>
+  <td>
+    Executable for executing R scripts in cluster modes for both driver and workers.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.r.driver.command</code></td>
+  <td>spark.r.command</td>
+  <td>
+    Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
+  </td>
+</tr>
 </table>

 #### Cluster Managers
@@ -1628,6 +1642,10 @@ The following variables can be set in `spark-env.sh`:
     <td><code>PYSPARK_DRIVER_PYTHON</code></td>
     <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>).</td>
   </tr>
+  <tr>
+    <td><code>SPARKR_DRIVER_R</code></td>
+    <td>R binary executable to use for SparkR shell (default is <code>R</code>).</td>
+  </tr>
   <tr>
     <td><code>SPARK_LOCAL_IP</code></td>
     <td>IP address of the machine to bind to.
--
cgit v1.2.3
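
The two options documented in the diff pair the same way PYSPARK_DRIVER_PYTHON pairs with PYSPARK_PYTHON: the driver-side option falls back to the cluster-wide one. A minimal sketch of setting them at submit time (the Rscript path and the example script are hypothetical, not from this patch):

```shell
# Sketch: in client mode, run the driver's R script with a custom Rscript,
# while executors keep using the executable named by spark.r.command.
spark-submit \
  --conf spark.r.driver.command=/opt/R/bin/Rscript \
  --conf spark.r.command=Rscript \
  --master yarn \
  my_script.R
```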