author    Yanbo Liang <ybliang8@gmail.com>  2016-11-22 00:05:30 -0800
committer Yanbo Liang <ybliang8@gmail.com>  2016-11-22 00:05:30 -0800
commit    acb97157796231fef74aba985825b05b607b9279
tree      3b3b55f4b1363054db5d020ab492a870e27f9b7a
parent    ebeb0830a3a4837c7354a0eee667b9f5fad389c5
[SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package.
## What changes were proposed in this pull request?
When running a SparkR job in yarn-cluster mode, SparkR downloads the Spark package from the Apache website, which is unnecessary. For example:
```
./bin/spark-submit --master yarn-cluster ./examples/src/main/r/dataframe.R
```
This produces the following output:
```
Attaching package: ‘SparkR’
The following objects are masked from ‘package:stats’:
cov, filter, lag, na.omit, predict, sd, var, window
The following objects are masked from ‘package:base’:
as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
rank, rbind, sample, startsWith, subset, summary, transform, union
Spark not found in SPARK_HOME:
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
......
```
There is no ```SPARK_HOME``` in yarn-cluster mode, because the R process runs on a remote host in the YARN cluster rather than on the client host. The JVM comes up first and the R process then connects to it, so in this case Spark is already running and should never be downloaded.
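For illustration, the guard this implies can be sketched as follows. This is a minimal sketch, not the verbatim patch: it assumes the launching JVM exports the backend port to the R process via the ```EXISTING_SPARKR_BACKEND_PORT``` environment variable, and that the download fallback is SparkR's ```install.spark()```.
```
library(SparkR)  # provides install.spark()

# Sketch: only look for (or download) a local Spark distribution when R is
# the side launching the JVM. In yarn-cluster mode the JVM launches R and
# exports the backend port, so Spark is already running on the cluster.
existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
if (!nzchar(existingPort)) {
  # Client-side launch: a local Spark distribution is required. If
  # SPARK_HOME is unset or invalid, fall back to downloading one.
  sparkHome <- Sys.getenv("SPARK_HOME", "")
  if (!nzchar(sparkHome) || !dir.exists(sparkHome)) {
    install.spark()
  }
}
# Otherwise: connect to the already-running backend and never download.
```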
## How was this patch tested?
Offline test.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #15888 from yanboliang/spark-18444.