author    Yanbo Liang <ybliang8@gmail.com>  2016-11-22 00:05:30 -0800
committer Yanbo Liang <ybliang8@gmail.com>  2016-11-22 00:05:30 -0800
commit    acb97157796231fef74aba985825b05b607b9279
tree      3b3b55f4b1363054db5d020ab492a870e27f9b7a
parent    ebeb0830a3a4837c7354a0eee667b9f5fad389c5
[SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not download Spark package.
## What changes were proposed in this pull request?
When running a SparkR job in yarn-cluster mode, SparkR downloads the Spark package from the Apache website, which is unnecessary. For example:
```
./bin/spark-submit --master yarn-cluster ./examples/src/main/r/dataframe.R
```
This produces the following output:
```
Attaching package: ‘SparkR’
The following objects are masked from ‘package:stats’:
cov, filter, lag, na.omit, predict, sd, var, window
The following objects are masked from ‘package:base’:
as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
rank, rbind, sample, startsWith, subset, summary, transform, union
Spark not found in SPARK_HOME:
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
......
```
There is no ```SPARK_HOME``` in yarn-cluster mode, because the R process runs on a remote host in the YARN cluster rather than on the client host. The JVM comes up first and the R process then connects to it, so in this case Spark is already running and should never be downloaded.
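For illustration, the guard this implies can be sketched as follows. This is a minimal sketch, not the verbatim patch: it assumes the launching JVM exports the backend port to the R process via the ```EXISTING_SPARKR_BACKEND_PORT``` environment variable, and that the download fallback is SparkR's ```install.spark()```.
```
library(SparkR)  # provides install.spark()

# Sketch: only look for (or download) a local Spark distribution when R is
# the side launching the JVM. In yarn-cluster mode the JVM launches R and
# exports the backend port, so Spark is already running on the cluster.
existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
if (!nzchar(existingPort)) {
  # Client-side launch: a local Spark distribution is required. If
  # SPARK_HOME is unset or invalid, fall back to downloading one.
  sparkHome <- Sys.getenv("SPARK_HOME", "")
  if (!nzchar(sparkHome) || !dir.exists(sparkHome)) {
    install.spark()
  }
}
# Otherwise: connect to the already-running backend and never download.
```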
## How was this patch tested?
Offline test.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #15888 from yanboliang/spark-18444.