author     Felix Cheung <felixcheung_m@hotmail.com>            2016-12-08 11:29:31 -0800
committer  Shivaram Venkataraman <shivaram@cs.berkeley.edu>    2016-12-08 11:29:31 -0800
commit     c3d3a9d0e85b834abef87069e4edd27db87fc607 (patch)
tree       bcb6a0d4e506d6ddce58cf927e2e6b8e85970045
parent     3c68944b229aaaeeaee3efcbae3e3be9a2914855 (diff)
[SPARK-18590][SPARKR] build R source package when making distribution
## What changes were proposed in this pull request?

This PR has two key changes. One, we build a source package (aka bundle package) for SparkR that could be released on CRAN. Two, the official Spark binary distributions should include SparkR installed from this source package instead (which has the help/vignettes rds files needed for those features to work when the SparkR package is loaded in R, whereas the earlier devtools-based approach does not). However, because of various differences in how R performs different tasks, this PR is a fair bit more complicated. More details below.

This PR also includes a few minor fixes.

### More details

These are the additional steps in make-distribution; please see [here](https://github.com/apache/spark/blob/master/R/CRAN_RELEASE.md) for what goes into a CRAN release, which is now run during make-distribution.sh.

1. The package needs to be installed first, because the first code block in the vignettes is `library(SparkR)` without a lib path.
2. `R CMD build` builds the vignettes (this process runs Spark/SparkR code and captures the output into the pdf documentation).
3. `R CMD check` on the source package installs the package and builds the vignettes again (this time from the packaged source) - this is a key step required to release an R package on CRAN (tests are skipped here, but they will need to pass for the CRAN release process to succeed - ideally, during release signoff we should install from the R source package and run the tests).
4. `R CMD INSTALL` on the source package (this is the only way to generate the doc/vignettes rds files correctly; step 1 does not). The output of this step is what we package into the Spark dist and sparkr.zip.

Alternatively, `R CMD build` should already be installing the package into a temp directory, though it might just be a matter of finding that location and setting it as the lib.loc parameter; another approach is perhaps to call `R CMD INSTALL --build pkg` instead. In any case, despite installing the package multiple times, this is relatively fast. Building the vignettes takes a while, though.

## How was this patch tested?

Manually, CI.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16014 from felixcheung/rdist.
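The four numbered steps above can be sketched roughly as the following shell commands. This is an illustrative sketch, not the exact script: the version number `2.1.0` and the library path are assumptions for this example, and a real run needs `SPARK_HOME` set with Spark jars built so the vignette code can run.

```shell
# Rough sketch of the packaging steps described above (illustrative paths;
# assumes SPARK_HOME is set and Spark jars are accessible).
cd "$SPARK_HOME/R"

# 1. Install the package first, so that `library(SparkR)` in the vignettes
#    resolves without a lib path.
R CMD INSTALL pkg/

# 2. Build the source package; this runs the vignette code against Spark and
#    captures the output into the pdf documentation.
R CMD build pkg/

# 3. Check the source package; this installs and builds the vignettes again,
#    this time from the packaged source (tests skipped, as in check-cran.sh).
R CMD check --no-tests SparkR_2.1.0.tar.gz

# 4. Install from the source package; this is what generates the
#    doc/vignettes rds files that go into the Spark dist and sparkr.zip.
R CMD INSTALL SparkR_2.1.0.tar.gz --library="$SPARK_HOME/R/lib"
```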
-rw-r--r--   R/CRAN_RELEASE.md                      2
-rwxr-xr-x   R/check-cran.sh                       19
-rwxr-xr-x   R/install-dev.sh                       2
-rw-r--r--   R/pkg/.Rbuildignore                    3
-rw-r--r--   R/pkg/DESCRIPTION                     13
-rw-r--r--   R/pkg/NAMESPACE                        2
-rwxr-xr-x   dev/create-release/release-build.sh   27
-rwxr-xr-x   dev/make-distribution.sh              25
8 files changed, 74 insertions, 19 deletions
diff --git a/R/CRAN_RELEASE.md b/R/CRAN_RELEASE.md
index bea8f9fbe4..d6084c7a7c 100644
--- a/R/CRAN_RELEASE.md
+++ b/R/CRAN_RELEASE.md
@@ -7,7 +7,7 @@ To release SparkR as a package to CRAN, we would use the `devtools` package. Ple
First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.
-Note that while `check-cran.sh` is running `R CMD check`, it is doing so with `--no-manual --no-vignettes`, which skips a few vignettes or PDF checks - therefore it will be preferred to run `R CMD check` on the source package built manually before uploading a release.
+Note that while `run-tests.sh` runs `check-cran.sh` (which runs `R CMD check`), it does so with `--no-manual --no-vignettes`, which skips a few vignette and PDF checks - therefore it is preferred to run `R CMD check` on a manually built source package before uploading a release. Also note that for the CRAN checks on the pdf vignettes to succeed, the `qpdf` tool must be installed (eg. `yum -q -y install qpdf`).
To upload a release, we would need to update the `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script along with comments on status of all `WARNING` (should not be any) or `NOTE`. As a part of `check-cran.sh` and the release process, the vignettes is build - make sure `SPARK_HOME` is set and Spark jars are accessible.
diff --git a/R/check-cran.sh b/R/check-cran.sh
index c5f042848c..1288e7fc9f 100755
--- a/R/check-cran.sh
+++ b/R/check-cran.sh
@@ -34,8 +34,9 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
-echo "USING R_HOME = $R_HOME"
+echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"
+# Install the package (this is required for code in vignettes to run when building it later)
# Build the latest docs, but not vignettes, which is built with the package next
$FWDIR/create-docs.sh
@@ -82,4 +83,20 @@ else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi
+
+# Install source package to get it to generate vignettes rds files, etc.
+if [ -n "$CLEAN_INSTALL" ]
+then
+ echo "Removing lib path and installing from source package"
+ LIB_DIR="$FWDIR/lib"
+ rm -rf $LIB_DIR
+ mkdir -p $LIB_DIR
+ "$R_SCRIPT_PATH/"R CMD INSTALL SparkR_"$VERSION".tar.gz --library=$LIB_DIR
+
+ # Zip the SparkR package so that it can be distributed to worker nodes on YARN
+ pushd $LIB_DIR > /dev/null
+ jar cfM "$LIB_DIR/sparkr.zip" SparkR
+ popd > /dev/null
+fi
+
popd > /dev/null
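With the new `CLEAN_INSTALL` branch above, a distribution build can drive check-cran.sh entirely through environment variables. A hedged usage sketch (this mirrors how make-distribution.sh invokes it, and assumes `SPARK_HOME` points at a checkout with built jars):

```shell
# Illustrative invocation: skip tests, then remove R/lib and reinstall SparkR
# from the freshly built source package.
NO_TESTS=1 CLEAN_INSTALL=1 "$SPARK_HOME/R/check-cran.sh"

# R/lib should then contain the installed SparkR/ directory plus sparkr.zip,
# the archive distributed to worker nodes on YARN.
ls "$SPARK_HOME/R/lib"
```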
diff --git a/R/install-dev.sh b/R/install-dev.sh
index ada6303a72..0f881208bc 100755
--- a/R/install-dev.sh
+++ b/R/install-dev.sh
@@ -46,7 +46,7 @@ if [ ! -z "$R_HOME" ]
fi
R_SCRIPT_PATH="$(dirname $(which R))"
fi
-echo "USING R_HOME = $R_HOME"
+echo "Using R_SCRIPT_PATH = ${R_SCRIPT_PATH}"
# Generate Rd files if devtools is installed
"$R_SCRIPT_PATH/"Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtools); devtools::document(pkg="./pkg", roclets=c("rd")) }'
diff --git a/R/pkg/.Rbuildignore b/R/pkg/.Rbuildignore
index 544d203a6d..f12f8c275a 100644
--- a/R/pkg/.Rbuildignore
+++ b/R/pkg/.Rbuildignore
@@ -1,5 +1,8 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.lintr$
+^cran-comments\.md$
+^NEWS\.md$
+^README\.Rmd$
^src-native$
^html$
diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 981ae12464..0cb3a80a6e 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,8 +1,8 @@
Package: SparkR
Type: Package
-Title: R Frontend for Apache Spark
Version: 2.1.0
-Date: 2016-11-06
+Title: R Frontend for Apache Spark
+Description: The SparkR package provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "shivaram@cs.berkeley.edu"),
person("Xiangrui", "Meng", role = "aut",
@@ -10,19 +10,18 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
person("Felix", "Cheung", role = "aut",
email = "felixcheung@apache.org"),
person(family = "The Apache Software Foundation", role = c("aut", "cph")))
+License: Apache License (== 2.0)
URL: http://www.apache.org/ http://spark.apache.org/
BugReports: http://spark.apache.org/contributing.html
Depends:
R (>= 3.0),
methods
Suggests:
+ knitr,
+ rmarkdown,
testthat,
e1071,
- survival,
- knitr,
- rmarkdown
-Description: The SparkR package provides an R frontend for Apache Spark.
-License: Apache License (== 2.0)
+ survival
Collate:
'schema.R'
'generics.R'
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index daee09de88..377f9429ae 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -3,7 +3,7 @@
importFrom("methods", "setGeneric", "setMethod", "setOldClass")
importFrom("methods", "is", "new", "signature", "show")
importFrom("stats", "gaussian", "setNames")
-importFrom("utils", "download.file", "object.size", "packageVersion", "untar")
+importFrom("utils", "download.file", "object.size", "packageVersion", "tail", "untar")
# Disable native libraries till we figure out how to package it
# See SPARKR-7839
diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh
index aa42750f26..8863ee6cd7 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -150,7 +150,7 @@ if [[ "$1" == "package" ]]; then
NAME=$1
FLAGS=$2
ZINC_PORT=$3
- BUILD_PIP_PACKAGE=$4
+ BUILD_PACKAGE=$4
cp -r spark spark-$SPARK_VERSION-bin-$NAME
cd spark-$SPARK_VERSION-bin-$NAME
@@ -172,11 +172,30 @@ if [[ "$1" == "package" ]]; then
MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'`
- if [ -z "$BUILD_PIP_PACKAGE" ]; then
- echo "Creating distribution without PIP package"
+ if [ -z "$BUILD_PACKAGE" ]; then
+ echo "Creating distribution without PIP/R package"
./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz $FLAGS \
-DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
cd ..
+ elif [[ "$BUILD_PACKAGE" == "withr" ]]; then
+ echo "Creating distribution with R package"
+ ./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz --r $FLAGS \
+ -DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
+ cd ..
+
+ echo "Copying and signing R source package"
+ R_DIST_NAME=SparkR_$SPARK_VERSION.tar.gz
+ cp spark-$SPARK_VERSION-bin-$NAME/R/$R_DIST_NAME .
+
+ echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour \
+ --output $R_DIST_NAME.asc \
+ --detach-sig $R_DIST_NAME
+ echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
+ MD5 $R_DIST_NAME > \
+ $R_DIST_NAME.md5
+ echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
+ SHA512 $R_DIST_NAME > \
+ $R_DIST_NAME.sha
else
echo "Creating distribution with PIP package"
./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz --pip $FLAGS \
@@ -222,7 +241,7 @@ if [[ "$1" == "package" ]]; then
make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" &
make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" "withpip" &
make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 -Pyarn -Pmesos" "3037" &
- make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" &
+ make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" "withr" &
wait
rm -rf spark-$SPARK_VERSION-bin-*/
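A downstream consumer could verify the detached signature that the block above produces for the R source package. A minimal sketch, assuming the Spark release public key has been imported and a 2.1.0 build (the filename is illustrative):

```shell
# Verify the ASCII-armoured detached signature created by release-build.sh.
gpg --verify SparkR_2.1.0.tar.gz.asc SparkR_2.1.0.tar.gz
```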
diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh
index 49b46fbc3f..fe281bbaa2 100755
--- a/dev/make-distribution.sh
+++ b/dev/make-distribution.sh
@@ -34,6 +34,7 @@ DISTDIR="$SPARK_HOME/dist"
MAKE_TGZ=false
MAKE_PIP=false
+MAKE_R=false
NAME=none
MVN="$SPARK_HOME/build/mvn"
@@ -41,7 +42,7 @@ function exit_with_usage {
echo "make-distribution.sh - tool for making binary distributions of Spark"
echo ""
echo "usage:"
- cl_options="[--name] [--tgz] [--pip] [--mvn <mvn-command>]"
+ cl_options="[--name] [--tgz] [--pip] [--r] [--mvn <mvn-command>]"
echo "make-distribution.sh $cl_options <maven build options>"
echo "See Spark's \"Building Spark\" doc for correct Maven options."
echo ""
@@ -71,6 +72,9 @@ while (( "$#" )); do
--pip)
MAKE_PIP=true
;;
+ --r)
+ MAKE_R=true
+ ;;
--mvn)
MVN="$2"
shift
@@ -208,11 +212,24 @@ cp -r "$SPARK_HOME/data" "$DISTDIR"
# Make pip package
if [ "$MAKE_PIP" == "true" ]; then
echo "Building python distribution package"
- cd $SPARK_HOME/python
+ pushd "$SPARK_HOME/python" > /dev/null
python setup.py sdist
- cd ..
+ popd > /dev/null
+else
+ echo "Skipping building python distribution package"
+fi
+
+# Make R package - this is used for both CRAN release and packing R layout into distribution
+if [ "$MAKE_R" == "true" ]; then
+ echo "Building R source package"
+ pushd "$SPARK_HOME/R" > /dev/null
+ # Build source package and run full checks
+ # Install source package to get it to generate vignettes, etc.
+ # Do not source the check-cran.sh - it should be run from where it is for it to set SPARK_HOME
+ NO_TESTS=1 CLEAN_INSTALL=1 "$SPARK_HOME/"R/check-cran.sh
+ popd > /dev/null
else
- echo "Skipping creating pip installable PySpark"
+ echo "Skipping building R source package"
fi
# Copy other things
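Putting the new flag together: a distribution that also produces the CRAN-style R source package can be built roughly as release-build.sh does for the "withr" case. The profile list below is taken from the release script and the name is illustrative:

```shell
# Build a Spark binary distribution plus the SparkR source package.
./dev/make-distribution.sh --name without-hadoop --tgz --r \
  -Psparkr -Phadoop-provided -Pyarn -Pmesos
```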