The GRAN repository system and reproducibility tools

Gabriel Becker, Dinakar Kulkarni

04 February, 2020

Introduction

GRANBase is an open source set of tools for testing and deploying R packages as package repositories, for both general deployment and result reproduction. It is based on the switchr framework and allows users to deploy package manifests as validated repositories. It is centered around the R repository mechanism for package distribution. GRANBase provides three major areas of functionality:

Creating GRANBase repositories

GRANBase relies on the GRANCore framework for repository management, which in turn is based on package manifests (PkgManifest or SeedingManifest objects from the switchr framework).

Given a manifest, initial construction and rebuilding of individual GRANBase repositories (referred to as subrepositories because GRANBase supports a form of branched deployment) is performed via the makeRepo function. For example:

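library(GRANBase)  # also attaches GRANCore and switchr (listed in Depends)

# Locate the example packages shipped with GRANBase
testpkgs <- list.files(system.file("testpkgs", package = "GRANBase"),
                       full.names = TRUE)
# Describe them in a manifest of local source directories
man <- PkgManifest(name = basename(testpkgs),
                   url = testpkgs, type = "local")
# Build the repository under a temporary directory
repdir <- file.path(tempdir(), "repos")
if (!dir.exists(repdir)) dir.create(repdir)
repo <- makeRepo(man,
                 repo_name = "stable",
                 basedir = repdir,
                 destination = repdir,
                 cores = 1L,
                 install_test = TRUE,
                 check_test = FALSE)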

NOTE: In the above code, we disabled the R CMD check test (check_test = FALSE) because it does not play well with the CRAN build system; only the installation test is run. In most cases, both tests should be TRUE in order to create a validated package repository. Also note that the willfail package does not appear in the output below: it is engineered as a test case to fail, so it is excluded from the validated repository.

available.packages(repo, type="source")
#>            Package      Version  Priority
#> GRANBase   "GRANBase"   "2.6.19" NA      
#> GRANCore   "GRANCore"   "0.2.6"  NA      
#> GRANstable "GRANstable" "0.10.0" NA      
#> deptest    "deptest"    "1.0"    NA      
#> switchr    "switchr"    "0.14.1" NA      
#> toyp       "toyp"       "1.0"    NA      
#> toypkg     "toypkg"     "1.0"    NA      
#>            Depends                                     
#> GRANBase   "GRANCore, switchr (>= 0.13.4), methods"    
#> GRANCore   "R (>= 3.1.0), switchr (>= 0.9.28), methods"
#> GRANstable "GRANCore"                                  
#> deptest    "toypkg"                                    
#> switchr    "methods"                                   
#> toyp       NA                                          
#> toypkg     NA                                          
#>            Imports                                                                                                        
#> GRANBase   "tools, utils, htmlTable (>= 1.11.0), dplyr, sendmailR, covr,\nRCurl, jsonlite, stringi, stats, markdown, desc"
#> GRANCore   NA                                                                                                             
#> GRANstable NA                                                                                                             
#> deptest    NA                                                                                                             
#> switchr    "tools, RJSONIO, RCurl"                                                                                        
#> toyp       NA                                                                                                             
#> toypkg     NA                                                                                                             
#>            LinkingTo Suggests                                             
#> GRANBase   NA        "parallel, rmarkdown, hexSticker, knitr, DT (>= 0.2)"
#> GRANCore   NA        NA                                                   
#> GRANstable NA        NA                                                   
#> deptest    NA        NA                                                   
#> switchr    NA        NA                                                   
#> toyp       NA        NA                                                   
#> toypkg     NA        NA                                                   
#>            Enhances License        License_is_FOSS License_restricts_use
#> GRANBase   NA       "Artistic-2.0" NA              NA                   
#> GRANCore   NA       "Artistic-2.0" NA              NA                   
#> GRANstable NA       "Artistic-2.0" NA              NA                   
#> deptest    NA       "Artistic-2.0" NA              NA                   
#> switchr    NA       "Artistic-2.0" NA              NA                   
#> toyp       NA       "Artistic-2.0" NA              NA                   
#> toypkg     NA       "Artistic-2.0" NA              NA                   
#>            OS_type Archs MD5sum                            
#> GRANBase   NA      NA    "593efb233c1f2aff37aa9acb22bed944"
#> GRANCore   NA      NA    "50d07fd01d9cdd1475cd6630d2eb3fcb"
#> GRANstable NA      NA    "e741036d4d9c8f263775d6cfd7f106fc"
#> deptest    NA      NA    "2f8d8daf7a2d812fc19ad3a0e68f0471"
#> switchr    NA      NA    "9990925259a3d0e29b87a447b3906e0d"
#> toyp       NA      NA    "9b884923688483153538c7e722a9111f"
#> toypkg     NA      NA    "e6512421fcf88ed9d8189c37e9ee7deb"
#>            NeedsCompilation File
#> GRANBase   "no"             NA  
#> GRANCore   "no"             NA  
#> GRANstable "no"             NA  
#> deptest    "no"             NA  
#> switchr    "no"             NA  
#> toyp       "no"             NA  
#> toypkg     "no"             NA  
#>            Repository                                                                                   
#> GRANBase   "file:///var/folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/src/contrib"
#> GRANCore   "file:///var/folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/src/contrib"
#> GRANstable "file:///var/folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/src/contrib"
#> deptest    "file:///var/folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/src/contrib"
#> switchr    "file:///var/folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/src/contrib"
#> toyp       "file:///var/folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/src/contrib"
#> toypkg     "file:///var/folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/src/contrib"

Note that the repository contains the package GRANstable. This package was generated automatically and exports a defaultGRAN() function, which the switchr package uses when GRANstable is loaded to add our repository to the set of default repositories.
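Because the result is an ordinary R repository, it can presumably also be queried with base R tools alone. The following is a minimal sketch, not part of GRANBase: subrepo_url() and list_subrepo() are hypothetical helper names, and the src/contrib layout under the subrepository directory is assumed from the Repository column shown above.

```r
# Hypothetical helper: turn a local directory into a file:// repository URL.
subrepo_url <- function(path) {
  p <- normalizePath(path, winslash = "/", mustWork = TRUE)
  # Windows paths start with a drive letter, so add the leading slash
  if (!startsWith(p, "/")) p <- paste0("/", p)
  paste0("file://", p)
}

# Hypothetical helper: list the source packages served by a subrepository.
list_subrepo <- function(path) {
  contrib <- utils::contrib.url(subrepo_url(path), type = "source")
  utils::available.packages(contriburl = contrib)[, c("Package", "Version"),
                                                  drop = FALSE]
}

# Usage, assuming the repository built above lives in file.path(repdir, "stable"):
# list_subrepo(file.path(repdir, "stable"))
# install.packages("toypkg",
#                  repos = subrepo_url(file.path(repdir, "stable")),
#                  type = "source")
```

Loading the generated GRANstable package remains the more convenient route; the sketch only illustrates that nothing GRANBase-specific is needed on the consuming side.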

GRANBase represents (sub)repositories as GRANRepository objects, which come from the GRANCore package. These objects contain all the information required to build and deploy the repository.

Once a repository is created, its GRANRepository object is saved within the created directory structure as the repo.R file. This allows future builds to be invoked by the simpler syntax of passing a GRANRepository object or path to a created repository to makeRepo() directly:

repo <- makeRepo(file.path(repdir, "stable"), cores = 1L)

The makeRepo() function also accepts a build_pkgs argument, which will cause only the specified packages (and their reverse dependencies) to be rebuilt, regardless of changes in version number.

repo2 <- makeRepo(repo,
                  build_pkgs = basename(testpkgs)[1],
                  cores = 1L)
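
To illustrate what "the specified packages and their reverse dependencies" means, the sketch below computes such a set with base R's tools::package_dependencies(). It is not GRANBase's internal implementation: rebuild_set() is a hypothetical helper, and the dependency table is a simplified copy of the Depends column from the listing above (Imports are left empty because none of them point to packages in the repository).

```r
# Simplified dependency table, shaped like the output of available.packages()
db <- cbind(
  Package   = c("GRANBase", "GRANCore", "GRANstable", "deptest",
                "switchr", "toyp", "toypkg"),
  Depends   = c("GRANCore, switchr (>= 0.13.4), methods",
                "R (>= 3.1.0), switchr (>= 0.9.28), methods",
                "GRANCore", "toypkg", "methods", NA, NA),
  Imports   = NA_character_,
  LinkingTo = NA_character_
)
rownames(db) <- db[, "Package"]

# Hypothetical helper: the requested packages plus everything that
# depends on them, directly or indirectly.
rebuild_set <- function(build_pkgs, db) {
  revdeps <- tools::package_dependencies(
    build_pkgs, db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE, reverse = TRUE
  )
  sort(unique(c(build_pkgs, unlist(revdeps, use.names = FALSE))))
}

rebuild_set("toypkg", db)
rebuild_set("switchr", db)
```

With this table, asking for toypkg would be expected to pull in deptest as well, and asking for switchr would pull in GRANCore, GRANBase and GRANstable.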

The repository build process

GRANBase performs the following steps when creating or updating a repository. At the end of each step, the packages’ statuses are updated to reflect the results of that step.

Clearing temporary artifacts

To get a clean slate, we can clear the repository with clear_repo, which empties it completely (optionally excluding the code checkouts). Alternatively, clear_temp_files clears just the temporary files without wiping the recorded results.

repo2 <- clear_temp_files(repo, checkout = FALSE, logs = FALSE)
repo2 <- clear_repo(repo2, checkout = TRUE)
repo_results(repo2)
#>         name status version lastAttempt lastAttemptStatus
#> 1    deptest     ok   0.0-0        <NA>              <NA>
#> 2       toyp     ok   0.0-0        <NA>              <NA>
#> 3     toypkg     ok   0.0-0        <NA>              <NA>
#> 4   willfail     ok   0.0-0        <NA>              <NA>
#> 5 GRANstable     ok   0.0-0        <NA>              <NA>
#> 6    switchr     ok   0.0-0        <NA>              <NA>
#> 7   GRANBase     ok   0.0-0        <NA>              <NA>
#> 8   GRANCore     ok   0.0-0        <NA>              <NA>
#>   lastAttemptVersion lastbuilt lastbuiltversion lastbuiltstatus
#> 1               <NA>      <NA>             <NA>            <NA>
#> 2               <NA>      <NA>             <NA>            <NA>
#> 3               <NA>      <NA>             <NA>            <NA>
#> 4               <NA>      <NA>             <NA>            <NA>
#> 5               <NA>      <NA>             <NA>            <NA>
#> 6               <NA>      <NA>             <NA>            <NA>
#> 7               <NA>      <NA>             <NA>            <NA>
#> 8               <NA>      <NA>             <NA>            <NA>
#>                                     maintainer suspended building
#> 1 Who to complain to <[email protected]>     FALSE     TRUE
#> 2 Who to complain to <[email protected]>     FALSE     TRUE
#> 3 Who to complain to <[email protected]>     FALSE     TRUE
#> 4 Who to complain to <[email protected]>     FALSE     TRUE
#> 5     Gabriel Becker <[email protected]>     FALSE     TRUE
#> 6       Gabriel Becker <[email protected]>     FALSE     TRUE
#> 7     Gabriel Becker <[email protected]>     FALSE     TRUE
#> 8       Gabriel Becker <[email protected]>     FALSE     TRUE

We can then build the repo again using the expected machinery.

repo3 <- makeRepo(repo, install_test = FALSE, check_test = FALSE)
#> Started makeRepo at 2020-02-04 18:44:22
#> Checking for (and fixing) R version mismatch in packages installed to temp library 2020-02-04 18:44:22
#> Building 8 packages
#> Starting makeSrcDirs 2020-02-04 18:44:22
#> Building 8 packages
#> Starting buildBranchesInRepo 2020-02-04 18:44:26
#> Building 8 packages
#> Warning in read.dcf(file = tmpf): cannot open compressed file '/var/
#> folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/
#> tmprepo/src/contrib/PACKAGES', probable reason 'No such file or directory'
#> Warning in readLines(file): incomplete final line found on '/var/
#> folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/tmpcheckout/
#> switchr/././DESCRIPTION'
#> Invoking package tests 2020-02-04 18:44:29
#> Building 2 packages
#> * installing *source* package ‘willfail’ ...
#> ** R
#> ** data
#> ** byte-compile and prepare package for lazy loading
#> Error in eval(exprs[i], envir) : oops
#> Error : unable to load R code in package ‘willfail’
#> ERROR: lazy loading failed for package ‘willfail’
#> * removing ‘/private/var/folders/14/z0rjkn8j0n5dj1lkdd4ng1600000gn/T/RtmppqoEKc/repos/stable/LibLoc/willfail’
#> Warning in install.packages(pkgs = pkgs, repos = repos, lib = lib, ..., :
#> installation of package 'willfail' had non-zero exit status
#> starting migrateToFinalRepo 2020-02-04 18:44:32
#> Built 1 packages
#> Completed makeRepo at 2020-02-04 18:44:32

Tools for managing repository stability

GRANBase also provides tools to navigate the tension between keeping an environment stable and using the most up-to-date package versions, which carry the latest bug fixes.

The identifyRisk function identifies which currently installed packages can be updated and determines which packages could be affected by each update. In particular, the user can supply a vector of important packages, and the function assesses the risks to each of them (by default, the important packages are taken to be the full set of installed packages).

Risk here has a dual meaning. On the one hand, updating a package which an important package depends on incurs the risk of changing the important package’s behavior, potentially changing results in a critical application. On the other hand, not updating such a package may leave important bug fixes unapplied, drawing the results generated when using the important package into question.
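
The idea can be sketched with base R alone. The example below is an illustration under stated assumptions, not the identifyRisk implementation: update_risks() is a hypothetical helper, and the version numbers and dependency table are made-up toy data.

```r
# Hypothetical installed and available versions (toy data, not real results)
installed <- c(toypkg = "1.0", deptest = "1.0",
               switchr = "0.13.4", GRANCore = "0.2.6")
available <- c(toypkg = "1.1", deptest = "1.0",
               switchr = "0.14.1", GRANCore = "0.2.6")

# Hypothetical dependency table, shaped like available.packages() output
db <- cbind(
  Package   = c("toypkg", "deptest", "switchr", "GRANCore"),
  Depends   = c(NA, "toypkg", "methods", "switchr, methods"),
  Imports   = NA_character_,
  LinkingTo = NA_character_
)

# Packages for which a newer version is available
newer <- package_version(unname(available[names(installed)])) >
  package_version(unname(installed))
updatable <- names(installed)[newer]

# Hypothetical helper: for each important package, the updatable packages
# among its recursive dependencies; packages with no risks are dropped.
update_risks <- function(important, updatable, db) {
  deps <- tools::package_dependencies(important, db = db, recursive = TRUE)
  Filter(length, lapply(deps, intersect, updatable))
}

risks <- update_risks(names(installed), updatable, db)
```

Here every installed package is treated as important, mirroring the default described above. Each entry of risks names an important package and the pending updates that could change its behavior, or whose bug fixes it is currently missing.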

buildRiskReport builds an HTML report which lists information about each package with an available update in an easy-to-digest table. It also provides a list of specific risks to each important package (packages with no risks identified are currently omitted).

An update risk report generated by buildRiskReport()
