REDCapExporter

Building R Data Packages from REDCap Projects

Peter DeWitt

The purpose of this vignette is to show how to export a REDCap project into an R data package. Please read

vignette(topic = "api", package = "REDCapExporter")

for details on using the REDCap API to export the contents of a REDCap project. This vignette will assume you are able to call export_core successfully. For the purpose of this vignette, the data set

data(avs_raw_core, package = "REDCapExporter")
str(avs_raw_core)
## List of 4
##  $ project_raw : 'rcer_raw_project' chr "project_id,project_title,creation_time,production_time,in_production,project_language,purpose,purpose_other,pro"| __truncated__
##   ..- attr(*, "Content-Type")= Named chr [1:2] "text/csv" "utf-8"
##   .. ..- attr(*, "names")= chr [1:2] "" "charset"
##   ..- attr(*, "accessed")= POSIXct[1:1], format: "2020-02-25 15:11:16"
##  $ metadata_raw: 'rcer_raw_metadata' chr "field_name,form_name,section_header,field_type,field_label,select_choices_or_calculations,field_note,text_valid"| __truncated__
##   ..- attr(*, "Content-Type")= Named chr [1:2] "text/csv" "utf-8"
##   .. ..- attr(*, "names")= chr [1:2] "" "charset"
##   ..- attr(*, "accessed")= POSIXct[1:1], format: "2020-02-25 15:11:17"
##  $ user_raw    : 'rcer_raw_user' chr "username,email,firstname,lastname,expiration,data_access_group,data_access_group_id,design,user_rights,data_acc"| __truncated__
##   ..- attr(*, "Content-Type")= Named chr [1:2] "text/csv" "utf-8"
##   .. ..- attr(*, "names")= chr [1:2] "" "charset"
##   ..- attr(*, "accessed")= POSIXct[1:1], format: "2020-02-25 15:11:17"
##  $ record_raw  : 'rcer_raw_record' chr "record_id,uniform_number,firstname,lastname,hof,nationality,position,birthdate,height,weight,shoots,catches,exp"| __truncated__
##   ..- attr(*, "Content-Type")= Named chr [1:2] "text/csv" "utf-8"
##   .. ..- attr(*, "names")= chr [1:2] "" "charset"
##   ..- attr(*, "accessed")= POSIXct[1:1], format: "2020-02-25 15:11:17"
##  - attr(*, "class")= chr "rcer_rccore"

is the result of calling export_core to export elements of a REDCap project containing data on the 2000-2001 Stanley Cup Champion Colorado Avalanche.
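As the str output above shows, each element of the rcer_rccore object is a single character string holding CSV text, with a few attributes attached. The following is a minimal base R sketch of how such a string could be parsed into a data frame. The object mock_raw and its values are made up for illustration, and read.csv is only one way to do the parsing; it is not necessarily what REDCapExporter does internally.

```r
# A made-up stand-in for one element of an rcer_rccore object: a single
# character string holding CSV text (the values are hypothetical).
mock_raw <- "record_id,position,goals\n1,Center,10\n2,Goal,0\n3,Center,5"
attr(mock_raw, "accessed") <- as.POSIXct("2020-02-25 15:11:17", tz = "UTC")
class(mock_raw) <- "rcer_raw_record"

# as.character() drops the class and attributes, leaving plain CSV text
# that base R can parse into a data.frame.
mock_record <- utils::read.csv(text = as.character(mock_raw),
                               stringsAsFactors = FALSE)
str(mock_record)
```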

1 Exporting a REDCap Project to an R Data Package

Exporting a REDCap project to an R data package is done with a call to build_r_data_package. If you pass the URI for the API and an API token, a call to export_core will be made for you. If you prefer, or as in this case need, to have the core contents of the REDCap project available, you may instead pass the rcer_rccore object to build_r_data_package to build the skeleton of an R data package.

To build the skeleton of an R data package you will need to pass in the core export from the REDCap project, a path where the source code for the data package will be written, and some information about the users. In this context, users are the persons who have, or had, access to the REDCap project and are listed under the UserRights section of the REDCap project. The user data from REDCap is used to construct the Authors@R field of the DESCRIPTION file for the R data package. By default, all users are listed as ‘contributors’. The roles can be modified by passing a named list. In the example below, the user dewittp is assigned the creator and author roles. To be a valid R package, at least one user must have the creator role.

temppath <- tempdir()
build_r_data_package(x = avs_raw_core,
                     path = temppath,
                     author_roles = list(dewittp = c("cre", "aut")))
## Creating source package at /var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T//Rtmpb2t7Vz/rcd14465
## Updating rcd14465 documentation
## First time using roxygen2. Upgrading automatically...
## Updating roxygen version in /private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/Rtmpb2t7Vz/rcd14465/DESCRIPTION
## Loading rcd14465
## Writing NAMESPACE
## Writing project.Rd
## Writing metadata.Rd
## Writing user.Rd
## Writing record.Rd
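The role handling described above (everyone defaults to contributor, with overrides from a named list) can be sketched in base R as follows. This is only an illustration of the idea under stated assumptions: the users table, the usernames user_a and user_b, and the names are placeholders, and the code is not taken from REDCapExporter.

```r
# Hypothetical user table; in practice this information comes from the
# REDCap user export. Usernames and names below are placeholders.
users <- data.frame(
  username  = c("user_a", "user_b"),
  firstname = c("Ada", "Bob"),
  lastname  = c("Alpha", "Beta"),
  stringsAsFactors = FALSE
)

# Role overrides keyed by username, the same shape as author_roles above.
author_roles <- list(user_b = c("cre", "aut"))

# Default every user to contributor ("ctb"), then apply the overrides.
roles <- stats::setNames(rep(list("ctb"), nrow(users)), users$username)
roles[names(author_roles)] <- author_roles

# A valid package needs at least one creator ("cre").
has_creator <- any(vapply(roles, function(r) "cre" %in% r, logical(1)))

# Build one utils::person per user and combine them.
authors <- do.call(c, lapply(seq_len(nrow(users)), function(i) {
  utils::person(given  = users$firstname[i],
                family = users$lastname[i],
                role   = roles[[users$username[i]]])
}))

format(authors, include = c("given", "family", "role"))
```

Keying the overrides by username keeps the call short when a project has many users but only one or two need a role other than contributor.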

The resulting directory is:

fs::dir_tree(temppath)
## /var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T//Rtmpb2t7Vz
## ├── callr-env-39d97edbae9b
## └── rcd14465
##     ├── DESCRIPTION
##     ├── LICENSE
##     ├── NAMESPACE
##     ├── R
##     │   └── datasets.R
##     ├── data
##     │   ├── metadata.rda
##     │   ├── project.rda
##     │   ├── record.rda
##     │   └── user.rda
##     ├── inst
##     │   └── raw-data
##     │       ├── metadata.rds
##     │       ├── project.rds
##     │       ├── record.rds
##     │       └── user.rds
##     └── man
##         ├── metadata.Rd
##         ├── project.Rd
##         ├── record.Rd
##         └── user.Rd

1.1 Details on Exported Files

First, the package directory name. Packages exported by REDCapExporter have a directory name made of the prefix rcd followed by a number, rcd14465 in this example. This name is also used as the package name in the DESCRIPTION file.

The DESCRIPTION file is

prj_dir <- list.dirs(temppath)
prj_dir <- prj_dir[grepl("/rcd\\d+$", prj_dir)]
t(read.dcf(paste(prj_dir, "DESCRIPTION", sep = "/")))
##                 [,1]                                                                                                                                                                                                                               
## Package         "rcd14465"                                                                                                                                                                                                                         
## Title           "2000-2001 Colorado Avalanche"                                                                                                                                                                                                     
## Version         "2020.05.15.10.46"                                                                                                                                                                                                                 
## Authors@R       "c(person(given = \"Tell\", family = \"Bennett\", email = \"[email protected]\", role = c(\"ctb\")),\nperson(given = \"Peter\", family = \"DeWitt\", email = \"[email protected]\", role = c(\"cre\", \"aut\")))"
## Description     "Data and documentation from the REDCap Project."                                                                                                                                                                                  
## License         "file LICENSE"                                                                                                                                                                                                                     
## Encoding        "UTF-8"                                                                                                                                                                                                                            
## LazyData        "true"                                                                                                                                                                                                                             
## Suggests        "knitr"                                                                                                                                                                                                                            
## VignetteBuilder "knitr"                                                                                                                                                                                                                            
## RoxygenNote     "7.1.0"

The title comes from the project info recorded in REDCap. The version number is set as the year.month.day.hour.minute of the export. As noted above, the Authors@R field is built from the user data stored in REDCap.
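A version string of the form year.month.day.hour.minute can be produced with format(). The sketch below is an assumption about how such a string could be built, not the package's actual code; the timestamp is chosen to match the Version field shown above, and the UTC time zone is assumed for reproducibility.

```r
# Timestamp matching the Version field shown above (time zone is assumed).
export_time <- as.POSIXct("2020-05-15 10:46:00", tz = "UTC")

# year.month.day.hour.minute
version_string <- format(export_time, "%Y.%m.%d.%H.%M")
version_string

# Strings of this form are valid R package versions, so a later export
# compares as a newer version than an earlier one.
package_version(version_string) > package_version("2020.02.25.15.11")
```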

The LICENSE file notes that the package is proprietary and should not be installed or distributed to others who are not authorized to have access to the data.

cat(readLines(paste(prj_dir[1], "LICENSE", sep = "/")), sep = "\n")
## Proprietary
## 
## 
##       Do not distribute to anyone or to machines which are not authorized to hold the data.

The raw data exports are stored as .rds files under inst/raw-data so that these files will be available in R sessions after installing the package.
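Files placed under inst/ in a source package are copied to the top level of the installed package, so once the package has been installed (section 1.2) the raw exports can be located with system.file(). A minimal sketch, assuming the example package rcd14465 is installed in a library on the search path or is already loaded; otherwise pass lib.loc to system.file().

```r
# Assumes the example package rcd14465 has been installed; if it cannot be
# found, system.file() returns "" and nothing is read.
raw_file <- system.file("raw-data", "record.rds", package = "rcd14465")

if (nzchar(raw_file)) {
  # Read the raw export back in and inspect its class.
  record_raw <- readRDS(raw_file)
  print(class(record_raw))
} else {
  message("rcd14465 is not installed; nothing to read")
}
```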

The data directory has data.table versions of the data sets.

The R/datasets.R file provides the documentation for the data sets which can be accessed in an interactive R session.

1.2 Using the Exported Package

Let’s install the package and explore the contents.

tar_ball <- devtools::build(pkg = prj_dir)
## ✔  checking for file ‘/private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/Rtmpb2t7Vz/rcd14465/DESCRIPTION’
## ─  preparing ‘rcd14465’:
## ✔  checking DESCRIPTION meta-information
## ─  checking for LF line-endings in source and make files and shell scripts
## ─  checking for empty or unneeded directories
##      NB: this package now depends on R (>= 3.5.0)
##      WARNING: Added dependency on R >= 3.5.0 because serialized objects in
##      serialize/load version 3 cannot be read in older versions of R.
##      File(s) containing such objects: ‘rcd14465/data/metadata.rda’ ‘rcd14465/data/project.rda’
##      WARNING: Added dependency on R >= 3.5.0 because serialized objects in
##      serialize/load version 3 cannot be read in older versions of R.
##      File(s) containing such objects: ‘rcd14465/data/record.rda’ ‘rcd14465/data/user.rda’
##      WARNING: Added dependency on R >= 3.5.0 because serialized objects in
##      serialize/load version 3 cannot be read in older versions of R.
##      File(s) containing such objects: ‘rcd14465/inst/raw-data/metadata.rds’
##      WARNING: Added dependency on R >= 3.5.0 because serialized objects in
##      serialize/load version 3 cannot be read in older versions of R.
##      File(s) containing such objects: ‘rcd14465/inst/raw-data/project.rds’
##      WARNING: Added dependency on R >= 3.5.0 because serialized objects in
##      serialize/load version 3 cannot be read in older versions of R.
##      File(s) containing such objects: ‘rcd14465/inst/raw-data/record.rds’ ‘rcd14465/inst/raw-data/user.rds’
## ─  building ‘rcd14465_2020.05.15.10.46.tar.gz’
tar_ball
## [1] "/private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/Rtmpb2t7Vz/rcd14465_2020.05.15.10.46.tar.gz"

install.packages(pkgs = tar_ball, lib = temppath)
## inferring 'repos = NULL' from 'pkgs'
library(rcd14465, lib.loc = temppath)

The available data sets:

data(package = "rcd14465")$results
##      Package   
## [1,] "rcd14465"
## [2,] "rcd14465"
## [3,] "rcd14465"
## [4,] "rcd14465"
##      LibPath                                                              
## [1,] "/private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/Rtmpb2t7Vz"
## [2,] "/private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/Rtmpb2t7Vz"
## [3,] "/private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/Rtmpb2t7Vz"
## [4,] "/private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/Rtmpb2t7Vz"
##      Item       Title     
## [1,] "metadata" "Metadata"
## [2,] "project"  "Project" 
## [3,] "record"   "Record"  
## [4,] "user"     "User"

A simple data analysis question: how many goals were scored by position?

library(data.table)
as.data.table(record)[, sum(goals), by = position]
##      position V1
## 1:       Goal  0
## 2:     Center 93
## 3:    Defence 34
## 4: Right Wing 72
## 5:  Left Wing 71

2 Session Info

print(sessionInfo(), locale = FALSE)
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.4
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] data.table_1.12.9         rcd14465_2020.05.15.10.46
## [3] REDCapExporter_0.2.1     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4.6      compiler_4.0.0    remotes_2.1.1     prettyunits_1.1.1
##  [5] tools_4.0.0       testthat_2.3.2    digest_0.6.25     pkgbuild_1.0.8   
##  [9] pkgload_1.0.2     lubridate_1.7.8   evaluate_0.14     memoise_1.1.0    
## [13] rlang_0.4.6       rstudioapi_0.11   cli_2.0.2         yaml_2.2.1       
## [17] xfun_0.13         xml2_1.3.2        roxygen2_7.1.0    qwraps2_0.4.2    
## [21] withr_2.2.0       stringr_1.4.0     knitr_1.28        fs_1.4.1         
## [25] generics_0.0.2    desc_1.2.0        devtools_2.3.0    rprojroot_1.3-2  
## [29] glue_1.4.1        R6_2.4.1          processx_3.4.2    fansi_0.4.1      
## [33] rmarkdown_2.1     sessioninfo_1.1.1 purrr_0.3.4       callr_3.4.3      
## [37] magrittr_1.5      usethis_1.6.1     backports_1.1.7   ps_1.3.3         
## [41] ellipsis_0.3.1    htmltools_0.4.0   assertthat_0.2.1  stringi_1.4.6    
## [45] crayon_1.3.4