Many studies include data from assays that have not been integrated into the DataSpace. Some of these are available as “Non-Integrated Datasets,” which can be downloaded from the app as a zip file. DataSpaceR provides an interface for accessing non-integrated data from studies where it is available.
Methods on the DataSpace Study object allow you to see what non-integrated data may be available before downloading it. We will be using HVTN 505 as an example.
library(DataSpaceR)
con <- connectDS()
vtn505 <- con$getStudy("vtn505")
vtn505
#> <DataSpaceStudy>
#> Study: vtn505
#> URL: https://dataspace.cavd.org/CAVD/vtn505
#> Available datasets:
#> - Binding Ab multiplex assay
#> - Demographics
#> - Intracellular Cytokine Staining
#> - Neutralizing antibody
#> Available non-integrated datasets:
#> - ADCP
#> - Demographics (Supplemental)
#> - Fc Array
The print method on the study object lists the available non-integrated datasets. The availableDatasets property shows more information about the available datasets, with the integrated field indicating whether the data is integrated. The value for n will be NA for non-integrated data until the dataset has been loaded.
knitr::kable(vtn505$availableDatasets)
name | label | n | integrated |
---|---|---|---|
BAMA | Binding Ab multiplex assay | 10260 | TRUE |
Demographics | Demographics | 2504 | TRUE |
ICS | Intracellular Cytokine Staining | 22684 | TRUE |
NAb | Neutralizing antibody | 628 | TRUE |
ADCP | ADCP | NA | FALSE |
DEM SUPP | Demographics (Supplemental) | NA | FALSE |
Fc Array | Fc Array | NA | FALSE |
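To pick out only the non-integrated datasets programmatically, the integrated column can be used as a filter. The sketch below rebuilds the table above as a plain data.frame so that it runs without a connection; the variable names available and non_integrated are illustrative. With a live study object, the same subsetting would presumably be applied to vtn505$availableDatasets.

```r
# Stand-in for vtn505$availableDatasets, copied from the table above.
available <- data.frame(
  name = c("BAMA", "Demographics", "ICS", "NAb", "ADCP", "DEM SUPP", "Fc Array"),
  label = c(
    "Binding Ab multiplex assay", "Demographics",
    "Intracellular Cytokine Staining", "Neutralizing antibody",
    "ADCP", "Demographics (Supplemental)", "Fc Array"
  ),
  n = c(10260, 2504, 22684, 628, NA, NA, NA),
  integrated = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only the rows flagged as non-integrated.
non_integrated <- available[!available$integrated, ]

# These names are the values to pass to getDataset().
non_integrated$name
```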
Non-integrated datasets can be loaded with getDataset, just like integrated data. This unzips the non-integrated data to a temp directory and loads it into the environment.
adcp <- vtn505$getDataset("ADCP")
#> downloading vtn505_adcp.zip to /var/folders/d9/q394nzdj7b7_5hqkp1bvh9v40000gn/T//Rtmp4FHYQc...
#> No encoding supplied: defaulting to UTF-8.
#> unzipping vtn505_adcp.zip to /var/folders/d9/q394nzdj7b7_5hqkp1bvh9v40000gn/T//Rtmp4FHYQc/vtn505_adcp
dim(adcp)
#> [1] 378 11
colnames(adcp)
#> [1] "study_prot" "participant_id" "study_day" "lab_code" "specimen_type"
#> [6] "antigen" "percent_cv" "avg_phagocytosis_score" "positivity_threshold" "response"
#> [11] "assay_identifier"
You can also view the file format info using getDatasetDescription. For non-integrated data, this will open a PDF in your computer’s default PDF viewer.
vtn505$getDatasetDescription("ADCP")
Non-integrated data is downloaded to a temp directory by default. There are a couple of ways to override this if desired. One is to specify outputDir when calling getDataset or getDatasetDescription.
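As a rough illustration of the outputDir option, the helper below wraps getDataset so that the target folder is created first. The helper name load_into and the example path are hypothetical and not part of DataSpaceR. Creating the directory up front is a precaution, since this text does not say whether getDataset creates a missing directory itself.

```r
# Hypothetical helper (not part of DataSpaceR): make sure the target folder
# exists, then ask the study object to place and load the dataset there.
load_into <- function(study, name, dir) {
  if (!dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  # outputDir is the argument named in the text above.
  study$getDataset(name, outputDir = dir)
}

# With a live connection this might be called as follows
# (the path is only an example):
# adcp <- load_into(vtn505, "ADCP", file.path("data", "vtn505"))
```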
If you will be accessing the data at another time and don’t want to have to re-download it, you can change the default directory for the whole study object with setDataDir.
vtn505$dataDir
#> [1] "/var/folders/d9/q394nzdj7b7_5hqkp1bvh9v40000gn/T//Rtmp4FHYQc"
vtn505$setDataDir(".")
vtn505$dataDir
#> [1] "/Users/juyeongkim/git/CDS/DataSpaceR/vignettes"
If the dataset already exists in the specified dataDir or outputDir, it will not be downloaded. This can be overridden with reload=TRUE, which forces a re-download.
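The skip-if-present behaviour described above can be pictured with a small base R sketch. This is not DataSpaceR's implementation; fetch_cached and fake_download are hypothetical names, and the "download" is a stand-in that only writes a text file.

```r
# Hypothetical helper: return the path to a file in `dir`, calling the
# supplied `download` function only if the file is missing or reload = TRUE.
fetch_cached <- function(file_name, dir, download, reload = FALSE) {
  path <- file.path(dir, file_name)
  if (file.exists(path) && !reload) {
    return(path)  # reuse the existing copy
  }
  download(path)  # caller-supplied function that writes the file to `path`
  path
}

# Stand-in for a real download; it counts how often it is called.
calls <- 0
fake_download <- function(path) {
  calls <<- calls + 1
  writeLines("placeholder", path)
}

cache_dir <- tempfile("cache_")
dir.create(cache_dir)

p1 <- fetch_cached("example.txt", cache_dir, fake_download)                 # file missing: downloads
p2 <- fetch_cached("example.txt", cache_dir, fake_download)                 # file present: skipped
p3 <- fetch_cached("example.txt", cache_dir, fake_download, reload = TRUE)  # forced re-download

calls  # number of times the stand-in download ran
```

Only the first and third calls reach the download function, which mirrors how an existing copy is reused unless a reload is requested.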
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 11.5.2
#>
#> Matrix products: default
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] DataSpaceR_0.7.5 knitr_1.33
#>
#> loaded via a namespace (and not attached):
#> [1] httr_1.4.2 pkgload_1.2.1 jsonlite_1.7.2 spelling_2.2 assertthat_0.2.1 askpass_1.1 highr_0.9
#> [8] triebeard_0.3.0 urltools_1.7.3 yaml_2.2.1 remotes_2.4.0 sessioninfo_1.1.1 pillar_1.6.2 glue_1.4.2
#> [15] uuid_0.1-4 digest_0.6.27 htmltools_0.5.2 pkgconfig_2.0.3 devtools_2.4.2 rcmdcheck_1.3.3 httpcode_0.3.0
#> [22] purrr_0.3.4 gitcreds_0.1.1 codemetar_0.3.2 processx_3.5.2 tibble_3.1.4 openssl_1.4.4 usethis_2.0.1
#> [29] ellipsis_0.3.2 cachem_1.0.6 withr_2.4.2 credentials_1.3.1 lazyeval_0.2.2 cli_3.0.1 magrittr_2.0.1
#> [36] crayon_1.4.1 memoise_2.0.0 evaluate_0.14 ps_1.6.0 fs_1.5.0 fansi_0.5.0 xml2_1.3.2
#> [43] parsedate_1.2.1 pkgbuild_1.2.0 tools_4.1.1 gh_1.3.0 hunspell_3.0.1 data.table_1.14.0 prettyunits_1.1.1
#> [50] Rlabkey_2.8.0 lifecycle_1.0.0 gert_1.3.2 stringr_1.4.0 xopen_1.0.0 pingr_2.0.1 callr_3.7.0
#> [57] rex_1.2.0 compiler_4.1.1 covr_3.5.1 rhub_1.1.1 rlang_0.4.11 rstudioapi_0.13 sys_3.4
#> [64] rappdirs_0.3.3 rmarkdown_2.10 testthat_3.0.4 curl_4.3.2 rematch_1.0.1 rematch2_2.1.2 R6_2.5.1
#> [71] fastmap_1.1.0 utf8_1.2.2 commonmark_1.7 rprojroot_2.0.2 desc_1.3.0 stringi_1.7.4 crul_1.1.0
#> [78] whoami_1.3.0 Rcpp_1.0.7 vctrs_0.3.8 xfun_0.25