repoRter.nih is an R package that allows users to request project award data from the U.S. National Institutes of Health (NIH) RePORTER Application Programming Interface (API). The RePORTER API provides the public with access to project and grantee information for all projects funded through NIH grants (around $42 billion annually as of 2021). This package provides methods that make it easy to build requests in the non-standard JSON schema required by the RePORTER API, and to retrieve and process the results.
repoRter.nih can be installed easily through CRAN or GitHub.
The latest stable release (consistent with the CRAN version) can be installed with install.packages("repoRter.nih").
A development version of this package will also be available and may contain bug fixes, updates to match upstream RePORTER API changes (for example, changes to the request schema), and/or new functionality not yet pushed to CRAN. To get the latest dev version, you must explicitly install from the ‘dev’ branch of the GitHub repository.
The repoRter.nih package supports v2 of the RePORTER API. This API does not require registration or authorization of any kind. There are some hard and soft limits to be aware of, covering request rate, page size, and total result set size, as detailed below:
|Limit|Value|Notes|
|---|---|---|
|Daily query limit|None|NIH requests that you limit large jobs to US weekends, or to weekdays between 9PM and 5AM EST|
|Request rate limit|None|NIH asks you not to post more than 1 request per second - repoRter.nih enforces this|
|Records per page (max)|500|Default in get_nih_data()|
|Maximum record offset|9999|Effectively limits you to retrieving only the first 10,000 records from any result|
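The arithmetic behind the 10,000-record ceiling can be checked with a few lines of base R. This is only an illustrative sketch: it assumes the 9999 limit is a zero-based record offset and that pages are requested at the maximum size of 500; the variable names are made up for the example.

```r
# Values taken from the limits table; variable names are illustrative only.
page_size  <- 500    # maximum records per page
max_offset <- 9999   # assumed to be the largest zero-based record offset

# Offsets of every full-size page that can be requested: 0, 500, ..., 9500
offsets <- seq(0, max_offset, by = page_size)

n_pages         <- length(offsets)          # 20 pages
reachable_count <- max(offsets) + page_size # 10000 records

n_pages
reachable_count
```

Under those assumptions, 20 pages of 500 records cover exactly the first 10,000 records of a result set, and nothing beyond that can be paged to.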
Note that it is possible to work around the record offset limit by breaking a large request into several smaller requests. With some thought and a little programming, you can retrieve complete large result sets this way; an example is provided in the vignette.
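One way to break up a large request is to split it into date windows that each return fewer than 10,000 records. The sketch below only builds the windows, using base R. Note that split_date_range() is a hypothetical helper written for this illustration and is not part of repoRter.nih; how each window is then passed to make_req() depends on the date criteria the package supports, so consult the vignette for that step.

```r
# Hypothetical helper (not part of repoRter.nih): split [from, to] into
# consecutive, non-overlapping windows of length `by`.
split_date_range <- function(from, to, by = "month") {
  from <- as.Date(from)
  to   <- as.Date(to)
  starts <- seq(from, to, by = by)
  # Each window ends the day before the next one starts;
  # the final window ends at `to`.
  ends <- c(starts[-1] - 1, to)
  data.frame(from = starts, to = ends)
}

windows <- split_date_range("2021-01-01", "2021-03-31")
windows
```

Each row of windows could then drive one smaller request, with the per-window results combined afterwards (for example with rbind() or dplyr::bind_rows()). If a single window still exceeds the limit, a shorter interval such as by = "week" can be tried.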
If no criteria are provided, the default request is for all projects with fiscal_year = lubridate::year(Sys.Date()). This search will often return over 10,000 results, so the full result set will not be retrievable without some programming.
req <- make_req()
res <- get_nih_data(req)
Different fields are available for you to search and filter projects.
req <- make_req(criteria = list(covid_response = c("All")),
                message = FALSE)
res <- get_nih_data(req, flatten_result = TRUE)
The flatten_result argument flattens nested data.frames and atomic vectors in the result (for atomic vectors, the values are pasted together and delimited by “;”).
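To make the “pasted together and delimited by ;” behavior concrete, here is a minimal base R sketch of collapsing a list column of atomic vectors into one string per row. It illustrates the idea only and is not the package's internal implementation; the data and column names are invented for the example.

```r
# Toy data: a list column holding one atomic vector per row (invented values).
nested <- data.frame(id = 1:2)
nested$terms <- list(c("a", "b"), "c")

# Collapse each vector into a single ";"-delimited string.
nested$terms_flat <- vapply(nested$terms, paste, character(1), collapse = ";")

nested$terms_flat
```

After flattening, every column holds one value per row, which is what makes the result convenient to join and summarise.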
library(dplyr)    # %>%, left_join(), mutate(), case_when(), ...
library(ggplot2)
library(ggrepel)  # geom_text_repel()
library(scales)   # dollar(), percent()

res %>%
  left_join(covid_response_codes, by = "covid_response") %>%
  # label each code with its funding source, or "(Multiple)" if none matches
  mutate(covid_code_desc = case_when(
    !is.na(fund_src) ~ paste0(covid_response, ": ", fund_src),
    TRUE ~ paste0(covid_response, " (Multiple)"))) %>%
  group_by(covid_code_desc) %>%
  summarise(total_awards = sum(award_amount) / 1e6) %>%  # in $ millions
  ungroup() %>%
  arrange(desc(covid_code_desc)) %>%
  # label positions: midpoint of each slice along the cumulative share
  mutate(prop = total_awards / sum(total_awards),
         csum = cumsum(prop),
         ypos = csum - prop / 2) %>%
  ggplot(aes(x = "", y = prop, fill = covid_code_desc)) +
  geom_bar(stat = "identity") +
  geom_text_repel(aes(label = paste0(dollar(total_awards, accuracy = 1, suffix = "M"),
                                     "\n", percent(prop, accuracy = .01)),
                      y = ypos),
                  show.legend = FALSE, nudge_x = .8, size = 3, color = "grey25") +
  coord_polar(theta = "y") +
  theme_void() +
  theme(legend.position = "right",
        legend.title = element_text(colour = "grey25"),
        legend.text = element_text(colour = "blue", size = 6, face = "bold"),
        plot.title = element_text(color = "grey25"),
        plot.caption = element_text(size = 6)) +
  labs(caption = "Data Source: NIH RePORTER API v2") +
  ggtitle("Legislative Source for NIH Covid Response Project Funding")
With the basics described above, you can get started with the RePORTER API right away. To learn more, see:
If you’ve exhausted the resources above and require help, please open an issue in this repository. Include the problematic code, a description of the expected behavior, the unintended behavior actually observed, and your investigations so far.
Feel free also to email me with feedback or to notify me of new issues.