This vignette demonstrates how to use the rfars
package
to download FARS data and use it to compare crash characteristics and
trends in rural and urban areas. First we download the data, then make a
convenient adjustment to the given coding scheme. Rural and urban
crashes and fatalities are then compared on many different
dimensions.
Below we get two years of data for one state, Virginia.
Here we make an adjustment to the standard variable definitions to
label all people driving motorcycles as motorcyclists rather
than drivers of motorcycles. This makes it easier to refer to
motorcyclists later. Note that the myFARS
object is a list,
with the flat
tibble containing the required variables:
body_typ
(vehicle body type) and per_typ
(person type).
myFARS$flat$per_typ <-
ifelse(grepl("motorcycle",
myFARS$flat$body_typ,
ignore.case = TRUE),
"Motorcyclist",
myFARS$flat$per_typ)
The counts()
function makes it easy to generate specific
types of counts. compare_counts()
lets you quickly run two
sets of counts for comparison purposes.
The number of crashes is a reasonable starting point. Below we use
our compare_counts()
function and ggplot()
to
plot the annual count of crashes and fatalities in rural and urban
areas, from 2015 to 2020.
compare_counts(df = myFARS, what = "crashes", where = list(states="VA", urb="rural"), where2 = list(states="VA", urb = "urban")) %>%
ggplot(aes(x=factor(year), y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1) +
facet_wrap(.~urb) +
labs(x=NULL, y=NULL, title = "Crashes", fill=NULL)
compare_counts(df = myFARS, what = "fatalities", where = list(urb="rural"), where2 = list(urb="urban")) %>%
ggplot(aes(x=factor(year), y=n, label=scales::comma(n))) +
geom_col() +
geom_label(vjust=1) +
facet_wrap(.~urb) +
labs(x=NULL, y=NULL, title = "Fatalities", fill=NULL)
involved
Argumentcounts()
makes it easy to hone in on specific crash
types by using the involved
argument. It can be any of:
distracted driver, drowsy driver, police pursuit, motorcycle,
pedalcyclist, bicyclist, pedestrian, pedbike, young driver, older
driver, speeding, alcohol, drugs, hit and run, roadway departure,
rollover, or large trucks. Specifying involved
will filter
the counts to those matching the criterion. For example
involved="distracted driver"
will return counts associated
with crashes involving a distracted driver. Multiple values can be
supplied; if so, the resulting counts will satisfy all
criteria. That is, they are combined with the and operator (as
opposed to or).
Below we loop through all options available in rfars
and
generate simple plots.
crashfactors <- c("distracted driver", "drowsy driver",
"police pursuit", "motorcycle", "pedalcyclist",
"bicyclist", "pedestrian", "pedbike",
"young driver", "older driver", "speeding",
"alcohol", "drugs", "hit and run",
"roadway departure", "rollover", "large trucks"
)
for(crashfactor in crashfactors){
p <-
compare_counts(df = myFARS, what = "fatalities", where = list(urb="rural"), where2 = list(urb="urban"), involved = crashfactor) %>%
ggplot(aes(x=factor(year), y=n, label=scales::comma(n))) +
geom_col(position="dodge") +
facet_wrap(.~urb) +
geom_label(position = position_dodge(.9), vjust=1) +
labs(title = paste0("Fatalities: ", crashfactor))
print(p)
}
filterOnly
OptionThe counts()
function has a filterOnly
option, which returns pre-summarized data fitting the other
specifications (what, where, etc.). This can be useful when generating
custom counts. For example, acc_type
(Crash Type) is a
vehicle-level variable. To count the number of crashes by
acc_type
, we need to prevent over-counting (as there will
be one value for acc_type
for each vehicle involved in each
crash). Below we take the value associated with veh_no
1
(vehicle number 1). This is reasonable, but may not be appropriate for
all analysis situations.
bind_rows(
counts(myFARS,
what = "crashes",
where = list(urb="rural"),
filterOnly = TRUE
) %>%
filter(veh_no==1) %>% #crash type is on the vehicle-level, this prevents over-counting
select(id, year, acc_type) %>% unique() %>% group_by(acc_type, year) %>% summarize(n=n()) %>%
mutate(where = "Rural"),
counts(myFARS,
what = "crashes",
where = list(urb="urban"),
filterOnly = TRUE
) %>%
filter(veh_no==1) %>%
select(id, year, acc_type) %>% unique() %>% group_by(acc_type, year) %>% summarize(n=n()) %>%
mutate(where = "Urban")
) %>%
filter(!is.na(acc_type)) %>%
group_by(where, acc_type) %>% summarize(n=sum(n, na.rm=TRUE)) %>%
tidyr::pivot_wider(names_from = "where", values_from = "n") %>%
mutate(Total = Urban + Rural,
rural_pct = Rural/Total) %>%
arrange(desc(Total)) %>%
slice(1:20) %>%
arrange(desc(rural_pct)) %>%
mutate(acc_type = reorder(acc_type, rural_pct)) %>%
ggplot(aes(y=acc_type, x=rural_pct, fill=Rural, label=scales::percent(rural_pct, accuracy = 1))) +
geom_col() +
geom_label(hjust=1, fill="white") +
scale_fill_continuous(labels=scales::comma) +
labs(x=NULL, y=NULL,
title = "20 Most Common Crash Types by Prevalence in Rural Areas") +
theme(plot.title.position = "plot")
#> Adding missing grouping variables: `date`
#> `summarise()` has grouped output by 'acc_type'. You can override using the
#> `.groups` argument.
#> Adding missing grouping variables: `date`
#> `summarise()` has grouped output by 'acc_type'. You can override using the
#> `.groups` argument.
#> `summarise()` has grouped output by 'where'. You can override using the
#> `.groups` argument.
flat
TibbleIt is sometimes easiest to access the data directly, rather than with
the counts()
function. The object created by
get_fars()
is a list with six tibbles: flat
,
multi_acc
, multi_veh
, multi_per
,
events
, and codebook
. The flat
tibble contains over 200 variables, and can often provide what’s
needed.
Below are several examples:
myFARS$flat %>%
mutate(
vprofile = ifelse(vprofile %in% c("Uphill", "Downhill"), "Up/downhill", vprofile),
valign = ifelse(grepl("Curve", valign), "Curve", valign)
) %>%
filter(veh_no == 1, #to avoid over-counting
rur_urb %in% c("Rural", "Urban"),
valign %in% c("Straight", "Curve"),
!(vprofile %in% c("Unknown", "Reported as Unknown", "Not Reported"))
) %>%
select(id, vprofile, valign, rur_urb) %>% unique() %>%
group_by(vprofile, valign, rur_urb) %>%
summarize(n = n()) %>%
ggplot(aes(x=valign, y=vprofile, fill=n, label=scales::comma(n))) +
#geom_tile() +
facet_wrap(.~rur_urb) +
viridis::scale_fill_viridis(alpha=.5) +
geom_label() +
labs(title = "Roadway Profile and Alignment")
#> `summarise()` has grouped output by 'vprofile', 'valign'. You can override
#> using the `.groups` argument.
myFARS$flat %>%
filter(rur_urb %in% c("Rural", "Urban")) %>%
filter(grepl("(K)", inj_sev)) %>%
group_by(rur_urb, per_typ) %>%
summarise(n=n()) %>%
filter(n>2) %>%
#mutate(per_typ = stringr::str_wrap(per_typ, 15)) %>%
ggplot(aes(y=per_typ, x=n, fill=rur_urb, label = scales::comma(n))) +
geom_col(position = "dodge") +
#geom_label(hjust=-1, position = position_dodge(.9)) +
labs(title = "Fatalities by Person Type and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.
myFARS$flat %>%
filter(rur_urb %in% c("Rural", "Urban"), sex %in% c("Male", "Female")) %>%
filter(grepl("(K)", inj_sev)) %>%
group_by(rur_urb, sex) %>%
summarise(n=n()) %>%
filter(n>90) %>%
mutate(sex = stringr::str_wrap(sex, 15)) %>%
ggplot(aes(x=sex, y=n, fill=rur_urb, label = scales::comma(n))) +
geom_col(position = "dodge") +
geom_label(vjust=1, position = position_dodge(.9)) +
labs(title = "Fatalities by Sex and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.
myFARS$flat %>%
filter(rur_urb %in% c("Rural", "Urban")) %>%
filter(grepl("(K)", inj_sev)) %>%
group_by(rur_urb, hispanic) %>%
summarise(n=n()) %>%
filter(n>10) %>%
mutate(hispanic = stringr::str_wrap(hispanic, 15)) %>%
ggplot(aes(y=hispanic, x=n, fill=rur_urb, label = scales::comma(n))) +
geom_col(position = "dodge") +
geom_label(hjust=-1, position = position_dodge(.9)) +
labs(title = "Fatalities by Ethnicity and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.
myFARS$flat %>%
mutate(
age_n = gsub("\\D+","", age) %>% as.numeric(),
hour2 = stringr::word(hour, 1, 1, sep = ":") %>% as.numeric(),
hourm = stringr::str_sub(hour, -2, -1),
hour = ifelse(hourm=="am", hour2, hour2+12),
hour = ifelse(hour==24, 12, hour)
) %>%
filter(grepl("(K)", inj_sev),
rur_urb %in% c("Rural", "Urban"),
hour < 24,
age_n <= 90) %>%
group_by(rur_urb, age_n, hour) %>% summarize(n=n()) %>%
ggplot(aes(x=hour, y=age_n, fill=n)) +
geom_tile() +
facet_wrap(.~rur_urb) +
viridis::scale_fill_viridis() +
labs(title = "Fatalities by Age, Time of Day, and Urbanicity")
#> `summarise()` has grouped output by 'rur_urb', 'age_n'. You can override using
#> the `.groups` argument.
multi_per
TibbleIf the flat
tibble does not have the required
information, it may be in one of the multi_
tibbles. Below,
we access the multi_per
tibble to visualize fatalities by
race.
myFARS$multi_per %>%
filter(name == "mtm_crsh") %>%
select(state, st_case, veh_no, per_no, year, mtm_crsh=value) %>%
mutate_at(c("st_case", "veh_no", "per_no", "year"), as.numeric) %>%
inner_join(myFARS$flat) %>%
filter(rur_urb %in% c("Rural", "Urban")) %>%
filter(grepl("(K)", inj_sev)) %>%
group_by(rur_urb, mtm_crsh) %>%
summarise(n=n()) %>%
filter(n>9) %>%
mutate(mtm_crsh = stringr::str_wrap(mtm_crsh, 25)) %>%
ggplot(aes(y=mtm_crsh, x=n, fill=rur_urb, label = scales::comma(n))) +
geom_col(position = "dodge") +
#geom_label(vjust=1, position = position_dodge(.9)) +
labs(title = "Non-Motorist Contributing Circumstances")
#> Joining, by = c("state", "st_case", "veh_no", "per_no", "year")
#> `summarise()` has grouped output by 'rur_urb'. You can override using the
#> `.groups` argument.