rfars
The goal of rfars is to facilitate transportation safety analysis by
simplifying the process of extracting data from official crash
databases. The National Highway Traffic Safety Administration (NHTSA)
collects and publishes a census of fatal crashes in the Fatality
Analysis Reporting System (FARS) and a sample of fatal and non-fatal
crashes in the Crash Report Sampling System (CRSS), an evolution of the
General Estimates System (GES). The Fatality and Injury Reporting System
Tool (FIRST) lets users query these databases and produce simple tables
and graphs. This suffices for simple analysis, but often leaves
researchers wanting more. Digging any deeper, however, involves a
time-consuming process of downloading annual ZIP files and attempting
to stitch them together, after first combing through immense data
dictionaries to determine the required variables and table names.
rfars allows users to download the last 10 years of FARS and GES/CRSS
data with just one line of code. The result is a full, rich dataset
ready for mapping, modeling, and other downstream analysis. Codebooks
with variable definitions and value labels support an informed analysis
of the data (see vignette("Searchable Codebooks", package = "rfars")
for more information). Helper functions are also provided to produce
common counts and comparisons.
Installation
You can install the latest version of rfars from
GitHub with:
# install.packages("devtools")
devtools::install_github("s87jackson/rfars")
or the CRAN stable release with:
install.packages("rfars")
Then load rfars and some helpful packages:
library(rfars)
library(dplyr)
Getting and Using Data
get_fars() and get_gescrss() are the primary functions of the
rfars package. These functions download and process data files
directly from NHTSA’s FTP site, pull the prepared data stored on your
local machine, or (as of Version 2.0) pull the prepared data from
Zenodo. The data files hosted on Zenodo are stable, have DOIs, and
replicate the data that get_fars() and get_gescrss() would produce from
the NHTSA files, but load in a fraction of the time.
Both functions take the parameter years, plus states (FARS) or regions
(GES/CRSS). As the source data files follow an annual structure, years
determines how many file sets are downloaded or loaded, and
states/regions filters the resulting dataset. Downloading and
processing these files can take several minutes. Before downloading,
rfars informs you that it is about to download files and asks for your
permission to do so. To skip this dialog, set proceed = TRUE. You can
use the dir and cache parameters to save an RDS file to your local
machine. The dir parameter specifies the directory, and cache names
the file (be sure to include the .rds file extension).
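As a rough sketch of how these parameters fit together, the call below requests a few years of FARS data for one state and caches the result. The years, the state, the file name, and the use of a temporary directory are illustrative assumptions, as is passing the state as a postal abbreviation; check ?get_fars for the accepted formats. The call is guarded so that it only runs in an interactive session with rfars installed, since it downloads data.

```r
# Illustrative values only: the directory and file name are assumptions
# chosen for this sketch.
cache_dir  <- tempdir()       # directory passed to `dir`
cache_file <- "fars_va.rds"   # file name passed to `cache` (note the .rds extension)

# The call downloads data, so it is only run in an interactive session
# with rfars installed.
if (interactive() && requireNamespace("rfars", quietly = TRUE)) {
  myFARS_VA <- rfars::get_fars(
    years   = 2021:2023,   # one annual file set per year requested
    states  = "VA",        # assumed format: postal abbreviation
    proceed = TRUE,        # skip the permission dialog
    dir     = cache_dir,   # where the RDS file is saved
    cache   = cache_file   # name of the RDS file
  )
}
```

A narrower years range means fewer annual file sets to fetch, so a small request like this one is a reasonable way to try the workflow before pulling the full ten years.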
Executing the code below will download the prepared FARS and GES/CRSS databases for 2014-2023.
myFARS <- get_fars(proceed = TRUE)
myCRSS <- get_gescrss(proceed = TRUE)
get_fars() and get_gescrss() return a list with six dataframes:
flat, multi_acc, multi_veh, multi_per, events, and codebook.
The tables below show records for randomly selected crashes to illustrate the content and structure of the data. The tables are transposed for readability.
Each row in the flat dataframe corresponds to a person involved in a
crash. As there may be multiple people and/or vehicles involved in one
crash, some variable values are repeated within a crash or vehicle. Each
crash is uniquely identified by id, which is a combination of year
and st_case. Note that st_case is not unique across years; for
example, st_case 510001 will appear in each year. The id variable
is designed to avoid this issue. The GES/CRSS data includes a weight
variable that indicates how many crashes each row represents.