rfars

The goal of rfars is to facilitate transportation safety analysis by simplifying the process of extracting data from official crash databases. The National Highway Traffic Safety Administration collects and publishes a census of fatal crashes in the Fatality Analysis Reporting System and a sample of fatal and non-fatal crashes in the Crash Report Sampling System (an evolution of the General Estimates System). The Fatality and Injury Reporting System Tool allows users to query these databases, and can produce simple tables and graphs. This suffices for simple analysis, but often leaves researchers wanting more. Digging any deeper, however, involves a time-consuming process of downloading annual ZIP files and attempting to stitch them together - after first combing through immense data dictionaries to determine the required variables and table names.

rfars allows users to download the last 10 years of FARS and GES/CRSS data with just one line of code. The result is a full, rich dataset ready for mapping, modeling, and other downstream analysis. Codebooks with variable definitions and value labels support an informed analysis of the data (see vignette("Searchable Codebooks", package = "rfars") for more information). Helper functions are also provided to produce common counts and comparisons.

Installation

You can install the latest version of rfars from GitHub with:

# install.packages("devtools")
devtools::install_github("s87jackson/rfars")

or the CRAN stable release with:

install.packages("rfars")

Then load rfars and some helpful packages:

library(rfars)
library(dplyr)

Getting and Using Data

get_fars() and get_gescrss() are the primary functions of the rfars package. They can download and process data files directly from NHTSA’s FTP site, load prepared data stored on your local machine, or (as of Version 2.0) pull prepared data from Zenodo. The data files hosted on Zenodo are stable, have DOIs, and replicate the data that get_fars() and get_gescrss() would produce from the NHTSA files, but load in a fraction of the time.

Both functions take the parameter years, plus states (FARS) or regions (GES/CRSS). As the source data files follow an annual structure, years determines how many file sets are downloaded or loaded, and states/regions filters the resulting dataset. Downloading and processing these files can take several minutes. Before downloading, rfars will inform you that it’s about to download files and ask your permission to do so. To skip this dialog, set proceed = TRUE. You can use the dir and cache parameters to save an RDS file to your local machine. The dir parameter specifies the directory, and cache names the file (be sure to include the .rds file extension).
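As a rough sketch of how these parameters fit together, the call below requests three years of FARS data for one state and caches the result. The specific years, the state value, and the file name are illustrative assumptions; in particular, passing a full state name to states is assumed here, so check the function documentation for the accepted formats.

```r
library(rfars)

# Illustrative values only: three years, one state, cached in the working directory.
# The state format ("Virginia") and the file name are assumptions for this sketch.
myFARS_va <- get_fars(
  years   = 2021:2023,
  states  = "Virginia",
  proceed = TRUE,           # skip the permission dialog
  dir     = getwd(),        # directory for the cached file
  cache   = "fars_va.rds"   # note the .rds extension
)

# In a later session, the cached RDS file can be read back with base R
myFARS_va <- readRDS(file.path(getwd(), "fars_va.rds"))
```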

Executing the code below will download the prepared FARS and GES/CRSS databases for 2014-2023.

myFARS <- get_fars(proceed = TRUE)
myCRSS <- get_gescrss(proceed = TRUE)

get_fars() and get_gescrss() return a list with six dataframes: flat, multi_acc, multi_veh, multi_per, events, and codebook.
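A quick way to explore the returned object, assuming myFARS was created as above, is to list its elements and pull out individual dataframes with the $ operator. This uses only base R and dplyr functions.

```r
library(dplyr)

# List the six elements of the returned object
names(myFARS)

# Pull out individual dataframes by name
flat     <- myFARS$flat
codebook <- myFARS$codebook

# Compact overview of the columns in the person-level table
glimpse(flat)
```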

The tables below show records for randomly selected crashes to illustrate the content and structure of the data. The tables are transposed for readability.

Each row in the flat dataframe corresponds to a person involved in a crash. As there may be multiple people and/or vehicles involved in one crash, some variable values are repeated within a crash or vehicle. Each crash is uniquely identified by id, which is a combination of year and st_case. Note that st_case is not unique across years: for example, st_case 510001 will appear in each year. The id variable is intended to avoid this ambiguity. The GES/CRSS data includes a weight variable that indicates how many crashes each row represents.
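The toy example below illustrates why crashes should be counted by id rather than st_case, and how a weight column can be used to estimate crash totals. The data are made up, the way id is built here (pasting year and st_case together) is an assumption for illustration rather than the package's actual format, and the sketch assumes weight is constant within a crash.

```r
library(dplyr)

# Made-up person-level records that mimic the flat structure (not real data)
toy <- data.frame(
  year    = c(2021L, 2021L, 2022L, 2022L, 2022L),
  st_case = c(510001L, 510001L, 510001L, 510002L, 510002L),
  weight  = c(120.5, 120.5, 80, 45.25, 45.25)
)

# Hypothetical id: year and st_case pasted together
toy <- toy %>% mutate(id = paste0(year, st_case))

# st_case alone undercounts crashes because it repeats across years
n_crashes_by_st_case <- n_distinct(toy$st_case)  # 2
n_crashes            <- n_distinct(toy$id)       # 3

# Weighted estimate: keep one row per crash, then sum the weights
est_crashes <- toy %>%
  distinct(id, weight) %>%
  summarise(total = sum(weight)) %>%
  pull(total)                                    # 245.75
```

Collapsing to one row per crash before summing matters because flat is person-level: summing weight over every row would count a crash once per person involved.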

