fairness (version 1.2.0)

compas: Modified COMPAS dataset

Description

compas is a landmark dataset for studying algorithmic (un)fairness. The data were used to predict recidivism (whether a defendant will reoffend) in the USA. The COMPAS tool was meant to overcome human biases and offer a fair, algorithmic way to predict recidivism in a diverse population. However, the algorithm ended up propagating existing social biases and thus offered an unfair solution to the problem. In this dataset, a model to predict recidivism has already been fitted, and its predicted probabilities and predicted status (yes/no) have been appended to the original data as columns.

Usage

compas
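
A minimal sketch of loading and inspecting the dataset, assuming the fairness package is installed. The requireNamespace() guard and the has_fairness variable are additions for illustration, so the snippet does nothing rather than fail when the package is missing.

```r
# Check whether the fairness package is available (illustrative guard).
has_fairness <- requireNamespace("fairness", quietly = TRUE)

if (has_fairness) {
  data("compas", package = "fairness")  # load `compas` into the workspace
  str(compas)                           # inspect column names and types
  table(compas$ethnicity)               # group sizes for the ethnicity column
}
```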

Format

A data frame with 6172 rows and 9 variables:

Two_yr_Recidivism

factor, yes/no for recidivism or no recidivism. This is the outcome (target) variable in this dataset

Number_of_Priors

numeric, number of priors, normalized to mean = 0 and standard deviation = 1

Age_Above_FourtyFive

factor, yes/no for age above 45 years or not

Age_Below_TwentyFive

factor, yes/no for age below 25 years or not

Female

factor, female/male for gender

Misdemeanor

factor, yes/no for having recorded misdemeanor(s) or not

ethnicity

factor, Caucasian, African American, Asian, Hispanic, Native American or Other

probability

numeric, predicted probabilities for recidivism, ranges from 0 to 1

predicted

numeric, predicted values for recidivism, 0/1 for no/yes
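
The columns above can be combined to compare prediction errors across groups. The sketch below uses a small made-up data frame (toy) with the same column names instead of the real data, neutral placeholder group labels, and an assumed cutoff of 0.5 for turning probabilities into 0/1 predictions. The documentation does not state which cutoff produced the predicted column, so treat the threshold as an assumption.

```r
# Toy stand-in for `compas` with matching column names (values are invented).
toy <- data.frame(
  Two_yr_Recidivism = factor(c("yes", "no", "no", "yes", "no", "no"),
                             levels = c("no", "yes")),
  ethnicity         = factor(c("group_a", "group_a", "group_a",
                               "group_b", "group_b", "group_b")),
  probability       = c(0.80, 0.60, 0.20, 0.70, 0.30, 0.10)
)

# Assumed cutoff of 0.5: probabilities above it become 1 (yes), others 0 (no).
toy$predicted <- as.numeric(toy$probability > 0.5)

# False positive rate per group: share of actual "no" cases predicted as 1.
negatives <- toy[toy$Two_yr_Recidivism == "no", ]
fpr <- tapply(negatives$predicted, negatives$ethnicity, mean)
print(fpr)
```

With the real data, the same tapply() call could be run on compas directly, using the predicted column that already ships with the dataset.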