
SPARRAfairness (version 0.0.0.1)

sim_pop_data: sim_pop_data

Description

Simulates population data with a reasonably realistic joint distribution of covariates and hospital admission.

Usage

sim_pop_data(
  npop,
  coef_adjust = 4,
  offset = 1,
  vcor = NULL,
  coefs = c(2, 1, 0, 5, 3, 0, 0),
  seed = 12345,
  incl_id = TRUE,
  incl_reason = TRUE
)

Value

A data frame of simulated individuals with realistic values for the variables listed under Details.

Arguments

npop

population size

coef_adjust

inverse scale for all (true) coefficients (default 4): the coefficients are divided by this value, so lower values mean hospital admissions are more predictable from covariates.

offset

offset for the logistic model of admission (default 1): higher values give a lower overall prevalence of admission.

vcor

a valid 5x5 correlation matrix giving the correlations between variables (default NULL). If NULL, values are chosen to roughly represent realistic data.

coefs

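coefficients of age, male sex, non-white ethnicity, number of previous admissions and deprivation decile on hospital admission. The Usage default, c(2, 1, 0, 5, 3, 0, 0), has seven entries; the first five (2, 1, 0, 5, 3) apply to these covariates. All coefficients are divided through by coef_adjust.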

seed

random seed (default 12345)

incl_id

include an ID column (default TRUE)

incl_reason

include a column indicating the reason for admission (default TRUE).
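The arguments coefs, coef_adjust and offset describe a logistic model for admission in which the true coefficients are divided by coef_adjust and a larger offset lowers overall prevalence. The sketch below shows one way these pieces could fit together. It assumes the offset is subtracted from the linear predictor and uses standard-normal stand-in covariates; admission_prob is a hypothetical helper, not part of SPARRAfairness, and the spread of predicted probabilities is used only as a rough proxy for predictability.

```r
# Hypothetical sketch: how coefs, coef_adjust and offset might combine in a
# logistic model for admission. Not the SPARRAfairness implementation.
admission_prob <- function(X, coefs, coef_adjust = 4, offset = 1) {
  # Linear predictor: coefficients divided through by coef_adjust,
  # offset subtracted (assumption) so a larger offset lowers prevalence
  eta <- as.vector(X %*% (coefs / coef_adjust)) - offset
  plogis(eta)  # inverse logit
}

set.seed(12345)
n <- 10000
# Stand-in covariates (assumption): 7 standard-normal columns, one per coefficient
X <- matrix(rnorm(n * 7), ncol = 7)
coefs <- c(2, 1, 0, 5, 3, 0, 0)

# Higher offset -> lower overall prevalence of admission
prev_offset1 <- mean(admission_prob(X, coefs, offset = 1))
prev_offset3 <- mean(admission_prob(X, coefs, offset = 3))

# Lower coef_adjust -> predicted probabilities spread further apart,
# i.e. admission is more predictable from the covariates
sd_adj2 <- sd(admission_prob(X, coefs, coef_adjust = 2))
sd_adj8 <- sd(admission_prob(X, coefs, coef_adjust = 8))

print(c(prev_offset1 = prev_offset1, prev_offset3 = prev_offset3))
print(c(sd_adj2 = sd_adj2, sd_adj8 = sd_adj8))
```

Because plogis is increasing, raising the offset lowers every individual probability, so the overall prevalence must fall; shrinking coef_adjust widens the linear predictor and pushes probabilities towards 0 and 1.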

Details

Simulates data for a population of npop individuals on the following variables:

  • Age (age)

  • Sex (sexM: 1 if male)

  • Race/ethnicity (raceNW: 1 if non-white ethnicity)

  • Number of previous hospital admissions (PrevAdm)

  • Deprivation decile (SIMD: 1 most deprived, 10 least deprived; note that this ordering is opposite to the English IMD)

  • Urban-rural residence status (urban_rural: 1 for rural)

  • Mainland-island residence status (mainland_island: 1 for island)

  • Hospital admission (target: 1/TRUE if admitted to hospital in year following prediction date)
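A common way to obtain a joint distribution with specified correlations among mixed-type variables like those listed above (continuous age, binary indicators, a count and a decile) is a Gaussian copula. Whether sim_pop_data works this way is an assumption. The helpers is_valid_cor and sim_mixed, the marginal distributions and the illustrative 4x4 matrix below are all hypothetical; the check also spells out what a "valid" correlation matrix, as required for vcor, usually means: symmetric, unit diagonal, and no negative eigenvalues.

```r
# Hypothetical sketch of a Gaussian copula: draw correlated latent normals,
# then map each margin to a variable type like those listed above.
# Margins and correlations are illustrative assumptions only.
is_valid_cor <- function(m, tol = 1e-8) {
  is.matrix(m) && nrow(m) == ncol(m) &&
    isSymmetric(m) &&
    all(abs(diag(m) - 1) < tol) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > -tol)
}

sim_mixed <- function(n, vcor, seed = 12345) {
  stopifnot(is_valid_cor(vcor))
  set.seed(seed)
  k <- ncol(vcor)
  # Rows of z are N(0, vcor): chol() returns R with t(R) %*% R == vcor
  # (chol() additionally requires vcor to be strictly positive definite)
  z <- matrix(rnorm(n * k), n, k) %*% chol(vcor)
  u <- pnorm(z)  # uniform margins, same dependence structure
  data.frame(
    age     = round(18 + 72 * u[, 1]),                 # ages 18 to 90
    sexM    = as.integer(u[, 2] > 0.5),                # 1 if male
    PrevAdm = qpois(u[, 3], lambda = 0.5),             # previous admissions
    SIMD    = pmin(pmax(ceiling(10 * u[, 4]), 1), 10)  # decile 1 to 10
  )
}

# Illustrative latent correlations (age, sexM, PrevAdm, SIMD)
vcor <- matrix(c( 1.0,  0.0,  0.3, -0.1,
                  0.0,  1.0,  0.0,  0.0,
                  0.3,  0.0,  1.0, -0.2,
                 -0.1,  0.0, -0.2,  1.0), nrow = 4, byrow = TRUE)
dat <- sim_mixed(10000, vcor)
round(cor(dat), 2)
```

The observed correlations differ somewhat from the latent ones because the margins are discretised, but their signs are preserved.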

Can optionally add an ID column.

Optionally includes an admission reason for samples with target=1. These admission reasons roughly correspond to the first letters of ICD-10 categories, and can correspond to either an admission or a death. Admission reasons are simulated from a non-constant multinomial distribution that varies across age, sex, ethnicity, urban-rural status, mainland-island status and PrevAdm values in a randomly chosen way. However, the distributions of admission reasons are not chosen to reflect real distributions, and systematic differences in how common each admission type is across categories are not intended to appear realistic.
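The paragraph above describes admission reasons drawn from a multinomial distribution whose probabilities vary across covariates in a randomly chosen way. One way to build such a distribution is a softmax over the covariates with randomly drawn weights. The sketch assumes that construction; sim_reasons, the covariates and the reason labels (loosely mimicking ICD-10 first letters plus a death category) are hypothetical and not taken from SPARRAfairness.

```r
# Hypothetical sketch: draw one admission reason per admitted individual from
# a multinomial whose probabilities depend on covariates through randomly
# chosen weights. Labels and weights are illustrative assumptions.
sim_reasons <- function(X, reasons, seed = 12345) {
  set.seed(seed)
  # Randomly chosen weights: one row per covariate, one column per reason
  B <- matrix(rnorm(ncol(X) * length(reasons), sd = 0.5),
              nrow = ncol(X), ncol = length(reasons))
  eta <- X %*% B
  # Row-wise softmax (subtract each row's max for numerical stability)
  p <- exp(eta - apply(eta, 1, max))
  p <- p / rowSums(p)
  # Sample one reason per row using that row's probabilities
  idx <- apply(p, 1, function(pr) sample.int(length(reasons), 1, prob = pr))
  reasons[idx]
}

set.seed(1)
n <- 2000
X <- cbind(intercept   = 1,
           sexM        = rbinom(n, 1, 0.5),
           urban_rural = rbinom(n, 1, 0.2),
           PrevAdm     = rpois(n, 0.5))
target <- rbinom(n, 1, 0.2) == 1

# Reasons are only assigned where target is TRUE; others stay NA
reason <- rep(NA_character_, n)
reason[target] <- sim_reasons(X[target, , drop = FALSE],
                              c("A", "C", "I", "J", "Death"))
table(reason, useNA = "ifany")
```

Because each individual's probabilities come from their own covariate row, the mix of reasons shifts between groups, but in no particular clinically meaningful direction, in line with the caveat above.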

Examples


# Simulate data for a population of 10,000 individuals
dat <- sim_pop_data(10000)
# Correlation matrix of the first seven columns
cor(dat[, 1:7])

# See vignette
