Simulates population data with a reasonably realistic joint distribution
sim_pop_data(
npop,
coef_adjust = 4,
offset = 1,
vcor = NULL,
coefs = c(2, 1, 0, 5, 3, 0, 0),
seed = 12345,
incl_id = TRUE,
incl_reason = TRUE
)
A data frame with realistic values.
npop: population size
coef_adjust: inverse scale for all (true) coefficients (default 4). Lower values make hospital admissions more predictable from covariates.
offset: offset for the logistic model (default 1). Higher values give a lower overall prevalence of admission.
vcor: a valid 5x5 correlation matrix (default NULL) giving the correlation between variables. If NULL, values roughly represent realistic data.
coefs: coefficients of age, male sex, non-white ethnicity, number of previous admissions, and deprivation decile on hospital admissions (default c(2, 1, 0, 5, 3, 0, 0)). All are divided through by coef_adjust.
seed: random seed (default 12345)
incl_id: include an ID column (default TRUE)
incl_reason: include a column indicating reason for admission (default TRUE)
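As a rough illustration of how coefs, coef_adjust and offset might interact, the sketch below applies a logistic model to made-up, standardised covariates. The form of the linear predictor (covariates times coefs / coef_adjust, minus offset) is an assumption inferred from the argument descriptions, not taken from the function's source, and admission_prob is a hypothetical helper.

```r
# Hypothetical helper, NOT part of the package: assumed form of the logistic model
admission_prob <- function(x, coefs, coef_adjust = 4, offset = 1) {
  # x: numeric matrix of covariates, one column per coefficient
  lin_pred <- as.vector(x %*% (coefs / coef_adjust)) - offset
  plogis(lin_pred)  # inverse logit
}

set.seed(1)
coefs <- c(2, 1, 0, 5, 3, 0, 0)
# Made-up standard normal covariates, for illustration only
x <- matrix(rnorm(1000 * length(coefs)), ncol = length(coefs))

p_default <- admission_prob(x, coefs)
p_offset  <- admission_prob(x, coefs, offset = 3)
p_sharper <- admission_prob(x, coefs, coef_adjust = 1)

mean(p_default)  # overall prevalence under the defaults
mean(p_offset)   # higher offset: lower overall prevalence
sd(p_default)    # spread of probabilities under the defaults
sd(p_sharper)    # lower coef_adjust: probabilities pushed towards 0 or 1
```

Under this assumed form, raising offset shifts every probability down, while lowering coef_adjust stretches the linear predictor so that the covariates determine the outcome more sharply.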
Simulates data for a range of people for the following variables:
Age (age)
Sex (sexM; 1 if male)
Race/ethnicity (raceNW; 1 if non-white ethnicity)
Number of previous hospital admissions (PrevAdm)
Deprivation decile (SIMD; 1 most deprived, 10 least deprived. NOTE: opposite to English IMD)
Urban-rural residence status (urban_rural; 1 for rural)
Mainland-island residence status (mainland_island; 1 for island)
Hospital admission (target; 1/TRUE if admitted to hospital in year following prediction date)
Can optionally add an ID column.
Optionally includes an admission reason for samples with target=1. These admission reasons
roughly correspond to the first letters of ICD10 categories, and can either correspond to an
admission or death. Admission reasons are simulated with a non-constant multinomial distribution
which varies across age/sex/ethnicity/urban-rural/mainland-island/PrevAdm values in a randomly-
chosen way. The distributions of admission reasons are not however chosen to reflect real
distributions, nor are systematic changes in commonality of admission types across categories
intended to appear realistic.
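To make the idea of a non-constant multinomial distribution concrete, here is a minimal sketch in base R. Each covariate stratum gets its own randomly drawn probability vector over reason codes, and each individual's reason is sampled from the vector for their stratum. The reason codes, the choice of strata (sex by previous admissions) and the way probabilities are drawn are all illustrative assumptions, not the package's implementation.

```r
set.seed(1)
reasons <- LETTERS[1:5]            # made-up reason codes
n <- 2000
sexM    <- rbinom(n, 1, 0.5)
PrevAdm <- pmin(rpois(n, 1), 2)    # 0, 1, or 2+ previous admissions
stratum <- interaction(sexM, PrevAdm, drop = TRUE)

# One randomly chosen probability vector per stratum (each row sums to 1)
raw   <- matrix(rexp(nlevels(stratum) * length(reasons)),
                nrow = nlevels(stratum))
probs <- raw / rowSums(raw)
rownames(probs) <- levels(stratum)

# Sample each individual's reason from the distribution of their stratum
reason <- vapply(as.character(stratum),
                 function(s) sample(reasons, 1, prob = probs[s, ]),
                 character(1), USE.NAMES = FALSE)

# Empirical reason frequencies by stratum (rows sum to 1 up to rounding)
round(prop.table(table(stratum, reason), margin = 1), 2)
```

Because the per-stratum probabilities are drawn at random, the resulting frequencies differ between strata but, as with the simulated data described above, carry no realistic meaning.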
# Simulate data
dat <- sim_pop_data(10000)
# Correlation matrix of the first seven columns
cor(dat[, 1:7])
# See vignette
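The vcor argument expects a valid 5x5 correlation matrix. The sketch below builds a simple one in base R and checks the usual conditions (symmetric, unit diagonal, positive definite). The exchangeable structure and the value rho = 0.2 are arbitrary choices for illustration, and which variables the five rows correspond to is not stated here, so the final call is left commented out.

```r
# Exchangeable 5x5 correlation matrix: 1 on the diagonal, rho elsewhere
rho  <- 0.2
vcor <- matrix(rho, nrow = 5, ncol = 5)
diag(vcor) <- 1

# Basic validity checks: symmetric, unit diagonal, positive definite
is_valid <- isSymmetric(vcor) &&
  all(diag(vcor) == 1) &&
  all(eigen(vcor, symmetric = TRUE, only.values = TRUE)$values > 0)
is_valid

# dat <- sim_pop_data(10000, vcor = vcor)   # not run here
```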