prevalence: Estimate point prevalence at an index date.

Description

Point prevalence at a specific index date is estimated by combining contributions from the available registry data with contributions from Monte Carlo simulations of the incidence and survival processes, as outlined by Crouch et al. (2014) (see References).

Usage

prevalence(form, data, num_years_to_estimate, index_date = NULL,
  num_reg_years = NULL, cure = 10, N_boot = 1000,
  population_size = NULL, proportion = 1e+05, level = 0.95,
  population_data = NULL, precision = 2, n_cores = 1, start = NULL)

Arguments

form

Formula where the LHS is a standard Surv object and the RHS has four special function arguments: age, the column where age is located; sex, the column where sex is located; entry, the column where dates of entry to the registry are located; and event, the column where event dates are located.

This formula is used in the following way:

Surv(time, status) ~ age(age_column_name) + sex(sex_column_name) + entry(entry_column_name) + event(event_column_name)

Using the supplied prevsim dataset, it is therefore called with:

Surv(time, status) ~ age(age) + sex(sex) + entry(entrydate) + event(eventdate)

data

A data frame with the corresponding column names provided in form.

num_years_to_estimate

Number of years of data to consider when estimating point prevalence; multiple values can be specified in a vector. If any values are greater than the number of years of registry data available before index_date, incident cases for the difference will be simulated.

index_date

The date at which to estimate point prevalence. Defaults to the latest registry entry date.

num_reg_years

The number of years of the registry for which incidence is to be calculated. Defaults to using all available complete years. Note that if more registry years are supplied than the number of years to estimate prevalence for, the survival data from the surplus registry years are still involved in the survival model fitting.

cure

Integer giving the cure time, in years, assumed by the cure model. A patient who has survived beyond the cure time has a probability of surviving derived from the mortality rate of the general population.

N_boot

Number of bootstrapped calculations to perform.

population_size

Integer corresponding to the size of the population at risk.

proportion

The population ratio to estimate prevalence for; for example, the default of 1e+05 expresses prevalence per 100,000 people.

level

Double giving the desired confidence level for the interval, e.g. 0.95 for a 95% confidence interval.

population_data

A data frame that must contain the columns age, rate, and sex, where each row is the mortality rate for a person of that age and sex. Ideally, age spans the range [0, 100]. Defaults to the supplied data; see UKmortality for the format required for custom datasets.

precision

Integer representing the number of decimal places required.

n_cores

Number of CPU cores to use when fitting the bootstrapped survival models. Defaults to 1; multi-core functionality is provided by the doParallel package.

start

Deprecated: use index_date instead, and specify the number of years of registry data to use with num_reg_years. Date from which incident cases are included, in the format YYYY-MM-DD. Defaults to the earliest entry date. This value is now inferred by counting back num_reg_years years of registry data from the index_date.

Value

An S3 object of class prevalence with the following attributes:

estimates

Estimated prevalence at the index date for each of the years in num_years_to_estimate.

simulated

A list containing items related to the simulation of prevalence contributions; see prevalence_simulated.

counted

Contributions to prevalence from each of the supplied registry years, see prevalence_counted.

start_date

The starting date of the registry data included in the estimation.

index_date

The index date at which the point prevalence was calculated.

known_inc_rate

The known incidence rate for years included in the registry.

nregyears

Number of years of registry data that were used.

nbootstraps

The number of bootstrapped survival models fitted during the calculation.

pval

The p-value resulting from the chi-square test between the simulated and counted prevalent cases for the years of registry data available.

The Surv object used as the response in the survival modeling.

means

The covariate means from the data.

Details

The most important parameter is num_years_to_estimate, which governs the number of previous years of data to use when estimating the prevalence at the index date. If this parameter is greater than the number of years of known incident cases available in the supplied registry data (specified with the argument num_reg_years), then the remaining num_years_to_estimate - num_reg_years years of incident data will be simulated using Monte Carlo simulation.
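
As a rough illustration of the arithmetic described above (not the package's internal code), the number of years that would need to be simulated for each requested value can be worked out as follows. The variable names years_counted and years_simulated are hypothetical; the input values mirror the first call in the Examples section.

```r
# Hypothetical values mirroring the first call in the Examples section
num_years_to_estimate <- c(5, 10)
num_reg_years <- 8

# Years not covered by the registry must be simulated (never negative)
years_simulated <- pmax(num_years_to_estimate - num_reg_years, 0)

# Years that can be counted directly from the registry data
years_counted <- pmin(num_years_to_estimate, num_reg_years)

data.frame(num_years_to_estimate, years_counted, years_simulated)
```

With these inputs, the 5-year estimate needs no simulated years, while the 10-year estimate relies on 8 counted years and 2 simulated years.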

The larger num_years_to_estimate, the more accurate the prevalence estimate will be, provided an adequate survival model can be fitted to the registry data. It is therefore important to provide as much clean registry data as possible.

Simulated cases are marked with age and sex to enable agreement with population survival data where a cure model is used, and calculation of the posterior distributions of each.
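
To make the cure assumption more concrete, the sketch below shows one way a survival probability could be derived from population mortality rates. It is an illustration only and not the package's implementation: the toy mortality data frame, the helper population_survival, the sex coding, and the reading of rate as an annual probability of death are all assumptions. Only the column names age, rate, and sex come from the population_data argument.

```r
# Toy mortality table with the columns described for population_data.
# ASSUMPTION: 'rate' is an annual probability of death; the values are made up.
mortality <- data.frame(
  age  = rep(60:64, times = 2),
  sex  = rep(c("M", "F"), each = 5),
  rate = c(0.010, 0.011, 0.012, 0.013, 0.014,
           0.006, 0.007, 0.008, 0.009, 0.010)
)

# Hypothetical helper: probability that a person of the given age and sex
# survives 'years' further years, taken as the product of yearly survival.
population_survival <- function(mortality, age, sex, years) {
  ages <- seq(age, length.out = years)
  rows <- mortality[mortality$sex == sex & mortality$age %in% ages, ]
  stopifnot(nrow(rows) == years)  # every age in the range must be present
  prod(1 - rows$rate)
}

# Five-year survival for a 60-year-old male under the toy rates
population_survival(mortality, age = 60, sex = "M", years = 5)
```

Under this reading, a simulated case that has already survived past the cure time would be followed forward using rates like these rather than the fitted disease-specific survival model.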

References

Crouch, Simon, et al. "Determining disease prevalence from incidence and survival using simulation techniques." Cancer epidemiology 38.2 (2014): 193-199.

Examples


# NOT RUN {
library(rprev)
library(survival)  # provides Surv()
data(prevsim)

# }
# NOT RUN {
prevalence(Surv(time, status) ~ age(age) + sex(sex) + entry(entrydate) + event(eventdate),
           data=prevsim, num_years_to_estimate = c(5, 10), population_size=1e6,
           index_date = '2013-09-01', num_reg_years = 8,
           cure = 5)

prevalence(Surv(time, status) ~ age(age) + sex(sex) + entry(entrydate) + event(eventdate),
           data=prevsim, num_years_to_estimate = 5, population_size=1e6)

# Run on multiple cores
prevalence(Surv(time, status) ~ age(age) + sex(sex) + entry(entrydate) + event(eventdate),
           data=prevsim, num_years_to_estimate = c(3,5,7), population_size=1e6, n_cores=4)
# }