survtab_ag: Estimate Survival Time Functions

Description

Estimate survival time functions (survival, relative/net survival, CIFs) using aggregated data (survtab_ag) or Lexis data (survtab).

Usage

survtab_ag(formula = NULL, data, adjust = NULL, weights = NULL, surv.breaks = NULL, n = "at.risk", d = "from0to1", n.cens = "from0to0", pyrs = "pyrs", d.exp = "d.exp", n.pp = NULL, d.pp = "d.pp", d.pp.2 = "d.pp.2", n.cens.pp = "n.cens.pp", pyrs.pp = "pyrs.pp", d.exp.pp = "d.exp.pp", surv.type = "surv.rel", surv.method = "hazard", relsurv.method = "e2", subset = NULL, conf.level = 0.95, conf.type = "log-log", verbose = FALSE)
survtab(formula, data, adjust = NULL, breaks = NULL, pophaz = NULL, weights = NULL, surv.type = "surv.rel", surv.method = "hazard", relsurv.method = "e2", subset = NULL, conf.level = 0.95, conf.type = "log-log", verbose = FALSE)

Arguments

formula

a formula object. For survtab_ag, the response must be the time scale to compute survival time function estimates over, e.g. fot ~ sex + adjust(agegr). For survtab it can also be the just e.g. fot ~ sex + adjust(agegr), whereupon it is assumed that lex.Xst in your data is the status variable in the intended format. To be explicit, use Surv containing the survival time scale and the event indicator as the right-hand-side, e.g. Surv(fot, lex.Xst) ~ sex. Stratifying variables can be included on the right-hand side separated by '+'. May contain usage of adjust() --- see Details.

data

for survtab_ag, a data set of aggregated counts, subject-times, etc., as created by aggre; for pre-aggregated data see as.aggre. For survtab, a Lexis object.

adjust

can be used as an alternative to passing variables to argument formula within a call to adjust(); e.g. adjust = "agegr". Flexible input.

weights

typically a list of weights or a character string specifying an age group standardization scheme; see the dedicated help page and examples.

surv.breaks

(survtab_ag only) a vector of breaks on the survival time scale. Only used to compute estimates using a subset of the intervals in data or larger intervals than in data. E.g. one might use surv.breaks = 0:5 when the aggregated data has intervals with the breaks seq(0, 10, 1/12).

variable containing counts of subjects at-risk at the start of a time interval; e.g. n = "at.risk". Required when surv.method = "lifetable". Flexible input.

variable(s) containing counts of subjects experiencing an event. With only one type of event, e.g. d = "deaths". With multiple types of events (for CIF or cause-specific survival estimation), supply e.g. d = c("canD", "othD"). If the survival time function to be estimated does not use multiple types of events, supplying more than one variable to d simply causes the variables to be added together. Always required. Flexible input.

n.cens

variable containing counts of subjects censored during a survival time interval; E.g. n.cens = "alive". Required when surv.method = "lifetable". Flexible input.

pyrs

variable containing total subject-time accumulated within a survival time interval; E.g. pyrs = "pyrs". Required when surv.method = "hazard". Flexible input.

d.exp

variable denoting total "expected numbers of events" (typically computed pyrs * pop.haz, where pop.haz is the expected hazard level) accumulated within a survival time interval; E.g. pyrs = "pyrs". Required when computing EdererII relative survivals or CIFs based on excess counts of events. Flexible input.

n.pp

variable containing total Pohar-Perme weighted counts of subjects at risk in an interval, supplied as argument n is supplied. Computed originally on the subject level as analogous to pp * as.integer(status == "at-risk"). Required when relsurv.method = "pp". Flexible input.

d.pp

variable(s) containing total Pohar-Perme weighted counts of events, supplied as argument d is supplied. Computed originally on the subject level as analogous to pp * as.integer(status == some_event). Required when relsurv.method = "pp". Flexible input.

d.pp.2

variable(s) containing total Pohar-Perme "double-weighted" counts of events, supplied as argument d is supplied. Computed originally on the subject level as analogous to pp * pp * as.integer(status == some_event). Required when relsurv.method = "pp". Flexible input.

n.cens.pp

variable containing total Pohar-Perme weighted counts censorings, supplied as argument n.cens is supplied. Computed originally on the subject level as analogous to pp * as.integer(status == "censored"). Required when relsurv.method = "pp". Flexible input.

pyrs.pp

variable containing total Pohar-Perme weighted subject-times, supplied as argument pyrs is supplied. Computed originally on the subject level as analogous to pp * pyrs. Required when relsurv.method = "pp". Flexible input.

d.exp.pp

variable containing total Pohar-Perme weighted counts of excess events, supplied as argument pyrs is supplied. Computed originally on the subject level as analogous to pp * d.exp. Required when relsurv.method = "pp". Flexible input.

surv.type

one of 'surv.obs', 'surv.cause', 'surv.rel', 'cif.obs' or 'cif.rel'; defines what kind of survival time function(s) is/are estimated; see Details

surv.method

either 'lifetable' or 'hazard'; determines the method of calculating survival time functions, where the former computes ratios such as p = d/(n - n.cens) and the latter utilizes subject-times (typically person-years) for hazard estimates such as d/pyrs which are used to compute survival time function estimates. The former method requires argument n.cens and the latter argument pyrs to be supplied.

relsurv.method

either 'e2' or 'pp'; defines whether to compute relative survival using the EdererII method or using Pohar-Perme weighting; ignored if surv.type != "surv.rel"

subset

a logical condition; e.g. subset = sex == 1; subsets the data before computations

conf.level

confidence level used in confidence intervals; e.g. 0.95 for 95 percent confidence intervals

conf.type

character string; must be one of "plain", "log-log" and "log"; defines the transformation used on the survival time function to yield confidence intervals via the delta method

verbose

logical; if TRUE, the function is chatty and returns some messages and timings along the process

breaks

(survtab only) a named list of breaks, e.g. list(FUT = 0:5). If data is not split in advance, breaks must at the very least contain a vector of breaks to split the survival time scale (mentioned in argument formula). If data has already been split (using e.g. splitMulti) along at least the used survival time scale, this may be NULL.

pophaz

(survtab only) a data.frame containing expected hazards for the event of interest to occur. See the dedicated help page. Required when surv.type = "surv.rel" or "cif.rel". pophaz must contain one column named "haz", and any number of other columns identifying levels of variables to do a merge with split data within survtab. Some columns may be time scales, which will allow for the expected hazard to vary by e.g. calendar time and age.

Value

Returns a table of life time function values and other information with survival intervals as rows. Returns some of the following estimates of survival time functions:

surv.obs - observed (raw) survival
CIF_k - cumulative incidence function for cause k
CIF.rel - cumulative incidence function using excess cases
r.e2 - relative survival, EdererII
r.pp - relative survival, Pohar-Perme weighted

The suffix .as implies adjusted estimates, and .lo and .hi imply lower and upper confidence limits, respectively. The prefix SE. stands for standard error.

Functions

survtab: survtab

Details

Basics

survtab_ag computes estimates of survival time functions using pre-aggregated data. When using subject-level data directly, use survtab. For aggregating data, see lexpand and aggre. Data sets can be coerced into Lexis objects using Lexis.

By default survtab_ag makes use of the exact same breaks that were used in splitting the original data (with e.g. lexpand), so it is not necessary to specify any surv.breaks. If specified, the surv.breaks must be a subset of the pertinent pre-existing breaks. Interval lengths (delta in output) are also calculated based on existing breaks or surv.breaks, so the upper limit of the breaks should therefore be meaningful and never e.g. Inf.

survtab may be a split or unsplit Lexis data set, but it is recommended to supply the breaks argument anyway.

if surv.type = 'surv.obs', only 'raw' observed survival is calculated over the chosen time intervals. With surv.type = 'surv.rel', also relative survival estimates are supplied in addition to observed survival figures.

surv.type = 'cif.obs' requests cumulative incidence functions (CIF) to be estimated, where all unique events (supplied via d) are seen as competing risks. CIFs are estimated for each competing risk based on a survival-interval-specific proportional hazards assumption as described by Chiang (1968). With surv.type = 'cif.rel', a CIF is estimated with using excess cases as the ''cause-specific'' cases.

if surv.type = 'surv.cause', cause-specific survivals are estimated separately for each unique value of event.values.

The vignette survtab_examples has some practical examples.

Relative / net survival When surv.type = 'surv.rel', the user can choose relsurv.method = 'pp', whereupon Pohar-Perme weighting is used. By default relsurv.method = 'e2'.

Adjusted estimates

Adjusted estimates in this context mean computing estimates separately by the levels of adjusting variables and returning weighted averages of the estimates. For example, computing estimates separately by age groups and returning a weighted average estimate (age-adjusted estimate).

Adjusting requires specification of both the adjusting variables and the weights for all the levels of the adjusting variables. The former can be accomplished by using adjust() with the argument formula, or by supplying variables directly to argument adjust. E.g. the following are all equivalent (using survtab_ag):

formula = fot ~ sex + adjust(agegr, area)

formula = fot ~ sex and adjust = c("agegr", "area")

formula = fot ~ sex and adjust = list(agegr, area)

When using survtab, the response must be a Surv object, e.g. Surv(time = fot, event = lex.Xst), but otherwise the syntax is the same.

The adjusting variables must match with the variable names in the argument weights, which may be supplied as a list or a data.frame. The former can be done by e.g.

weights = list(agegr = VEC1, area = VEC2),

where VEC1 and VEC2 are vectors of weights (which do not have to add up to one). When passing a list of weights, the order of the weights must match with the order of the levels of the variable: For factor variables, they must correspond to the factor's levels. Otherwise they must match to the sorted levels of the variable, i.e. if variable agegr has three levels c(1, 2, 3), the weights in e.g. list(agegr = c(0.1, 0.4, 0.5)) would pass the weight 0.1 for level 1 and so on, regardless of the order of values in the data.

See survtab_examples for an example of using a data.frame to pass weights.

Period analysis and other data selection schemes

If one wishes to calculate e.g. period analysis (delayed entry estimates), one should limit the data when/before supplying to this function. See e.g. lexpand.

References

Perme, Maja Pohar, Janez Stare, and Jacques Estève. "On estimation in relative survival." Biometrics 68.1 (2012): 113-120.

Hakulinen, Timo, Karri Seppa, and Paul C. Lambert. "Choosing the relative survival method for cancer survival estimation." European Journal of Cancer 47.14 (2011): 2202-2210. Seppa, Karri, Timo Hakulinen, and Arun Pokhrel. "Choosing the net survival method for cancer survival estimation." European Journal of Cancer (2013).

CHIANG, Chin Long. Introduction to stochastic processes in biostatistics. 1968.

Examples

Run this code

## see more examples with explanations in vignette("survtab_examples")

#### survtab_ag usage

data(sire)
## prepare data for e.g. 5-year "period analysis" for 2008-2012
## note: sire is a simulated cohort integrated into popEpi.
BL <- list(fot=seq(0, 5, by = 1/12),
           per = c("2008-01-01", "2013-01-01"))
x <- lexpand(sire, birth = bi_date, entry = dg_date, exit = ex_date,
             status = status %in% 1:2,
             breaks = BL,
             pophaz = popmort,
             aggre = list(fot))
             
## calculate relative EdererII period method 
st <- survtab_ag(fot ~ 1, data = x)

#### survtab usage
library(Epi)
library(survival)

## NOTE: recommended to use factor status variable
x <- Lexis(entry = list(FUT = 0, AGE = dg_age, CAL = get.yrs(dg_date)), 
           exit = list(CAL = get.yrs(ex_date)), 
           data = sire[sire$dg_date < sire$ex_date, ],
           exit.status = factor(status, levels = 0:2, 
                                labels = c("alive", "canD", "othD")), 
           merge = TRUE)

## phony group variable
set.seed(1L)
x$group <- rbinom(nrow(x), 1, 0.5)

## Not run: 
# ## observed survival. explicit supplying of status:
# st <- survtab(Surv(time = FUT, event = lex.Xst) ~ group, data = x, 
#               surv.type = "surv.obs",
#               breaks = list(FUT = seq(0, 5, 1/12)))
# ## this assumes the status is lex.Xst (right 99.9 % of the time)
# st <- survtab(FUT ~ group, data = x, 
#               surv.type = "surv.obs",
#               breaks = list(FUT = seq(0, 5, 1/12)))
# 
# #### using dates with survtab
# x <- Lexis(entry = list(FUT = 0L, AGE = dg_date-bi_date, CAL = dg_date),
#            exit = list(CAL = ex_date),
#            data = sire[sire$dg_date < sire$ex_date, ],
#            exit.status = factor(status, levels = 0:2, 
#                                 labels = c("alive", "canD", "othD")), 
#            merge = TRUE)
# ## phony group variable
# set.seed(1L)
# x$group <- rbinom(nrow(x), 1, 0.5)
# 
# st <- survtab(Surv(time = FUT, event = lex.Xst) ~ group, data = x, 
#               surv.type = "surv.obs",
#               breaks = list(FUT = seq(0, 5, 1/12)*365.25))    
#                   
# ## NOTE: population hazard should be reported at the same scale
# ## as time variables in your Lexis data.
# data(popmort, package = "popEpi")
# pm <- data.frame(popmort)
# names(pm) <- c("sex", "CAL", "AGE", "haz")
# ## from year to day level
# pm$haz <- pm$haz/365.25 
# pm$CAL <- as.Date(paste0(pm$CAL, "-01-01")) 
# pm$AGE <- pm$AGE*365.25 
# 
# st <- survtab(Surv(time = FUT, event = lex.Xst) ~ group, data = x, 
#               surv.type = "surv.rel", relsurv.method = "e2",
#               pophaz = pm,
#               breaks = list(FUT = seq(0, 5, 1/12)*365.25))  
# ## End(Not run)

Run the code above in your browser using DataLab