Get data.frame for Discrete Time Survival Models

Returns a data.frame, with weights, for a static fit to survival data.

Usage

get_survival_case_weights_and_data(formula, data, by, max_T, id,
  init_weights, risk_obj, use_weights = T, is_for_discrete_model = T,
  c_outcome = "Y", c_weights = "weights", c_end_t = "t")

Arguments

formula

coxph-like formula with Surv(tstart, tstop, event) on the left-hand side of ~.

data

data.frame or environment containing the outcome and covariates.

by

interval length of the bins in which parameters are fixed.

max_T

end of the last interval.

id

vector of ids for each row in the design matrix.

init_weights

weights for the rows in data. Useful, e.g., with skewed sampling.

risk_obj

a pre-computed result from get_risk_obj. Will be used to skip some computations.

use_weights

TRUE if weights should be used. See Details.

is_for_discrete_model

TRUE if the model is a discrete hazard model, like the logistic model.

c_outcome, c_weights, c_end_t

alternative names to use for the added columns described in the Value section. Useful if you already have a column named Y, t or weights.
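The role of by can be illustrated with a small base-R sketch. It assumes right-closed bins (0, by], (by, 2 * by], ... ending at max_T; bin_index and bin_stops are hypothetical helpers written for this illustration, not functions exported by dynamichazard.

```r
# Hypothetical helper (not part of dynamichazard): index of the right-closed
# bin (0, by], (by, 2 * by], ... that contains a given time
bin_index <- function(time, by) {
  ceiling(time / by)
}

# Hypothetical helper: right end points of all bins up to max_T
bin_stops <- function(max_T, by) {
  seq(by, max_T, by = by)
}

bin_index(2.5, by = 1)  # 3, i.e. the bin (2, 3]
bin_index(2,   by = 1)  # 2, since the bins are right-closed
bin_stops(3,   by = 1)  # 1 2 3
```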


Details

This function is used to get the data.frame for, e.g., a glm fit that is comparable to a ddhazard fit in the sense that it is a static version. For example, say that we bin our time periods into (0,1], (1,2] and (2,3]. Next, consider an individual who dies at time 2.5. He should be a control in the first two bins and a case in the last bin. Thus, the rows in the final data frame for this individual are c(Y = 1, ..., weights = 1) and c(Y = 0, ..., weights = 2), where Y is the outcome, ... are the covariates and weights is the weight for the regression. Consider another individual who does not die and whom we observe for all three periods. He will yield one row with c(Y = 0, ..., weights = 3).
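The collapsing of bins into weighted rows for the two individuals above can be sketched in base R. This is a simplified illustration, not the package's implementation: weighted_rows is a hypothetical helper that assumes fixed covariates, entry at time 0, censoring only at bin boundaries and an event time no later than max_T.

```r
# Hypothetical helper (not part of dynamichazard): collapse the bins an
# individual is at risk in into at most two weighted rows. Assumes fixed
# covariates, entry at time 0 and censoring only at bin boundaries.
weighted_rows <- function(stop_time, event, by) {
  n_bins <- ceiling(stop_time / by)  # number of bins the individual is in
  if (event) {
    # a case in the bin of the event and a control in every earlier bin
    rows <- data.frame(Y = c(1, 0), weights = c(1, n_bins - 1))
  } else {
    # a control in every bin the individual is observed in
    rows <- data.frame(Y = 0, weights = n_bins)
  }
  # drop the control row if the event happens in the first bin
  rows[rows$weights > 0, , drop = FALSE]
}

weighted_rows(2.5, event = TRUE,  by = 1)  # dies at time 2.5
weighted_rows(3,   event = FALSE, by = 1)  # survives all three bins
```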

This function uses logic similar to that of ddhazard for individuals with time-varying covariates (see vignette("ddhazard", "dynamichazard") for details).

If use_weights = FALSE, then the two previously mentioned individuals will yield three rows each. The first individual will have c(Y = 0, t = 1, ..., weights = 1), c(Y = 0, t = 2, ..., weights = 1) and c(Y = 1, t = 3, ..., weights = 1), while the latter will have c(Y = 0, t = 1, ..., weights = 1), c(Y = 0, t = 2, ..., weights = 1) and c(Y = 0, t = 3, ..., weights = 1). This kind of data frame is useful if you want to fit, e.g., a model with the gam function in the mgcv package as described in Tutz and Schmid (2016).
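The one-row-per-bin layout for the same two individuals can be sketched in base R as well. expanded_rows is a hypothetical helper, not part of dynamichazard, and it makes the same simplifying assumptions as before: fixed covariates, entry at time 0 and censoring only at bin boundaries.

```r
# Hypothetical helper (not part of dynamichazard): one row per bin the
# individual is in, with t the stop time of the bin and unit weights.
# Assumes entry at time 0 and censoring only at bin boundaries.
expanded_rows <- function(stop_time, event, by) {
  n_bins <- ceiling(stop_time / by)
  t <- seq_len(n_bins) * by
  # the outcome is 1 only in the last bin and only if the individual dies
  Y <- as.numeric(event & seq_len(n_bins) == n_bins)
  data.frame(Y = Y, t = t, weights = 1)
}

expanded_rows(2.5, event = TRUE,  by = 1)  # Y = 0, 0, 1 at t = 1, 2, 3
expanded_rows(3,   event = FALSE, by = 1)  # Y = 0, 0, 0 at t = 1, 2, 3
```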


Value

Returns a list whose element X is a data.frame with the following added (the column names differ if you specified c_outcome, c_weights or c_end_t): a column Y with the binary outcome, a column weights with the weight of each row, and additional rows where applicable. A column t with the stop time of the bin is added if use_weights = FALSE. An element Y with the Surv object used is added to the list if is_for_discrete_model = FALSE.
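Why the weighted layout can stand in for the one-row-per-bin layout in a logistic regression can be checked with base R: a binomial glm with integer weights has the same likelihood as the data with each row repeated weights times. The data below are made up for illustration and only mimic the columns Y and weights described above; the t column is left out.

```r
# Toy data made up for illustration: weighted rows, mimicking
# the layout used when use_weights = TRUE
weighted <- data.frame(
  Y       = c(1,  0,  0,  1, 0, 1),
  x1      = c(1,  1, -1, -1, 0, 0),
  weights = c(1,  2,  3,  1, 2, 1))

# The same data with each row repeated `weights` times, mimicking
# one row per bin as when use_weights = FALSE
expanded <- weighted[rep(seq_len(nrow(weighted)), times = weighted$weights),
                     c("Y", "x1")]

# Logistic regressions on the two layouts
fit_w <- glm(Y ~ x1, family = binomial(), data = weighted, weights = weights)
fit_e <- glm(Y ~ x1, family = binomial(), data = expanded)

# The coefficients agree up to the convergence tolerance of glm
all.equal(coef(fit_w), coef(fit_e), tolerance = 1e-6)
```

The weighted version needs fewer rows, while the expanded version is what smoothers over time such as mgcv's gam expect.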


References

Tutz, Gerhard, and Matthias Schmid. "Nonparametric Modeling and Smooth Effects." Modeling Discrete Time-to-Event Data. Springer International Publishing, 2016. 105-127.

See Also

ddhazard, static_glm

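Examples

library(dynamichazard)
library(survival)

# small toy example with time-varying covariates
dat <- data.frame(
 id     = c(   1,    1, 2,     2),
 tstart = c(   0,    4, 0,     2),
 tstop  = c(   4,    6, 2,     6),
 event  = c(   0,    1, 0,     0),
 x1     = c(1.09, 1.29, 0, -1.16))

# weighted rows (use_weights = TRUE, the default)
get_survival_case_weights_and_data(
 Surv(tstart, tstop, event) ~ x1, dat, by = 1, id = dat$id)$X

# one row per individual and bin
get_survival_case_weights_and_data(
 Surv(tstart, tstop, event) ~ x1, dat, by = 1, id = dat$id,
 use_weights = FALSE)$X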
Documentation reproduced from package dynamichazard, version 0.6.5, License: GPL-2
