get_survival_case_weigths_and_data
Static GLM fit for survival models
Function used to get the design matrix and weights for a static fit of survival models where observations are binned into intervals.
Usage
get_survival_case_weigths_and_data(formula, data, by, max_T, id, init_weights, risk_obj, use_weights = T, is_for_discrete_model = T, c_outcome = "Y", c_weights = "weights", c_end_t = "t")
Arguments
 formula
 coxph-like formula with Surv(tstart, tstop, event) on the left-hand side of ~
 data
 Data frame or environment containing the outcome and covariates
 by
 Length of each interval that cases are binned into
 max_T
 The end time of the last bin
 id
 The id for each row in data. This is important when variables are time-varying
 init_weights
 Weights for the rows in data. Useful with skewed sampling and will be used when computing the final weights
 risk_obj
 A precomputed result from get_risk_obj. Will be used to skip some computations
 use_weights
 TRUE if weights should be used. See details
 is_for_discrete_model
 TRUE if the model is for a discrete hazard model like the logistic model. Affects how deaths are included when individuals have time-varying coefficients
 c_outcome, c_weights, c_end_t
 Alternative names to use for the added columns described in the return section. Useful if you already have a column named Y, t or weights
Details
This function is used to get the data frame for e.g. a glm fit that is comparable to a ddhazard fit in the sense that it is a static version. For example, say that we bin our time periods into (0,1], (1,2] and (2,3]. Next, consider an individual who dies at time 2.5. He should be a control in the first two bins and a case in the last bin. Thus the rows in the final data frame for this individual are c(Y = 1, ..., weights = 1) and c(Y = 0, ..., weights = 2), where Y is the outcome, ... is the covariates and weights is the weights for the regression. Consider another individual who does not die and whom we observe for all three periods. He will yield one row with c(Y = 0, ..., weights = 3).
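The weighted rows for the two individuals above can be reproduced with a minimal base R sketch. The helper bin_weighted is hypothetical and is not part of the package; it assumes fixed covariates, bins that start at 0, and an event that counts only if it happens no later than max_T.

```r
# Hypothetical helper (not part of any package): collapse one individual with
# fixed covariates into weighted rows for the bins (0, by], (by, 2 * by], ...
bin_weighted <- function(stop_time, event, by = 1, max_T = 3) {
  # Number of bins the individual is observed in, capped at max_T
  n_bins <- ceiling(min(stop_time, max_T) / by)
  # The individual is a case only if the event happens no later than max_T
  is_case <- event && stop_time <= max_T
  # All remaining bins count as control bins
  n_control <- n_bins - is_case
  out <- data.frame(Y = c(1, 0), weights = c(as.numeric(is_case), n_control))
  # Drop rows that carry no weight
  out[out$weights > 0, , drop = FALSE]
}

first <- bin_weighted(stop_time = 2.5, event = TRUE)   # dies at time 2.5
second <- bin_weighted(stop_time = 3, event = FALSE)   # observed for all bins
print(first)   # one case row with weight 1 and one control row with weight 2
print(second)  # a single control row with weight 3
```

The covariate columns are left out to keep the sketch short; with fixed covariates they would simply be repeated on each row.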
This function uses similar logic to ddhazard for individuals with time-varying covariates (see the vignette "ddhazard" for details).
If use_weights = FALSE then the two individuals will yield three rows each. The first individual will have c(Y = 0, t = 1, ..., weights = 1), c(Y = 0, t = 2, ..., weights = 1) and c(Y = 1, t = 3, ..., weights = 1), while the latter will have the three rows c(Y = 0, t = 1, ..., weights = 1), c(Y = 0, t = 2, ..., weights = 1) and c(Y = 0, t = 3, ..., weights = 1). This kind of data frame is useful if you want to make a fit with e.g. the gam function in the mgcv package as described in Tutz and Schmid (2016) (see references).
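The unweighted layout can be sketched in base R in the same spirit. The helper bin_unweighted is hypothetical and is not part of the package; it assumes fixed covariates and bins that start at 0, and it omits the covariate columns.

```r
# Hypothetical helper (not part of any package): one row per bin the
# individual is at risk in, each with weight 1
bin_unweighted <- function(stop_time, event, by = 1, max_T = 3) {
  # Number of bins the individual is observed in, capped at max_T
  n_bins <- ceiling(min(stop_time, max_T) / by)
  # Stop time of each bin
  t_end <- seq_len(n_bins) * by
  Y <- rep(0, n_bins)
  # Flag the last bin as a case if the event happens no later than max_T
  if (event && stop_time <= max_T) Y[n_bins] <- 1
  data.frame(Y = Y, t = t_end, weights = 1)
}

first <- bin_unweighted(stop_time = 2.5, event = TRUE)   # Y = 0, 0, 1
second <- bin_unweighted(stop_time = 3, event = FALSE)   # Y = 0, 0, 0
long <- rbind(first, second)
print(long)
```

Because every row keeps its own t, a frame shaped like long can carry a term in t in the model formula, which the collapsed weighted rows cannot.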
Value

Returns a data frame with the design matrix from the formula, with the following added (column names will differ if you specified them): a column Y for the binary outcome, a column weights for the weight of each row, and additional rows if applicable. A column t is added for the stop time of the bin if use_weights = FALSE.
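As an illustration of how a frame of this shape might be used, the sketch below fits a static logistic model with glm from the stats package. The data are made-up toy numbers with a single hypothetical covariate x, not output from this function. It also checks that, for a model without a term in t, the weighted rows and the expanded unit-weight rows give the same coefficients.

```r
# Toy frame shaped like the return value (made-up numbers, one covariate x)
weighted <- data.frame(
  Y       = c(1, 0, 0, 1, 0, 1, 0, 0),
  x       = c(0, 0, 0, 1, 1, 1, 1, 0),
  weights = c(1, 2, 3, 1, 1, 1, 2, 3)
)

# Static logistic fit using the case weights
fit_w <- glm(Y ~ x, family = binomial, data = weighted, weights = weights)

# The same information with one row per bin and unit weights
long <- weighted[rep(seq_len(nrow(weighted)), weighted$weights), ]
fit_long <- glm(Y ~ x, family = binomial, data = long)

# Both layouts lead to the same coefficient estimates
same_fit <- isTRUE(all.equal(coef(fit_w), coef(fit_long), tolerance = 1e-6))
print(coef(fit_w))
print(same_fit)
```

The weighted layout is the more compact of the two; the expanded layout is only needed when the model should depend on the bin.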
References
Tutz, Gerhard, and Matthias Schmid. Nonparametric Modeling and Smooth Effects. Modeling Discrete Time-to-Event Data. Springer International Publishing, 2016. 105-127.