This function is called by sim.survdata and is not intended to be used by itself.
generate.lm(baseline, X = NULL, N = 1000, type = "none",
beta = NULL, xvars = 3, mu = 0, sd = 1, censor = 0.1)
baseline: The baseline hazard, cumulative hazard, survival, failure PDF, and failure CDF as output by baseline.build
X: A user-specified data frame containing the covariates that condition duration. If NULL, covariates are generated from normal distributions with means given by the mu argument and standard deviations given by the sd argument
N: Number of observations in each generated data frame
type: If "none" (the default), data are generated with no time-varying covariates or coefficients. If "tvc", data are generated with time-varying covariates, and if "tvbeta", data are generated with time-varying coefficients (see details below)
beta: A user-specified vector containing the coefficients for the linear part of the duration model. If NULL, coefficients are generated from normal distributions with means of 0 and standard deviations of 0.1
xvars: The number of covariates to generate. Ignored if X is not NULL
mu: If scalar, all covariates are generated to have means equal to this scalar. If a vector, it specifies the mean of each covariate separately, and it must be equal in length to xvars. Ignored if X is not NULL
sd: If scalar, all covariates are generated to have standard deviations equal to this scalar. If a vector, it specifies the standard deviation of each covariate separately, and it must be equal in length to xvars. Ignored if X is not NULL
censor: The proportion of observations to designate as being right-censored
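When X or beta is NULL, the argument descriptions above specify how the missing inputs are drawn. The base-R sketch below follows that reading; it is an illustration, not the coxed source, and the helper name sketch_covariates is hypothetical. It shows how a scalar mu or sd applies to every covariate while a vector gives each covariate its own mean or standard deviation.

```r
# Hypothetical helper (not coxed source): draws covariates and coefficients
# the way the argument descriptions specify when X and beta are NULL.
sketch_covariates <- function(N = 1000, xvars = 3, mu = 0, sd = 1, beta = NULL) {
  # A scalar mu or sd applies to every covariate; a vector must match xvars
  if (length(mu) == 1) mu <- rep(mu, xvars)
  if (length(sd) == 1) sd <- rep(sd, xvars)
  stopifnot(length(mu) == xvars, length(sd) == xvars)
  # One normal column per covariate, each with its own mean and sd
  X <- sapply(seq_len(xvars), function(k) rnorm(N, mean = mu[k], sd = sd[k]))
  X <- as.data.frame(X)
  # Default coefficients: one N(0, 0.1) draw per covariate
  if (is.null(beta)) beta <- rnorm(xvars, mean = 0, sd = 0.1)
  list(X = X, beta = beta)
}

set.seed(1)
draws <- sketch_covariates(N = 500, xvars = 2, mu = c(0, 5), sd = c(1, 2))
colMeans(draws$X)  # close to 0 and 5
```

A mu or sd vector whose length differs from xvars is rejected here by stopifnot, mirroring the requirement stated for those arguments.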
Returns a list with the following components:
data: The simulated data frame, including the simulated durations, the censoring variable, and covariates
beta: The coefficients, varying over time if type is "tvbeta"
XB: The linear predictor for each observation
exp.XB: The exponentiated linear predictor for each observation
survmat: An (N x T) matrix containing the individual survivor function at time t for the individual represented by row n
tvc: A logical value indicating whether or not the data includes time-varying covariates
If type="none", the function generates an idiosyncratic survivor function for each observation via proportional hazards: first, the linear predictor is calculated from the X variables and the beta coefficients; then the linear predictor is exponentiated and used as the exponent of the baseline survivor function. For each observation's survivor function, a duration is obtained by drawing a single random number from U[0,1] and finding the first time point at which the survivor function falls below this value. See Harden and Kropko (2018) for a more detailed description of this algorithm.
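The type="none" procedure can be sketched in base R as follows. This is an illustration of the steps described above, not the coxed source. Several pieces are assumptions of the sketch: the Weibull-shaped baseline S0 stands in for the output of baseline.build, durations whose survivor function never drops below the draw are set to T, and censoring selects round(censor * N) observations uniformly at random.

```r
# Sketch of the type = "none" draw (illustrative, not coxed source).
# Assumption: a hypothetical Weibull-shaped baseline survivor function on t = 1..T.
T_max <- 100
S0 <- exp(-((1:T_max) / 40)^1.5)

set.seed(42)
N <- 1000
X <- matrix(rnorm(N * 3), ncol = 3)    # three standard-normal covariates
beta <- rnorm(3, mean = 0, sd = 0.1)   # coefficients drawn as in the defaults

XB <- drop(X %*% beta)                 # linear predictor
exp.XB <- exp(XB)                      # exponentiated linear predictor
# Proportional hazards: survmat[n, t] = S0(t)^exp(XB[n]), an N x T matrix
survmat <- outer(exp.XB, S0, function(e, s) s^e)

# One U[0,1] draw per observation; the duration is the first t at which the
# individual survivor function falls below the draw.
# Assumption: if it never does, the duration is set to T_max.
u <- runif(N)
y <- apply(survmat < u, 1, function(below) {
  if (any(below)) which(below)[1] else T_max
})

# Assumption: censoring marks round(censor * N) observations chosen at random
censor <- 0.1
failed <- rep(TRUE, N)
failed[sample(N, round(censor * N))] <- FALSE
sim <- data.frame(y = y, failed = failed, X)
```

Because S0 lies between 0 and 1, raising it to a larger exp(XB) makes the survivor function fall faster, so observations with a larger linear predictor tend to receive shorter durations.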
If type="tvc", this function cannot accept user-supplied covariate data: a time-varying covariate is expressed over time frames that themselves convey part of the variation in the durations, and these durations are what is being generated. If user-supplied X data are provided, the function issues a warning and generates random data instead, as if X=NULL. Durations are again drawn using proportional hazards and are passed to the permalgorithm function in the PermAlgo package, which generates the time-varying data structure (Sylvestre and Abrahamowicz 2008).
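The idea behind the permutational algorithm can be sketched in base R: durations are matched to covariate histories in order from shortest to longest, and each failure is assigned to a remaining history with probability proportional to exp(X_i(t) beta) evaluated at that failure time. The sketch below is written from that published description for illustration only; it is not PermAlgo::permalgorithm, whose interface and output differ, and the function name perm_assign is hypothetical. Censored durations are matched uniformly at random among the remaining histories, and the result is expanded into counting-process (start, stop] rows.

```r
# Hypothetical helper sketching the permutational algorithm (Sylvestre and
# Abrahamowicz 2008); NOT PermAlgo::permalgorithm.
perm_assign <- function(times, event, Xarr, beta) {
  # times: integer durations in 1..T; event: TRUE for failure, FALSE if censored
  # Xarr: N x T x p array of covariate histories; beta: length-p coefficients
  N <- length(times)
  ord <- order(times)           # match durations from shortest to longest
  available <- rep(TRUE, N)     # histories not yet matched to a duration
  subject_for <- integer(N)     # subject_for[j]: history matched to ord[j]
  for (j in seq_len(N)) {
    t_j <- times[ord[j]]
    cand <- which(available)
    if (length(cand) == 1) {
      pick <- cand              # avoid sample()'s single-value behaviour
    } else if (event[ord[j]]) {
      # Failure: probability proportional to exp(X_i(t_j) beta)
      Xt <- matrix(Xarr[cand, t_j, ], nrow = length(cand))
      pick <- sample(cand, 1, prob = exp(drop(Xt %*% beta)))
    } else {
      # Censored duration: uniform over the remaining histories
      pick <- sample(cand, 1)
    }
    available[pick] <- FALSE
    subject_for[j] <- pick
  }
  # Expand each matched pair into counting-process (start, stop] rows
  rows <- lapply(seq_len(N), function(j) {
    i <- subject_for[j]
    tt <- times[ord[j]]
    data.frame(id = i, start = seq_len(tt) - 1, stop = seq_len(tt),
               event = c(rep(FALSE, tt - 1), event[ord[j]]),
               matrix(Xarr[i, seq_len(tt), ], nrow = tt))
  })
  do.call(rbind, rows)
}

set.seed(7)
N <- 50; T_max <- 20; p <- 2
Xarr <- array(rnorm(N * T_max * p), dim = c(N, T_max, p))
times <- sample(1:T_max, N, replace = TRUE)
event <- runif(N) > 0.1
tv <- perm_assign(times, event, Xarr, beta = c(0.5, -0.5))
head(tv)
```

Each covariate history is used exactly once, and each row carries that history's covariate values at the row's stop time, which is the long format a time-varying Cox model expects.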
If type="tvbeta", the first coefficient, whether user-supplied or randomly generated, is interacted with the natural log of the time counter, which runs from 1 to T (the maximum time point of the baseline functions). Durations are generated via proportional hazards, and the coefficients are saved as a matrix to show their dependence on time.
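Under the reading that the first coefficient at time t becomes beta[1] * log(t) while the others stay constant, the coefficient matrix and the resulting time-varying linear predictor can be sketched as below. The coefficient values and covariates are hypothetical, and the sketch does not reproduce how coxed then converts this predictor into durations.

```r
# Sketch of the type = "tvbeta" coefficient matrix (illustrative only).
T_max <- 100
beta <- c(0.2, -0.1, 0.05)   # hypothetical coefficients
time <- 1:T_max
# Row t holds the coefficients at time t; only the first varies, as beta1 * log(t)
beta_mat <- matrix(beta, nrow = T_max, ncol = length(beta), byrow = TRUE)
beta_mat[, 1] <- beta[1] * log(time)

set.seed(3)
X <- matrix(rnorm(5 * 3), ncol = 3)   # five hypothetical observations
# Time-varying linear predictor: XB_tv[n, t] = X[n, ] %*% beta_mat[t, ]
XB_tv <- X %*% t(beta_mat)            # N x T matrix
```

Because log(1) = 0, under this reading the first covariate has no effect at t = 1, and its effect grows in magnitude as t increases.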
Harden, J. J. and Kropko, J. (2018). Simulating Duration Data for the Cox Model. Political Science Research and Methods. https://doi.org/10.1017/psrm.2018.19
Sylvestre, M.-P. and Abrahamowicz, M. (2008). Comparison of algorithms to generate event times conditional on time-dependent covariates. Statistics in Medicine 27(14): 2618–2634.
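library(coxed)

# Baseline functions over T = 100 time points
baseline <- baseline.build(T = 100, knots = 8, spline = TRUE)

# Time-invariant covariates and coefficients
simdata <- generate.lm(baseline, N = 1000, xvars = 5, mu = 0, sd = 1, type = "none", censor = 0.1)
summary(simdata$data)

# Time-varying covariates (structure built by PermAlgo's permalgorithm)
simdata <- generate.lm(baseline, N = 1000, xvars = 5, mu = 0, sd = 1, type = "tvc", censor = 0.1)
summary(simdata$data)

# Time-varying coefficients: the first coefficient is interacted with log(t)
simdata <- generate.lm(baseline, N = 1000, xvars = 5, mu = 0, sd = 1, type = "tvbeta", censor = 0.1)
simdata$beta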