rocTree (version 1.1.1)

simu: Function to generate simulated data used in the manuscript.

Description

This function generates simulated data under various settings. Let \(Z\) be a \(p\)-dimensional vector of possibly time-dependent covariates and \(\beta\) be the vector of regression coefficients. The survival times (\(T\)) are generated from the hazard functions specified as follows:

Scenario 1.1

Proportional hazards model: $$\lambda(t|Z) = \lambda_0(t) e^{-0.5 Z_1 + 0.5 Z_2 - 0.5 Z_3 + \cdots + 0.5 Z_{10}},$$

where \(\lambda_0(t) = 2t\).
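
Because \(\lambda_0(t) = 2t\) gives the cumulative baseline hazard \(\Lambda_0(t) = t^2\), survival times under this model can be drawn by inverse-transform sampling: if \(E\) is a standard exponential variable, then \(T = \sqrt{E e^{-Z^\top\beta}}\). The following is a minimal sketch and not necessarily how `simu` is implemented. The helper name `sim_ph` and the covariate distribution (independent Uniform(0, 1)) are assumptions made for illustration; this page does not state how \(Z\) is generated.

```r
# Hypothetical helper: draw survival times from lambda(t|Z) = 2t * exp(Z %*% beta).
# Assumption: covariates are independent Uniform(0, 1); the source does not say so.
sim_ph <- function(n, beta) {
  p <- length(beta)
  Z <- matrix(runif(n * p), nrow = n, ncol = p)
  eta <- drop(Z %*% beta)        # linear predictor
  E <- rexp(n)                   # Lambda(T|Z) follows a standard exponential
  # Lambda(t|Z) = t^2 * exp(eta), so solving Lambda(T|Z) = E gives:
  Time <- sqrt(E * exp(-eta))
  data.frame(Time = Time, Z)
}

set.seed(1)
beta <- rep(c(-0.5, 0.5), 5)     # alternating signs, as in Scenario 1.1
dat_ph <- sim_ph(200, beta)
head(dat_ph)
```

The same sketch covers Scenario 1.2 by setting `beta <- c(2, 2, rep(0, 8))`.
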
Scenario 1.2

Proportional hazards model with noise variables: $$\lambda(t|Z) = \lambda_0(t) e^{2Z_1 + 2Z_2 + 0Z_3 + \cdots + 0Z_{10}},$$

where \(\lambda_0(t) = 2t\).

Scenario 1.3

Proportional hazards model with nonlinear covariate effects: $$\lambda(t|Z) = \lambda_0(t) e^{[2\sin(2\pi Z_1) + 2|Z_2 - 0.5|]},$$

where \(\lambda_0(t) = 2t\).

Scenario 1.4

Accelerated failure time model: $$\log(T) = -2 + 2Z_1 + 2Z_2 + \epsilon,$$ where \(\epsilon\) follows \(N(0, 0.5^2).\)
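
The accelerated failure time model translates directly into code: draw the error, form \(\log(T)\), and exponentiate. This is only an illustrative sketch; the Uniform(0, 1) covariates are an assumption, since this page does not state the covariate distribution.

```r
# Sketch of the Scenario 1.4 data-generating model.
# Assumption: Z1 and Z2 are independent Uniform(0, 1); the source does not say so.
set.seed(1)
n <- 200
z1 <- runif(n)
z2 <- runif(n)
eps <- rnorm(n, mean = 0, sd = 0.5)      # epsilon ~ N(0, 0.5^2)
Time_aft <- exp(-2 + 2 * z1 + 2 * z2 + eps)
summary(Time_aft)
```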

Scenario 1.5

Generalized gamma family: $$T = e^{\sigma\omega},$$ where \(\omega = \log(Q^2 g) / Q\), \(g\) follows Gamma(\(Q^{-2}, 1\)), \(\sigma = 2Z_1, Q = 2Z_2.\)
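
The generalized gamma construction can be followed step by step with `rgamma`, which accepts a vector of shape parameters. The sketch below is illustrative only; the Uniform(0, 1) covariates are an assumption, not something stated on this page.

```r
# Sketch of the Scenario 1.5 data-generating model.
# Assumption: Z1 and Z2 are independent Uniform(0, 1); the source does not say so.
set.seed(1)
n <- 200
z1 <- runif(n)
z2 <- runif(n)
sigma <- 2 * z1
Q <- 2 * z2
g <- rgamma(n, shape = Q^(-2), rate = 1)  # g ~ Gamma(Q^-2, 1), one shape per subject
omega <- log(Q^2 * g) / Q
Time_gg <- exp(sigma * omega)
summary(Time_gg)
```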

Scenario 2.1

Dichotomous time-dependent covariate with at most one change in value: $$\lambda(t|Z(t)) = \lambda_0(t)e^{2Z_1(t) + 2Z_2},$$ where \(Z_1(t)\) is the time-dependent covariate: \(Z_1(t) = \theta I(t \ge U_0) + (1 - \theta) I(t < U_0)\), \(\theta\) is a Bernoulli variable with equal probability, and \(U_0\) follows a uniform distribution over \([0, 1]\).
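
The covariate path \(Z_1(t)\) switches once, at \(U_0\): it moves from 0 to 1 when \(\theta = 1\) and from 1 to 0 when \(\theta = 0\). A minimal sketch of evaluating this path follows; the helper name `z1_single` is hypothetical and not part of the package.

```r
# Hypothetical helper: Z1(t) = theta * I(t >= U0) + (1 - theta) * I(t < U0)
z1_single <- function(t, theta, U0) {
  theta * (t >= U0) + (1 - theta) * (t < U0)
}

set.seed(1)
theta <- rbinom(1, size = 1, prob = 0.5)  # Bernoulli with equal probability
U0 <- runif(1)                            # change point, uniform over [0, 1]
z1_single(c(0, U0 / 2, U0, 1), theta, U0)
```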

Scenario 2.2

Dichotomous time-dependent covariate with multiple changes: $$\lambda(t|Z(t)) = e^{2Z_1(t) + 2Z_2},$$ where \(Z_1(t) = \theta[I(U_1\le t < U_2) + I(U_3 \le t)] + (1 - \theta)[I(t < U_1) + I(U_2\le t < U_3)]\), \(\theta\) is a Bernoulli variable with equal probability, and \(U_1\le U_2\le U_3\) are the first three arrival times of a stationary Poisson process with rate 10.
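
The first three arrival times of a rate-10 Poisson process are cumulative sums of independent exponential gaps with rate 10, and \(Z_1(t)\) alternates between \(1 - \theta\) and \(\theta\) at those times. A minimal sketch, in which the helper name `z1_multi` is hypothetical:

```r
# Hypothetical helper: evaluate the Scenario 2.2 covariate path at times t.
# U = c(U1, U2, U3) holds the ordered change points.
z1_multi <- function(t, theta, U) {
  on <- (U[1] <= t & t < U[2]) | (U[3] <= t)  # intervals where Z1(t) = theta
  ifelse(on, theta, 1 - theta)                # elsewhere Z1(t) = 1 - theta
}

set.seed(1)
theta <- rbinom(1, size = 1, prob = 0.5)      # Bernoulli with equal probability
U <- cumsum(rexp(3, rate = 10))               # first three Poisson arrival times
z1_multi(seq(0, 1, by = 0.1), theta, U)
```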

Scenario 2.3

Proportional hazards model with a continuous time-dependent covariate: $$\lambda(t|Z(t)) = 0.1 e^{Z_1(t) + Z_2},$$ where \(Z_1(t) = kt + b\), \(k\) and \(b\) are independent uniform random variables over \([1, 2]\).
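
With \(Z_1(t) = kt + b\), the cumulative hazard has the closed form \(\Lambda(t) = 0.1 e^{b + Z_2}(e^{kt} - 1)/k\), so solving \(\Lambda(T) = E\) for a standard exponential \(E\) gives \(T = \log\{1 + kE / (0.1 e^{b + Z_2})\}/k\). The sketch below applies this inversion; it is an illustration rather than the package's code, and the Uniform(0, 1) distribution for \(Z_2\) is an assumption.

```r
# Sketch: inverse-transform sampling for Scenario 2.3.
# Assumption: Z2 is Uniform(0, 1); the source does not state its distribution.
set.seed(1)
n <- 200
k <- runif(n, 1, 2)
b <- runif(n, 1, 2)
z2 <- runif(n)
E <- rexp(n)                                   # Lambda(T) follows a standard exponential
Time_23 <- log1p(k * E / (0.1 * exp(b + z2))) / k

# Check: plugging T back into the cumulative hazard recovers E
Lambda_23 <- 0.1 * exp(b + z2) * (exp(k * Time_23) - 1) / k
all.equal(Lambda_23, E)
```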

Scenario 2.4

Non-proportional hazards model with a continuous time-dependent covariate: $$\lambda(t|Z(t)) = 0.1 \cdot[1 + \sin\{Z_1(t) + Z_2\}],$$ where \(Z_1(t) = kt + b\), \(k\) and \(b\) follow independent uniform distributions over \([1, 2]\).
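
Here the cumulative hazard is \(\Lambda(t) = 0.1\,[t + \{\cos(b + Z_2) - \cos(kt + b + Z_2)\}/k]\), which has no closed-form inverse, so a survival time can be obtained by solving \(\Lambda(T) = E\) numerically. The sketch below uses `uniroot` for this; the helper names `cumhaz_24` and `draw_time_24` are hypothetical, the Uniform(0, 1) distribution for \(Z_2\) is an assumption, and the package may generate these times differently.

```r
# Hypothetical helper: cumulative hazard for Scenario 2.4
cumhaz_24 <- function(t, k, b, z2) {
  0.1 * (t + (cos(b + z2) - cos(k * t + b + z2)) / k)
}

# Hypothetical helper: solve Lambda(T) = E for one subject
draw_time_24 <- function(k, b, z2) {
  E <- rexp(1)
  # Lambda(t) >= 0.1 * (t - 2 / k), so the root lies below this bound
  upper <- 10 * E + 2 / k + 1
  uniroot(function(t) cumhaz_24(t, k, b, z2) - E,
          lower = 0, upper = upper, tol = 1e-8)$root
}

set.seed(1)
k <- runif(1, 1, 2)
b <- runif(1, 1, 2)
z2 <- runif(1)          # assumption: Z2 is Uniform(0, 1)
Time_24 <- draw_time_24(k, b, z2)
Time_24
```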

Scenario 2.5

Non-proportional hazards model with a nonlinear time-dependent covariate: $$\lambda(t|Z(t)) = 0.1 \cdot[1 + \sin\{Z_1(t) + Z_2\}],$$ where \(Z_1(t) = 2kt\cdot \{I(t > 5) - 1\} + b\), \(k\) and \(b\) follow independent uniform distributions over \([1, 2]\).

The censoring times are generated from an independent uniform distribution over \([0, c]\), where \(c\) was tuned to yield censoring percentages of 25% and 50%.
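
One way such a \(c\) could be found is by Monte Carlo: simulate survival times, then search for the upper limit at which the proportion of censored subjects matches the target. The sketch below illustrates that idea only; the helper name `tune_c` is hypothetical, the survival times come from the baseline hazard \(\lambda_0(t) = 2t\) without covariates, and this is not necessarily how the values used in the package were obtained.

```r
# Hypothetical helper: find cmax so that C ~ Uniform(0, cmax) censors
# roughly a proportion `target` of the survival times in `Time`.
tune_c <- function(Time, target, upper = 100 * max(Time)) {
  u <- runif(length(Time))  # fixed draws keep the censoring rate monotone in cmax
  cen_rate <- function(cmax) mean(cmax * u < Time) - target
  uniroot(cen_rate, lower = 1e-8, upper = upper)$root
}

set.seed(1)
n <- 5000
Time <- sqrt(rexp(n))       # survival times under lambda_0(t) = 2t, no covariates
c25 <- tune_c(Time, 0.25)
c50 <- tune_c(Time, 0.50)

# Censoring proportions on fresh censoring times
rate25 <- mean(runif(n, 0, c25) < Time)
rate50 <- mean(runif(n, 0, c50) < Time)
c(c25 = c25, rate25 = rate25, c50 = c50, rate50 = rate50)
```

A larger censoring percentage corresponds to a smaller upper limit, so `c50` should come out below `c25`.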

Usage

simu(n, cen, scenario, summary = FALSE)

trueHaz(dat)

trueSurv(dat)

Arguments

n

an integer value indicating the number of subjects.

cen

is a numeric value indicating the censoring percentage; three levels are allowed: 0%, 25%, and 50%. The examples below pass these as proportions (e.g., 0.25 and 0.50).

scenario

can be either a numeric value or a character string. This indicates the simulation scenario noted above.

summary

a logical value indicating whether a brief data summary will be printed.

dat

is a data.frame prepared by simu.

Value

simu returns a data.frame. The returned data.frame consists of columns:

id

is the subject id.

Y

is the observed follow-up time.

death

is the death indicator; death = 0 if censored.

z1--z10

are the covariates, which may be time-independent.

k, b, U

are the latent variables used to generate \(Z_1(t)\) in Scenarios 2.1--2.5.

The returned data.frame can be supplied to trueHaz and trueSurv to generate the true cumulative hazard function and the true survival function, respectively.

Examples

# NOT RUN {
set.seed(1)
simu(10, 0.25, 1.2, TRUE)

set.seed(1)
simu(10, 0.50, 2.2, TRUE)

# }
