getData: Data Generation

Description

Generates continuous or binary outcomes given patients' covariates, the underlying model and the randomization procedure.

Usage

getData(n, cov_num, level_num, pr, type, beta, 
          mu1, mu2, sigma = 1, method = HuHuCAR, ...)

Arguments

n

the number of patients.

cov_num

the number of covariates.

level_num

the vector of level numbers for each covariate. Hence the length of level_num should be equal to the number of covariates.

pr

the vector of probabilities. Under the assumption of independence between covariates, pr contains the probabilities for each level of each covariate. The length of pr should equal the total number of levels across all covariates, and, because the probabilities for each covariate sum to 1, the sum of pr should equal cov_num.

type

the type of model used to generate the data. Accepted values: "linear" or "logit".

beta

the vector of coefficients of covariates. The length of beta must correspond to cov_num.

mu1, mu2

main effects of treatment 1 and treatment 2.

sigma

the error variance for the linear model. The default is 1. It is only used when type is "linear".

method

the randomization method to be used in allocating patients. The default randomization method HuHuCAR uses Hu and Hu's general covariate-adaptive randomization; the alternatives are PocSimMIN, StrBCD, StrPBR, DoptBCD, and AdjBCD.

...

arguments to be passed to method. These depend on the method used, and the following arguments are accepted:

omega: the vector of weights at the overall, within-stratum, and marginal levels. It is required that at least one element is larger than 0. Note that omega is only needed when HuHuCAR is to be used.
weight: the vector of weights for marginal imbalances. It is required that at least one element is larger than 0. Note that weight is only needed when PocSimMIN is to be used.
p: the probability of assigning one patient to treatment 1. p is required to be larger than 1/2 to obtain balance. Note that p is only needed when "HuHuCAR", "PocSimMIN" and "StrBCD" are to be used.
a: a design parameter. As a goes to $\infty$, the design becomes more deterministic.
bsize: the block size for stratified randomization. It is required to be a multiple of 2. Notice that bsize is only needed when "StrPBR" is to be used.
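The constraints stated in the argument descriptions above can be checked before calling getData. The following is a minimal sketch in base R. The helpers `check_inputs` and `check_method_args` are hypothetical names written for this illustration and are not part of the package.

```r
# Hypothetical helper (not part of the package): checks the documented
# constraints on level_num, pr, and beta.
check_inputs <- function(cov_num, level_num, pr, beta) {
  stopifnot(length(level_num) == cov_num)         # one level count per covariate
  stopifnot(length(pr) == sum(level_num))         # one probability per level
  stopifnot(isTRUE(all.equal(sum(pr), cov_num)))  # probabilities sum to cov_num
  stopifnot(length(beta) == cov_num)              # one coefficient per covariate
  invisible(TRUE)
}

# Hypothetical helper (not part of the package): checks the documented
# constraints on the method-specific arguments that are supplied.
check_method_args <- function(omega = NULL, weight = NULL,
                              p = NULL, bsize = NULL) {
  if (!is.null(omega))  stopifnot(any(omega > 0))   # at least one positive weight
  if (!is.null(weight)) stopifnot(any(weight > 0))  # at least one positive weight
  if (!is.null(p))      stopifnot(p > 0.5)          # larger than 1/2
  if (!is.null(bsize))  stopifnot(bsize %% 2 == 0)  # multiple of 2
  invisible(TRUE)
}

# Values taken from the Examples section of this page.
check_inputs(cov_num = 5, level_num = c(2, 2, 2, 2, 2),
             pr = rep(0.5, 10), beta = c(1, 4, 3, 2, 5))
check_method_args(omega = c(0.1, 0.1, rep(0.8 / 5, times = 5)), p = 0.85)
```

Both calls return silently when the inputs satisfy the constraints and stop with an error otherwise.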

Value

getData returns a data frame of size $(cov\_num+2) \times n$. The first cov_num rows hold the patients' covariate profiles, the next row holds the patients' assignments, and the final row holds the generated outcomes.
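Given the row layout described above, the covariates, assignments, and outcomes can be separated by row index. The sketch below uses a small mock data frame built by hand. It is not output from getData, the values are arbitrary, and coding the two treatments as 1 and 2 is an assumption made for this illustration.

```r
cov_num <- 2
n <- 4

# Mock object with the documented shape: (cov_num + 2) rows, n columns.
mock <- data.frame(matrix(
  c(1,   2,   1,    2,     # covariate 1 (level of each patient)
    1,   1,   2,    2,     # covariate 2 (level of each patient)
    1,   2,   2,    1,     # assignments (assumed coding: 1 or 2)
    0.3, 1.2, -0.5, 2.1),  # outcomes
  nrow = cov_num + 2, byrow = TRUE))

covariates  <- mock[1:cov_num, ]            # first cov_num rows
assignments <- unlist(mock[cov_num + 1, ])  # next row
outcomes    <- unlist(mock[cov_num + 2, ])  # final row

# Mean outcome within each treatment group.
group_means <- tapply(outcomes, assignments, mean)
```

Because patients are stored in columns, transposing with `t()` may be convenient when a one-row-per-patient layout is needed.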

Details

To generate continuous outcomes, we use the linear model: $$y_i = \mu_j+x_i^T\beta+\epsilon_i,$$

to generate binary outcomes, we use the logit link function: $$P(y_i=1) = \frac{\exp\{\mu_j+x_i^T\beta\}}{1+\exp\{\mu_j+x_i^T\beta\}},$$

where $j$ indicates that patient $i$ is assigned to treatment $j$.
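As a rough illustration of the two models, the sketch below simulates outcomes in base R for a given set of assignments. It is not the package's implementation. The helper name `simulate_outcome`, the covariate coding (levels used directly as numeric values), the treatment coding (1 or 2), and the normal error with standard deviation `sqrt(sigma)` are all assumptions made for this example.

```r
# Hypothetical helper (not part of the package).
# x:      n x cov_num numeric matrix, one row per patient (assumed coding)
# assign: vector of length n with values 1 or 2 (assumed coding)
simulate_outcome <- function(x, assign, beta, mu1, mu2, sigma = 1,
                             type = c("linear", "logit")) {
  type <- match.arg(type)
  mu  <- ifelse(assign == 1, mu1, mu2)  # mu_j for each patient's treatment
  eta <- mu + as.vector(x %*% beta)     # mu_j + x_i^T beta
  if (type == "linear") {
    # Continuous outcome; sigma is treated as the error variance.
    eta + rnorm(length(eta), mean = 0, sd = sqrt(sigma))
  } else {
    # Binary outcome; plogis(eta) equals exp(eta) / (1 + exp(eta)).
    rbinom(length(eta), size = 1, prob = plogis(eta))
  }
}

set.seed(1)
n <- 6
x <- matrix(sample(1:2, n * 2, replace = TRUE), nrow = n)  # 2 covariates
assign <- sample(1:2, n, replace = TRUE)

y_lin <- simulate_outcome(x, assign, beta = c(1, 2),
                          mu1 = 0, mu2 = 0.5, type = "linear")
y_bin <- simulate_outcome(x, assign, beta = c(1, 2),
                          mu1 = 0, mu2 = 0.5, type = "logit")
```

Note that this sketch stores patients in rows, whereas the data frame returned by getData stores patients in columns.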

Examples


# NOT RUN {
# Parameter settings
library(carat)  # assumed: getData and HuHuCAR are provided by the carat package
set.seed(100)
n = 1000
cov_num = 5
level_num = c(2,2,2,2,2)
beta = c(1,4,3,2,5)
mu1 = 0
mu2 = 0
sigma = 1
type = "linear"
method = HuHuCAR
p = 0.85
omega = c(0.1, 0.1, rep(0.8 / 5, times = 5))
pr = rep(0.5,10)

#Data Generation
dataH = getData(n, cov_num,level_num, pr, type, beta,
                mu1, mu2, sigma, HuHuCAR, omega, p)
dataH[1:(cov_num+2),1:5]
# }
