cvrcov performs covariate-by-covariate constrained randomization for cluster randomized trials (CRTs) and is especially suited to CRTs with a small number of clusters. In constrained randomization, a randomization scheme is randomly sampled from a subset of all possible randomization schemes based on the constraints on each covariate.
The cvrcov function enumerates all randomization schemes or simulates a fixed number of unique randomization schemes, as specified by the user. A subset of the randomization schemes is chosen based on user-specified covariate-by-covariate constraints. cvrcov treats this subset as the constrained space of randomization schemes and samples one scheme from the constrained space as the final chosen scheme.
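The enumerate, constrain, and sample steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the covariate values, the 20% cutoff, and all object names are hypothetical.

```r
# Toy data: 6 clusters with one cluster-level covariate (hypothetical values).
x <- c(10, 12, 15, 20, 22, 30)
n_total <- length(x)
n_trt <- 3

# Enumerate every way to choose the treated clusters: choose(6, 3) = 20 schemes.
trt_sets <- combn(n_total, n_trt)

# Represent each scheme as a 0/1 row (1 = treatment arm, 0 = control arm).
schemes <- t(apply(trt_sets, 2, function(idx) {
  arm <- integer(n_total)
  arm[idx] <- 1L
  arm
}))

# Balance metric: absolute difference in arm means for each scheme.
abs_mean_diff <- apply(schemes, 1, function(arm) {
  abs(mean(x[arm == 1]) - mean(x[arm == 0]))
})

# Constraint (assumed for illustration): the difference must be at most
# 20% of the overall mean.
cutoff <- 0.2 * mean(x)
constrained_space <- schemes[abs_mean_diff <= cutoff, , drop = FALSE]

# Sample one scheme from the constrained space as the final allocation.
set.seed(12345)
chosen <- constrained_space[sample(nrow(constrained_space), 1), ]
```

With several covariates, the same filter would be applied once per covariate and only schemes passing every constraint would remain in the constrained space.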
cvrcov(
  clustername = NULL,
  x,
  categorical = NULL,
  constraints,
  ntotal_cluster,
  ntrt_cluster,
  size = 50000,
  seed = NULL,
  nosim = FALSE,
  savedata = NULL,
  check_validity = FALSE,
  samearmhi = 0.75,
  samearmlo = 0.25
)
allocation
the allocation scheme from constrained randomization
assignment_message
a statement of how many clusters are to be randomized to the intervention and the control arms, respectively
scheme_message
a statement of how the whole randomization space used in constrained randomization was obtained
data_CR
the data frame containing the allocation scheme, the clustername, and the original data frame of covariates
baseline_table
the descriptive statistics for all the variables by the two arms from the selected scheme
cluster_coincidence
cluster coincidence matrix
cluster_coin_des
descriptive statistics of the cluster coincidence matrix
clusters_always_pair
pairs of clusters always allocated to the same arm.
clusters_always_not_pair
pairs of clusters always allocated to different arms.
clusters_high_pair
pairs of clusters randomized to the same arm at least samearmhi of the time.
clusters_low_pair
pairs of clusters randomized to the same arm at most samearmlo of the time.
overall_allocations
frequency of acceptable overall allocations.
overall_summary
summary of covariates with constraints in the constrained space
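To make the coincidence outputs concrete, the following base R sketch computes, for a small hypothetical constrained space, how often each pair of clusters lands in the same arm and which pairs cross the samearmhi and samearmlo thresholds. The matrix `space` and all object names are assumptions for illustration, and whether the package reports counts or proportions in cluster_coincidence is not stated here.

```r
# Hypothetical constrained space: each row is one acceptable scheme,
# each column a cluster (1 = treatment, 0 = control).
space <- rbind(
  c(1, 1, 0, 0),
  c(1, 0, 1, 0),
  c(0, 1, 0, 1),
  c(0, 0, 1, 1)
)
colnames(space) <- paste0("cluster", 1:4)

# Count how often each pair of clusters shares an arm:
# both treated (crossprod of space) plus both control (crossprod of 1 - space).
same_arm_count <- crossprod(space) + crossprod(1 - space)
same_arm_prop <- same_arm_count / nrow(space)

# List each unordered pair once, with its same-arm proportion.
samearmhi <- 0.75
samearmlo <- 0.25
pairs <- which(upper.tri(same_arm_prop), arr.ind = TRUE)
pair_table <- data.frame(
  cluster_a = colnames(space)[pairs[, "row"]],
  cluster_b = colnames(space)[pairs[, "col"]],
  prop_same_arm = same_arm_prop[pairs]
)

# Pairs at or above samearmhi, and pairs at or below samearmlo.
high_pairs <- pair_table[pair_table$prop_same_arm >= samearmhi, ]
low_pairs <- pair_table[pair_table$prop_same_arm <= samearmlo, ]
```

In this toy space, cluster1 and cluster4 never share an arm, and neither do cluster2 and cluster3, so both pairs appear in `low_pairs`; a proportion of exactly 0 or 1 corresponds to the "always not pair" and "always pair" outputs.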
clustername
a vector specifying the identification variable of the cluster. If no cluster identification variable is specified, the default is to label the clusters based on the order in which they appear.
x
a data frame specifying the values of cluster-level covariates to balance. With K covariates and n clusters, it has dimension n by K.
categorical
a vector specifying categorical (including binary) variables. This can be column names or numeric column indexes, but not a mix of both. For a categorical variable with p categories, the cvrcov function creates p-1 dummy variables and drops the reference level if the variable is specified as a factor. Otherwise, the first level in alphanumerical order is dropped. The results are sensitive to which level is excluded. To drop a different level of a p-level categorical variable, the user can create the p-1 dummy variables and supply them as covariates to the cvrcov function, specifying the created dummy variables as categorical when running cvrcov. Alternatively, the user can set the variable as a factor with the desired reference level. If the weights option is used, the weights for a categorical variable will be replicated on all the dummy variables created.
constraints
a vector of user-specified constraints for all covariates. "any" means no constraint. Otherwise, the first character selects the metric: "m" denotes the absolute mean difference and "s" the absolute sum difference. If the second character is "f", the metric is constrained to be less than or equal to a fraction, given by the number that follows, of the overall mean for "m" or of the mean arm total for "s". If the second character is not "f", the metric is constrained to be less than or equal to the value following the letter.
ntotal_cluster
the total number of clusters to be randomized. It must be a positive integer and equal to the number of rows of the data.
ntrt_cluster
the number of clusters that the researcher wants to assign to the treatment arm. It must be a positive integer less than the total number of clusters.
size
number of randomization schemes to simulate if the number of all possible randomization schemes is over size. Its default is 50,000, and it must be a positive integer. It can be overridden by the nosim option.
seed
seed for simulation and random sampling. It is needed so that the randomization can be replicated. Its default is 12345.
nosim
if TRUE, it overrides the default procedure of simulating when the number of all possible randomization schemes is over size, and the program enumerates all randomization schemes. Note: this may consume a lot of memory and cause R to crash.
savedata
saves the data set of the constrained randomization space in a csv file if a file name is specified by savedata. The first column of the csv file is an indicator variable of the final randomization scheme in the constrained space. The constrained randomization space will be needed for analysis after the cluster randomized trial is completed if the clustered permutation test is used.
check_validity
boolean argument specifying whether to check the randomization validity
samearmhi
clusters assigned to the same arm at least this often are displayed. The default is 0.75.
samearmlo
clusters assigned to the same arm at most this often are displayed. The default is 0.25.
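The constraint codes accepted by the constraints argument follow a small grammar: a metric letter, an optional "f", then a number. The helper below is a hypothetical sketch of how such a code could be split into its parts. `parse_constraint` is not a function of the package, and the field names it returns are assumptions made for illustration.

```r
# Hypothetical helper: split a constraint code into metric, fraction flag,
# and numeric bound, following the rules described for `constraints`.
parse_constraint <- function(code) {
  if (code == "any") {
    return(list(metric = "none", fraction = FALSE, value = NA_real_))
  }
  # First character: "m" = absolute mean difference, "s" = absolute sum difference.
  metric <- switch(substr(code, 1, 1),
    m = "abs_mean_diff",
    s = "abs_sum_diff",
    stop("unknown metric in constraint: ", code)
  )
  # Second character "f" marks the bound as a fraction rather than a raw value.
  is_fraction <- substr(code, 2, 2) == "f"
  value <- as.numeric(substring(code, if (is_fraction) 3 else 2))
  list(metric = metric, fraction = is_fraction, value = value)
}

parsed <- lapply(c("s5", "mf.5", "any", "mf0.2"), parse_constraint)
```

Under this reading, "s5" bounds the absolute sum difference by 5, while "mf.5" bounds the absolute mean difference by 0.5 times the overall mean of that covariate.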
Hengshi Yu <hengshi@umich.edu>, Fan Li <fan.f.li@yale.edu>, John A. Gallis <john.gallis@duke.edu>, Elizabeth L. Turner <liz.turner@duke.edu>
Raab, G.M. and Butcher, I., 2001. Balance in cluster randomized trials. Statistics in medicine, 20(3), pp.351-365.
Li, F., Lokhnygina, Y., Murray, D.M., Heagerty, P.J. and DeLong, E.R., 2016. An evaluation of constrained randomization for the design and analysis of group randomized trials. Statistics in medicine, 35(10), pp.1565-1579.
Li, F., Turner, E.L., Heagerty, P.J., Murray, D.M., Vollmer, W.M. and DeLong, E.R., 2017. An evaluation of constrained randomization for the design and analysis of group randomized trials with binary outcomes. Statistics in medicine, 36(24), pp.3791-3806.
Gallis, J.A., Li, F., Yu, H. and Turner, E.L., 2018. cvcrand and cptest: Commands for efficient design and analysis of cluster randomized trials using constrained randomization and permutation tests. The Stata Journal, 18(2), pp.357-378.
Dickinson, L.M., Beaty, B., Fox, C., Pace, W., Dickinson, W.P., Emsermann, C. and Kempe, A., 2015. Pragmatic cluster randomized trials using covariate constrained randomization: A method for practice-based research networks (PBRNs). The Journal of the American Board of Family Medicine, 28(5), pp.663-672.
Bailey, R.A. and Rowley, C.A., 1987. Valid randomization. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 410(1838), pp.105-124.
Greene, E.J., 2017. A SAS macro for covariate-constrained randomization of general cluster-randomized and unstratified designs. Journal of statistical software, 77(CS1).
# cvrcov example
library(cvcrand)

# Recode location as a 0/1 indicator (1 = Rural)
Dickinson_design_numeric <- Dickinson_design
Dickinson_design_numeric$location <- (Dickinson_design$location == "Rural") * 1

Design_cov_result <- cvrcov(
  clustername = Dickinson_design_numeric$county,
  x = data.frame(Dickinson_design_numeric[, c("location", "inciis",
    "uptodateonimmunizations", "hispanic", "income")]),
  ntotal_cluster = 16,
  ntrt_cluster = 8,
  # five constraints for the five covariates in x
  constraints = c("s5", "mf.5", "any", "mf0.2", "mf0.2"),
  categorical = c("location"),
  ###### Option to save the constrained space ######
  # savedata = "dickinson_cov_constrained.csv",
  seed = 12345,
  check_validity = TRUE
)
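As a quick check on the example above, the number of possible schemes can be compared with the default size to see whether enumeration or simulation would apply. This sketch assumes schemes are counted as distinct choices of treated clusters; the variable names are illustrative only.

```r
# Ways to choose 8 treated clusters out of 16.
n_schemes <- choose(16, 8)

# Default `size` from the usage; simulation applies only when the
# number of possible schemes is over `size`.
size <- 50000
simulate <- n_schemes > size
```

Here `n_schemes` is 12870, which is below 50,000, so under this counting all schemes would be enumerated rather than simulated.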