simulateCRT generates simulated data for a cluster randomized trial (CRT) with geographic spillover between arms.
simulateCRT(
trial = NULL,
effect = 0,
outcome0 = NULL,
generateBaseline = TRUE,
matchedPair = TRUE,
scale = "proportion",
baselineNumerator = "base_num",
baselineDenominator = "base_denom",
denominator = NULL,
ICC_inp = NULL,
kernels = 200,
sigma_m = NULL,
spillover_interval = NULL,
tol = 0.005
)
A list of class "CRTsp" containing the following components:
geom_full | list: | summary statistics describing the site, cluster assignments, and randomization |
design | list: | values of input parameters to the design |
trial | data frame: | rows correspond to geolocated points, as follows: |
x | numeric vector: x-coordinates of locations | |
y | numeric vector: y-coordinates of locations | |
cluster | factor: assignments to cluster of each location | |
arm | factor: assignments to control or intervention for each location | |
nearestDiscord | numeric vector: signed Euclidean distance to nearest discordant location (km) | |
propensity | numeric vector: propensity for each location | |
base_denom | numeric vector: denominator for baseline | |
base_num | numeric vector: numerator for baseline | |
denom | numeric vector: denominator for the outcome | |
num | numeric vector: numerator for the outcome | |
... | other objects included in the input "CRTsp" object or data.frame | |
trial | an object of class "CRTsp" or a data frame containing locations in (x,y) coordinates, cluster assignments (factor cluster), and arm assignments (factor arm). Each location may also be assigned a propensity (see details).
effect | numeric. The simulated effect size (defaults to 0)
outcome0 | numeric. The anticipated value of the outcome in the absence of intervention
generateBaseline | logical. If TRUE then baseline data and the propensity will be simulated
matchedPair | logical. If TRUE then the function tries to carry out randomization using pair-matching on the baseline data (see details)
scale | measurement scale of the outcome. Options are: 'proportion' (the default); 'count'; 'continuous'.
baselineNumerator | optional name of numerator variable for pre-existing baseline data
baselineDenominator | optional name of denominator variable for pre-existing baseline data
denominator | optional name of denominator variable for the outcome
ICC_inp | numeric. Target intra-cluster correlation, provided as input when baseline data are to be simulated
kernels | number of kernels used to generate a de novo propensity
sigma_m | numeric. Standard deviation of the normal kernel measuring spatial smoothing leading to spillover
spillover_interval | numeric. Input spillover interval
tol | numeric. Tolerance of output ICC
Synthetic data are generated by sampling around the values of
variable propensity, which is a numerical vector
(taking positive values) of length equal to the number of locations.
There are three ways in which propensity can arise:
1. propensity can be provided as part of the input trial object.
2. Baseline numerators and denominators (values of baselineNumerator
and baselineDenominator) may be provided.
propensity is then generated as the numerator:denominator ratio
for each location in the input object.
3. Otherwise propensity is generated using a 2D Normal
kernel density. The OOR::StoSOO function
is used to achieve an intra-cluster correlation coefficient (ICC) that approximates
the value of ICC_inp by searching for an appropriate value of the kernel bandwidth.
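The second route, deriving propensity from baseline data, can be sketched in base R as follows. The data frame and its values are hypothetical; only the column names base_num and base_denom follow the defaults of baselineNumerator and baselineDenominator. This is an illustration of the ratio described above, not the package's internal code.

```r
# Hypothetical baseline data: one row per geolocated point
trial <- data.frame(
  x = c(0.1, 0.4, 0.9, 1.3),
  y = c(0.2, 0.8, 0.5, 1.1),
  base_num = c(2, 5, 1, 4),
  base_denom = c(10, 10, 5, 8)
)

# Propensity as the numerator:denominator ratio for each location
trial$propensity <- trial$base_num / trial$base_denom
```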
num[i], the synthetic outcome for location i
is simulated with expectation:
$$E(num[i]) = outcome0[i] * propensity[i] * denom[i] * (1 - effect*I[i])/mean(outcome0[] * propensity[])$$
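As a minimal sketch, the expectation can be transcribed literally into a base R helper. The function name expected_num and the input values are hypothetical. The text does not define I[i]; here it is assumed to be a value in [0, 1] describing how strongly location i is exposed to the intervention, and it is passed as the argument indicator.

```r
# Literal transcription of the expectation formula (hypothetical helper).
# `indicator` stands for I[i]; treating it as an exposure in [0, 1]
# is an assumption, since the text does not define it.
expected_num <- function(outcome0, propensity, denom, effect, indicator) {
  outcome0 * propensity * denom * (1 - effect * indicator) /
    mean(outcome0 * propensity)
}

propensity <- c(0.2, 0.5, 0.2, 0.5)
outcome0   <- rep(0.5, 4)
denom      <- rep(1, 4)
indicator  <- c(0, 0, 1, 1)

e_null   <- expected_num(outcome0, propensity, denom, effect = 0,    indicator)
e_effect <- expected_num(outcome0, propensity, denom, effect = 0.25, indicator)
```

With effect = 0.25, the expectations at locations with indicator equal to 1 are reduced by a quarter relative to the null case, while the others are unchanged.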
The sampling distribution of num[i] depends on the value of scale as follows:
scale='continuous': Values of num are sampled from
Normal distributions with means E(num[i])
and variance determined by the fitting to ICC_inp.
scale='count': Simulated events are allocated to locations via multinomial distributions
parameterised with E(num[i]).
scale='proportion': Simulated events are allocated to locations via multivariate hypergeometric distributions
parameterised with E(num[i]).
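The multinomial allocation mentioned above can be illustrated with stats::rmultinom, which normalises its prob argument internally. The expected values are hypothetical, and fixing the total number of events at the rounded sum of the expectations is an assumption made for this sketch, not a documented detail of the function.

```r
set.seed(1)

# Hypothetical expectations E(num[i]) for four locations
expected <- c(2, 5, 1, 4)

# Assumption: the total number of events is the rounded sum of expectations
total_events <- round(sum(expected))

# Allocate the events to locations with probabilities proportional to E(num[i])
num <- as.vector(rmultinom(1, size = total_events, prob = expected))
```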
denominator may name a vector of numeric (non-zero) values
in the input "CRTsp" or data.frame, which is returned
as variable denom. It acts as a scale-factor for continuous outcomes, rate-multiplier
for counts, or denominator for proportions. For discrete data all values of denom
must be > 0.5 and are rounded to the nearest integer in calculations of num.
By default, denom is generated as a vector of ones, leading to simulation of
dichotomous outcomes if scale='proportion'.
If baseline numerators and denominators are provided then the output vectors
base_denom and base_num are set to the input values. If baseline numerators and denominators
are not provided then the synthetic baseline data are generated by sampling around propensity in the same
way as the outcome data, but with the effect size set to zero.
If matchedPair is TRUE then pair-matching on the baseline data will be used in randomization, provided
there is an even number of clusters. If there is an odd number of clusters then matched pairs are not generated and
an unmatched randomization is output.
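The matching rule can be sketched in base R. The text does not describe the matching algorithm itself, so the approach below (sort clusters by a baseline summary, pair neighbours, randomize within each pair) is an assumption chosen for illustration, and the cluster-level data are hypothetical.

```r
set.seed(42)

# Hypothetical cluster-level baseline summaries
clusters <- data.frame(
  cluster  = factor(1:6),
  baseline = c(0.10, 0.35, 0.12, 0.50, 0.33, 0.48)
)
arms <- c("control", "intervention")
clusters$pair <- NA_integer_
clusters$arm  <- NA_character_

if (nrow(clusters) %% 2 == 0) {
  # Even number of clusters: pair neighbours in baseline order
  ord <- order(clusters$baseline)
  clusters$pair[ord] <- rep(seq_len(nrow(clusters) / 2), each = 2)
  # Randomly assign one member of each pair to each arm
  for (p in unique(clusters$pair)) {
    idx <- which(clusters$pair == p)
    clusters$arm[idx] <- sample(arms)
  }
} else {
  # Odd number of clusters: fall back to unmatched randomization
  clusters$arm <- sample(rep(arms, length.out = nrow(clusters)))
}
```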
Either sigma_m or spillover_interval must be provided. If both are provided then
the value of sigma_m is overwritten
by the standard deviation implicit in the value of spillover_interval.
Spillover is simulated as arising from a diffusion-like process.
For further details see Multerer (2021)
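To give an intuition for how a normal kernel with standard deviation sigma_m can blur the boundary between arms, the sketch below computes a kernel-weighted average of the arm indicator at each location. This is an illustrative assumption about what spatial smoothing could look like, not the function's documented algorithm; the coordinates and arm assignments are hypothetical.

```r
# Five hypothetical locations along a line (coordinates in km)
x <- c(0, 0.5, 1, 1.5, 2)
y <- rep(0, 5)
arm <- c(0, 0, 1, 1, 1)   # 1 = intervention, 0 = control
sigma_m <- 0.6            # standard deviation of the normal kernel (km)

# Pairwise Euclidean distances and normal kernel weights
d <- as.matrix(dist(cbind(x, y)))
w <- dnorm(d, mean = 0, sd = sigma_m)

# Smoothed exposure: kernel-weighted mean of the arm indicator
exposure <- as.vector(w %*% arm / rowSums(w))
```

Control locations close to the intervention arm receive a non-zero exposure, and intervention locations near the boundary receive less than full exposure, which is the qualitative pattern a spillover simulation is meant to capture.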
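library(CRTspat)
# Load the example trial data shipped with the package
smalltrial <- readdata('smalltrial.csv')
# Simulate a proportion outcome with a 25% effect and unmatched randomization
simulation <- simulateCRT(smalltrial,
  effect = 0.25,
  ICC_inp = 0.05,
  outcome0 = 0.5,
  matchedPair = FALSE,
  scale = 'proportion',
  sigma_m = 0.6,
  tol = 0.05)
summary(simulation)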