Generating Confounders
We assume that there are a total of \(k\) confounders generated
from a multivariate normal distribution with equicorrelation covariance, i.e.,
$$\Sigma_{C}=\phi(\mathbf{1}\mathbf{1}^{T}-\mathbf{I})+\mathbf{I}\sigma^{2}_{C},$$
where \(\mathbf{1}\) is the column vector with all entries equal to 1,
\(\mathbf{I}\) is the identity matrix, \(\sigma^{2}_{C}\) is the common variance
of every confounder, and \(\phi\) is the covariance between any two confounders.
The random confounder vector \(\mathbf{C}\), with mean vector \(\boldsymbol{\mu}_{C}\), therefore
follows the distribution
$$\mathbf{C}\sim N_{k}(\boldsymbol{\mu}_{C}, \Sigma_{C}).$$
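The equicorrelation structure is easy to build directly. Below is a minimal Python/NumPy sketch, which is not the package's R code; the function name `equicorrelation_cov` is hypothetical. The matrix can be written as \(\Sigma_{C}=(\sigma^{2}_{C}-\phi)\mathbf{I}+\phi\mathbf{1}\mathbf{1}^{T}\), so its eigenvalues are \(\sigma^{2}_{C}+(k-1)\phi\) (once) and \(\sigma^{2}_{C}-\phi\) (\(k-1\) times). It is therefore a valid (positive definite) covariance only when \(-\sigma^{2}_{C}/(k-1)<\phi<\sigma^{2}_{C}\). The sketch checks that condition.

```python
import numpy as np


def equicorrelation_cov(k: int, phi: float, sigma2_c: float) -> np.ndarray:
    """Hypothetical helper: Sigma_C = phi * (1 1^T - I) + sigma2_c * I.

    Diagonal entries equal sigma2_c (the common variance); every
    off-diagonal entry equals phi (the common covariance).
    """
    # Eigenvalues are sigma2_c + (k - 1) * phi and sigma2_c - phi, so the
    # matrix is positive definite iff -sigma2_c / (k - 1) < phi < sigma2_c.
    lower = -sigma2_c / (k - 1) if k > 1 else -np.inf
    if not (lower < phi < sigma2_c):
        raise ValueError("phi must lie in (-sigma2_c / (k - 1), sigma2_c)")
    ones = np.ones((k, 1))
    eye = np.eye(k)
    return phi * (ones @ ones.T - eye) + sigma2_c * eye


# Example with assumed values k = 3, phi = 0.3, sigma2_c = 1.
sigma_c = equicorrelation_cov(3, 0.3, 1.0)
```

With these values, \(\Sigma_{C}\) has 1 on the diagonal and 0.3 elsewhere, and its eigenvalues are 0.7, 0.7 and 1.6.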
We draw a total of \(n\) samples from this multivariate normal distribution
using mvrnorm.
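For readers working outside R, the same draw can be sketched with NumPy's `Generator.multivariate_normal` as a stand-in for mvrnorm. The values \(k=3\), \(\phi=0.3\), \(\sigma^{2}_{C}=1\) and \(\boldsymbol{\mu}_{C}=\mathbf{0}\) are illustrative assumptions, not package defaults. Each row of the result is one independent draw of the \(k\) confounders.

```python
import numpy as np

# Illustrative (assumed) settings.
k, n = 3, 5000
phi, sigma2_c = 0.3, 1.0
mu_c = np.zeros(k)

# Equicorrelation covariance: sigma2_c on the diagonal, phi elsewhere.
sigma_c = phi * (np.ones((k, k)) - np.eye(k)) + sigma2_c * np.eye(k)

rng = np.random.default_rng(2024)
# n x k matrix: row i is the i-th draw of the confounder vector C.
C = rng.multivariate_normal(mean=mu_c, cov=sigma_c, size=n)
print(C.shape)  # (5000, 3)
```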
Generating Bivariate Exposure
The first step when generating the bivariate exposure is to specify the
effects of the confounders \(\mathbf{C}\). We control these effects for each exposure
through the arguments d1_beta and d2_beta, which give the coefficient vectors
\(\boldsymbol{\beta}_{D1}\) and \(\boldsymbol{\beta}_{D2}\), such that
$$E[D_{1}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D1}\mathbf{C}$$ and
$$E[D_{2}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D2}\mathbf{C}.$$
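For a matrix of \(n\) confounder draws, both conditional means can be computed at once. The Python/NumPy sketch below stacks the two coefficient vectors as the rows of a \(2 \times k\) matrix \(\boldsymbol{\beta}\). The d1_beta and d2_beta values are hypothetical, and the confounders are simplified to independent standard normals purely for illustration.

```python
import numpy as np

# Hypothetical coefficient vectors for k = 4 confounders.
d1_beta = np.array([0.5, 0.5, 0.0, 0.0])
d2_beta = np.array([0.0, 0.5, 0.5, 0.0])

rng = np.random.default_rng(1)
C = rng.standard_normal((10, 4))  # n = 10 simplified confounder draws (rows)

# 2 x k matrix whose rows are beta_D1^T and beta_D2^T.
beta = np.vstack([d1_beta, d2_beta])
# Row i of the n x 2 result is (beta_D1^T C_i, beta_D2^T C_i).
cond_mean = C @ beta.T

# A single confounder vector c gives E[D | C = c] = beta @ c.
c = np.array([1.0, 2.0, 3.0, 4.0])
single = beta @ c  # array([1.5, 2.5])
```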
Specifying d1_beta and d2_beta separately lets the user control how much the
confounders of the two exposures overlap, and how many of the variables in \(\mathbf{C}\)
are truly related to the exposures.
For instance, the exposures have identical confounding effects when
d1_beta = d2_beta, and completely separate confounding when d1_beta and d2_beta
have no non-zero elements in common, i.e., no confounder has a non-zero
coefficient for both exposures.
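The overlap rule above can be expressed as a small check on the positions of the non-zero coefficients. This is a Python sketch with a hypothetical helper `confounding_overlap`. The label "partial" is our own name for the in-between case (some, but not all, confounding shared).

```python
import numpy as np


def confounding_overlap(d1_beta, d2_beta) -> str:
    """Hypothetical helper classifying how the exposures' confounders overlap."""
    d1 = np.asarray(d1_beta, dtype=float)
    d2 = np.asarray(d2_beta, dtype=float)
    if np.array_equal(d1, d2):
        return "identical"
    # Indices of confounders with a non-zero coefficient for both exposures.
    shared = np.flatnonzero((d1 != 0) & (d2 != 0))
    return "separate" if shared.size == 0 else "partial"


print(confounding_overlap([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # separate
```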
To generate the bivariate conditional distribution of the exposures given the set
of confounders \(\mathbf{C}\), we have the following three methods:
"matrix_normal"
"uni_cond"
"vector_normal"
"matrix_normal" uses the function rmatnorm
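to generate all \(n\) samples as
$$\mathbf{D}\mid\mathbf{C}\sim N_{n \times 2}(\mathbf{C}\boldsymbol{\beta}^{T}, \mathbf{I}_{n}, \Omega),$$
where \(\mathbf{C}\) here denotes the \(n \times k\) matrix of confounder draws,
\(\boldsymbol{\beta}\) is the \(2 \times k\) matrix whose rows are \(\boldsymbol{\beta}^{T}_{D1}\)
and \(\boldsymbol{\beta}^{T}_{D2}\), the row covariance \(\mathbf{I}_{n}\) makes the \(n\) samples
independent, and \(\Omega\) is the \(2 \times 2\) conditional covariance matrix of the exposures.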
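When the row covariance is \(\mathbf{I}_{n}\), a matrix normal draw reduces to a simple construction. If \(\Omega=\mathbf{L}\mathbf{L}^{T}\) (Cholesky factor) and \(\mathbf{Z}\) is an \(n \times 2\) matrix of independent standard normals, then \(\mathbf{C}\boldsymbol{\beta}^{T}+\mathbf{Z}\mathbf{L}^{T}\) has the distribution above. The Python/NumPy sketch below uses that construction. It stands in for rmatnorm, whose internals may differ. The values of \(\boldsymbol{\beta}\) and \(\Omega\) are assumptions, and the confounders are simplified to independent standard normals.

```python
import numpy as np


def draw_matrix_normal(mean: np.ndarray, omega: np.ndarray, rng) -> np.ndarray:
    """Draw D ~ MN_{n x 2}(mean, I_n, omega).

    With row covariance I_n the rows are independent and row i is
    N_2(mean_i, omega); mean + Z L^T with omega = L L^T has this law.
    """
    L = np.linalg.cholesky(omega)  # lower triangular, omega = L @ L.T
    Z = rng.standard_normal(mean.shape)
    return mean + Z @ L.T


rng = np.random.default_rng(7)
n, k = 20000, 3
C = rng.standard_normal((n, k))  # simplified confounders for illustration
beta = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5]])  # rows: beta_D1^T, beta_D2^T (assumed)
omega = np.array([[1.0, 0.4],
                  [0.4, 2.0]])  # assumed conditional covariance
D = draw_matrix_normal(C @ beta.T, omega, rng)
```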
"vector_normal" simply vectorizes the matrix_normal method above to generate
a vector of length \(n \times 2\).
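Vectorizing the matrix normal gives a standard identity. With column-stacking \(\operatorname{vec}\), \(\operatorname{vec}(\mathbf{D})\mid\mathbf{C}\sim N_{2n}(\operatorname{vec}(\mathbf{C}\boldsymbol{\beta}^{T}), \Omega\otimes\mathbf{I}_{n})\). The covariance is \(2n \times 2n\), which is one plausible reason this approach scales poorly as \(n\) grows. The Python/NumPy sketch below assumes column-stacking and uses a hypothetical mean matrix. It is not the package's implementation.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 4
mean = np.arange(2 * n, dtype=float).reshape(n, 2)  # hypothetical n x 2 mean
omega = np.array([[1.0, 0.4],
                  [0.4, 2.0]])  # assumed conditional covariance

# Column-stacking vec: (D[:, 0], D[:, 1]) as one vector of length 2n.
vec_mean = mean.reshape(-1, order="F")
# Cov(vec(D)) = Omega kron I_n, a 2n x 2n matrix.
vec_cov = np.kron(omega, np.eye(n))
vec_d = rng.multivariate_normal(vec_mean, vec_cov)
# Undo the vectorization to recover the n x 2 exposure matrix.
D = vec_d.reshape(n, 2, order="F")
```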
"uni_cond" specifies the bivariate exposure using univariate conditional
factorization \(f(d_{1}, d_{2}\mid\mathbf{C})=f(d_{1}\mid\mathbf{C})\,f(d_{2}\mid d_{1},\mathbf{C})\);
for a bivariate normal, both factors are univariate normal distributions.
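For a bivariate normal with means \(\mu_{1}=\boldsymbol{\beta}^{T}_{D1}\mathbf{C}\), \(\mu_{2}=\boldsymbol{\beta}^{T}_{D2}\mathbf{C}\) and \(\Omega=\begin{pmatrix}\omega_{11}&\omega_{12}\\\omega_{12}&\omega_{22}\end{pmatrix}\), the two factors are as follows. The first is \(D_{1}\mid\mathbf{C}\sim N(\mu_{1},\omega_{11})\). The second is \(D_{2}\mid D_{1},\mathbf{C}\sim N\big(\mu_{2}+\tfrac{\omega_{12}}{\omega_{11}}(D_{1}-\mu_{1}),\ \omega_{22}-\tfrac{\omega_{12}^{2}}{\omega_{11}}\big)\). The Python/NumPy sketch below draws all \(n\) pairs this way using only univariate normal draws. The function name and parameter values are assumptions, not the package's R code.

```python
import numpy as np


def draw_uni_cond(mu1, mu2, omega, rng) -> np.ndarray:
    """Hypothetical helper: draw D1 | C, then D2 | D1, C, row by row."""
    w11, w12, w22 = omega[0, 0], omega[0, 1], omega[1, 1]
    d1 = rng.normal(loc=mu1, scale=np.sqrt(w11))
    # Conditional mean and standard deviation of D2 given D1 (and C).
    cond_mean = mu2 + (w12 / w11) * (d1 - mu1)
    cond_sd = np.sqrt(w22 - w12 ** 2 / w11)
    d2 = rng.normal(loc=cond_mean, scale=cond_sd)
    return np.column_stack([d1, d2])


rng = np.random.default_rng(3)
n, k = 20000, 3
C = rng.standard_normal((n, k))  # simplified, independent confounders
beta = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5]])  # rows: beta_D1^T, beta_D2^T (assumed)
omega = np.array([[1.0, 0.4],
                  [0.4, 2.0]])  # assumed conditional covariance
mu = C @ beta.T
D = draw_uni_cond(mu[:, 0], mu[:, 1], omega, rng)
```

The residuals \(\mathbf{D}-\mathbf{C}\boldsymbol{\beta}^{T}\) should have covariance close to \(\Omega\), matching the matrix normal construction.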
In general, we suggest using the univariate conditional ("uni_cond") method
when generating exposures, as it is substantially faster than both the
matrix normal and vector normal approaches.
Note that the method names are matched using regular expressions, so each
option can be specified uniquely by its first letter: "m", "u", or "v".
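The package's exact pattern is not shown here. The Python sketch below assumes that the user's string is matched as a prefix of the full method name and that it must match exactly one option. The helper name `resolve_method` is hypothetical.

```python
import re

METHODS = ("matrix_normal", "uni_cond", "vector_normal")


def resolve_method(method: str) -> str:
    """Hypothetical helper: return the unique method that starts with `method`."""
    # re.match anchors at the start of the string, so this is prefix matching.
    matches = [m for m in METHODS if re.match(re.escape(method), m)]
    if len(matches) != 1:
        raise ValueError(f"method must uniquely match one of {METHODS}, got {method!r}")
    return matches[0]


print(resolve_method("u"))  # uni_cond
```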
Marginal Covariance of Exposures
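As described above, the exposures are drawn conditional on the set \(\mathbf{C}\).
By the law of total covariance,
\(\operatorname{Cov}(\mathbf{D})=\operatorname{Cov}(E[\mathbf{D}\mid\mathbf{C}])+E[\operatorname{Cov}(\mathbf{D}\mid\mathbf{C})]\),
with \(E[\mathbf{D}\mid\mathbf{C}]=\boldsymbol{\beta}\mathbf{C}\) and \(\operatorname{Cov}(\mathbf{D}\mid\mathbf{C})=\Omega\),
so the marginal covariance of the exposures is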
$$\Sigma_{D}= \boldsymbol{\beta}\Sigma_{C}\boldsymbol{\beta}^{T}+\Omega.$$
Our function returns the true marginal covariance \(\Sigma_{D}\) as well
as the true marginal correlation
\(\rho_{D}=\Sigma_{D,12}/\sqrt{\Sigma_{D,11}\Sigma_{D,22}}\).
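A worked Python/NumPy sketch of \(\Sigma_{D}\) and \(\rho_{D}\) follows. It uses assumed values (\(k=3\), \(\phi=0.3\), \(\sigma^{2}_{C}=1\), and the \(\boldsymbol{\beta}\) and \(\Omega\) shown) and adds a Monte Carlo check against simulated exposures. For these values, \(\boldsymbol{\beta}\Sigma_{C}\boldsymbol{\beta}^{T}=\begin{pmatrix}0.65&0.475\\0.475&0.65\end{pmatrix}\). This gives \(\Sigma_{D}=\begin{pmatrix}1.65&0.875\\0.875&2.65\end{pmatrix}\) and \(\rho_{D}=0.875/\sqrt{1.65\times 2.65}\approx 0.418\).

```python
import numpy as np

# Assumed example settings.
k = 3
phi, sigma2_c = 0.3, 1.0
sigma_c = phi * (np.ones((k, k)) - np.eye(k)) + sigma2_c * np.eye(k)
beta = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5]])  # rows: beta_D1^T, beta_D2^T
omega = np.array([[1.0, 0.4],
                  [0.4, 2.0]])  # conditional covariance of (D1, D2)

# Law of total covariance: Sigma_D = beta Sigma_C beta^T + Omega.
sigma_d = beta @ sigma_c @ beta.T + omega
sd = np.sqrt(np.diag(sigma_d))
rho_d = sigma_d[0, 1] / (sd[0] * sd[1])

# Monte Carlo check: simulate C, then D | C, and compare sample covariance.
rng = np.random.default_rng(5)
n = 50000
C = rng.multivariate_normal(np.zeros(k), sigma_c, size=n)
L = np.linalg.cholesky(omega)
D = C @ beta.T + rng.standard_normal((n, 2)) @ L.T
empirical = np.cov(D, rowvar=False)
```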