gen_D: Generate Bivariate Multivariate Exposure

Description

Generate exposure from a bivariate normal distribution confounded by a set of variables C = (C1, C2).

Usage

gen_D(
  method,
  n,
  rho_cond,
  s_d1_cond,
  s_d2_cond,
  k,
  C_mu,
  C_cov,
  C_var,
  C_sigma = NULL,
  d1_beta,
  d2_beta,
  seed = NULL
)

Arguments

method

character value identifying which method to use when generating bivariate exposure. Options include "matrix_normal", "uni_cond", and "vector_normal". See details for a brief explanation of each method. "uni_cond" is the fastest.

n

integer value giving the total number of units

rho_cond

scalar value in [0, 1] identifying the conditional correlation of the exposures given the covariates

s_d1_cond

scalar value for conditional standard deviation of D1

s_d2_cond

scalar value for conditional standard deviation of D2

k

integer value determining the number of covariates to generate in C

C_mu

numeric vector of mean values for covariates. Must have length k

C_cov

scalar value representing the constant covariance between covariates (equal to the correlation when C_var = 1)

C_var

scalar value representing constant variance of covariates

C_sigma

numeric matrix representing the covariance matrix of covariates. Default is NULL, in which case the matrix is built from C_cov and C_var.

d1_beta

numeric vector of length k defining the mean of D1 with respect to the covariates

d2_beta

numeric vector of length k defining the mean of D2 with respect to the covariates

seed

integer value setting the seed of the random number generator to produce repeatable results. Set to NULL by default

Value

D: nx2 numeric matrix of the sample values for the exposures given the set C
C: nxk numeric matrix of the sampled values for the confounding set C
D_Sigma: 2x2 numeric matrix of the true marginal covariance of exposures
rho: numeric scalar representing the true marginal correlation of exposures

Details

Generating Confounders

We assume that there are a total of k confounders generated from a multivariate normal distribution with equicorrelation covariance, i.e., $$\Sigma_{C}=\phi(\mathbf{1}\mathbf{1}^{T}-\mathbf{I})+\mathbf{I}\sigma^{2}_{C},$$ where $\mathbf{1}$ is the column vector with all entries equal to 1, $\mathbf{I}$ is the identity matrix, $\sigma^{2}_{C}$ is a constant variance for all confounders, and $\phi$ is the covariance of any two confounders. Therefore, our random confounders C follow the distribution $$\mathbf{C}\sim N_{k}(\boldsymbol{\mu}_{C}, \Sigma_{C}).$$ We draw a total of n samples from this multivariate normal distribution using mvrnorm.
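As a minimal sketch of this step (not the package's internal code), the equicorrelation covariance can be assembled directly from the formula above and passed to MASS::mvrnorm. The names phi, sigma2_C, and mu_C are illustrative stand-ins for the arguments C_cov, C_var, and C_mu.

```r
# Illustrative values; phi, sigma2_C and mu_C stand in for C_cov, C_var and C_mu
k <- 3
n <- 200
phi <- 0.1
sigma2_C <- 1
mu_C <- rep(0, k)

# Sigma_C = phi * (1 1^T - I) + I * sigma2_C
ones <- matrix(1, nrow = k, ncol = 1)
I_k <- diag(k)
Sigma_C <- phi * (ones %*% t(ones) - I_k) + I_k * sigma2_C

# Draw n samples of the k confounders (rows are units, columns are confounders)
set.seed(1)
C <- MASS::mvrnorm(n, mu = mu_C, Sigma = Sigma_C)
```

The off-diagonal entries of Sigma_C all equal phi and the diagonal entries all equal sigma2_C, so phi must be chosen such that the matrix stays positive definite.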

Generating Bivariate Exposure

The first step when generating the bivariate exposure is to specify the effects of the confounders C. We control this for each exposure value using the arguments d1_beta and d2_beta such that $$E[D_{1}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D1}\mathbf{C}$$ and $$E[D_{2}\mid \mathbf{C}]=\boldsymbol{\beta}^{T}_{D2}\mathbf{C}$$.

Note that by specifying d1_beta and d2_beta separately, the user can control both the amount of overlap in the confounders of each exposure and how many of the variables in C are truly related to the exposures. For instance, the exposures have identical confounding effects when d1_beta=d2_beta, and they have fully separate confounding when d1_beta and d2_beta share no non-zero positions.
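The following sketch illustrates how the two coefficient vectors determine the conditional means and the confounder overlap. The small matrix C is simulated here only for illustration, and the helper variable shared is a hypothetical name, not something returned by gen_D.

```r
# Coefficients taken from the Examples section: D1 depends on C1 and C2, D2 on C2 and C3
d1_beta <- c(0.5, 1, 0)
d2_beta <- c(0, 0.3, 0.75)

# Confounders that affect both exposures (positions non-zero in both vectors)
shared <- which(d1_beta != 0 & d2_beta != 0)

# Illustrative confounder matrix with 5 units and 3 covariates
set.seed(1)
C <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)

# Conditional means E[D1 | C] and E[D2 | C], one value per unit
mu_D1 <- as.vector(C %*% d1_beta)
mu_D2 <- as.vector(C %*% d2_beta)
```

Here only the second covariate is shared, so the exposures are partially but not identically confounded.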

To generate the bivariate conditional distribution of exposures given the set of confounders C we have the following three methods:

"matrix_normal"
"uni_cond"
"vector_normal"

"matrix_normal" uses the function rmatnorm to generate all n samples as $$\mathbf{D}\mid\mathbf{C}\sim N_{n \times 2}(\boldsymbol{\beta}\mathbf{C}, \mathbf{I}_{n}, \Omega)$$ where $\boldsymbol{\beta}$ is the $2 \times k$ matrix whose rows are $\boldsymbol{\beta}^{T}_{D1}$ and $\boldsymbol{\beta}^{T}_{D2}$, and $\Omega$ is the conditional covariance matrix.

"vector_normal" simply vectorizes the matrix_normal method above to generate a vector of length $n \times 2$.
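One way to picture the vectorized form, offered as a sketch rather than the package's own implementation, relies on the standard result that a matrix normal with row covariance $\mathbf{I}_{n}$ and column covariance $\Omega$ has a vectorized covariance of $\Omega \otimes \mathbf{I}_{n}$. The zero mean matrix M and the tiny n are assumptions made to keep the example small.

```r
# Illustrative sizes and conditional covariance (rho_cond = 0.2, both SDs = 2)
n <- 4
M <- matrix(0, nrow = n, ncol = 2)            # assumed conditional mean, n x 2
Omega <- matrix(c(4, 0.8, 0.8, 4), nrow = 2)  # conditional covariance of (D1, D2)

# Covariance of vec(D): a 2n x 2n matrix
Sigma_vec <- kronecker(Omega, diag(n))

# Draw one vector of length 2n and reshape it column-wise back to n x 2
set.seed(1)
d_vec <- MASS::mvrnorm(1, mu = as.vector(M), Sigma = Sigma_vec)
D <- matrix(d_vec, nrow = n, ncol = 2)
```

Because the covariance matrix has dimension 2n by 2n, the cost of this formulation grows quickly with n.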

"uni_cond" specifies the bivariate exposure using univariate conditional factorization, which in the case of bivariate normal results in two univariate normal expressions.
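The factorization can be sketched with the textbook bivariate normal identities: draw D1 given C, then draw D2 given D1 and C. This is an illustration under assumed inputs, not the package's source; the variable names and the simulated C are hypothetical.

```r
set.seed(1)
n <- 200
rho_cond <- 0.2   # conditional correlation of D1 and D2 given C
s1 <- 2           # conditional SD of D1
s2 <- 2           # conditional SD of D2

# Illustrative confounders and conditional means
C <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
mu1 <- as.vector(C %*% c(0.5, 1, 0))
mu2 <- as.vector(C %*% c(0, 0.3, 0.75))

# Step 1: D1 | C ~ N(mu1, s1^2)
D1 <- rnorm(n, mean = mu1, sd = s1)

# Step 2: D2 | D1, C ~ N(mu2 + rho * (s2 / s1) * (D1 - mu1), s2^2 * (1 - rho^2))
mu2_given_1 <- mu2 + rho_cond * (s2 / s1) * (D1 - mu1)
sd2_given_1 <- s2 * sqrt(1 - rho_cond^2)
D2 <- rnorm(n, mean = mu2_given_1, sd = sd2_given_1)

D <- cbind(D1, D2)
```

Only two vectorized calls to rnorm are needed, and no n-dimensional covariance matrix is ever formed.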

In general, we suggest using the univariate conditional, "uni_cond", method when generating exposures as it is substantially faster than both the matrix normal and vector normal approaches.

Note that the options use regular expression matching and can be specified uniquely using either "m", "u", or "v".

Marginal Covariance of Exposures

As described above the exposures are drawn conditional on the set C, so the marginal covariance of exposures is defined as $$\Sigma_{D}= \boldsymbol{\beta}\Sigma_{C}\boldsymbol{\beta}^{T}+\Omega.$$ In our function we return the true marginal covariance $\Sigma_{D}$ as well as the true marginal correlation $\rho_{D}$.
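As a worked illustration of this formula, the sketch below computes $\Sigma_{D}$ and $\rho_{D}$ by hand for the parameter values used in the Examples section. The object names are illustrative and it is an assumption that they correspond to the D_Sigma and rho elements returned by gen_D.

```r
# Rows of beta are d1_beta and d2_beta (a 2 x k matrix)
beta <- rbind(c(0.5, 1, 0),
              c(0, 0.3, 0.75))

# Confounder covariance with C_var = 1 and C_cov = 0.1
Sigma_C <- matrix(0.1, nrow = 3, ncol = 3)
diag(Sigma_C) <- 1

# Conditional covariance Omega from rho_cond = 0.2, s_d1_cond = 2, s_d2_cond = 2
rho_cond <- 0.2
s1 <- 2
s2 <- 2
Omega <- matrix(c(s1^2, rho_cond * s1 * s2,
                  rho_cond * s1 * s2, s2^2), nrow = 2)

# Marginal covariance and correlation of the exposures
D_Sigma <- beta %*% Sigma_C %*% t(beta) + Omega
rho <- D_Sigma[1, 2] / sqrt(D_Sigma[1, 1] * D_Sigma[2, 2])
```

With these inputs the marginal correlation works out to roughly 0.245, which is higher than the conditional value of 0.2 because the shared and correlated confounders induce additional dependence between the exposures.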

Examples

# NOT RUN {
#generate bivariate exposures. D1 confounded by C1 and C2. D2 by C2 and C3
#uses univariate conditional normal to draw samples
sim_dt <- gen_D(method="u", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C

#observed correlation should be close to true marginal value
cor(D); sim_dt$rho


#Use vector normal method instead of univariate method to draw samples
sim_dt <- gen_D(method="v", n=200, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2, k=3,
C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0), d2_beta=c(0, 0.3, 0.75), seed=06112020)

# }
