ICA.ContCont.MultS_alt: Assess surrogacy in the causal-inference single-trial setting (Individual Causal Association, ICA) using a continuous univariate T and multiple continuous S, alternative approach

Description

The function ICA.ContCont.MultS_alt quantifies surrogacy in the single-trial causal-inference framework where T is continuous and there are multiple continuous S. This function provides an alternative to ICA.ContCont.MultS.

Usage

ICA.ContCont.MultS_alt(M = 500, N, Sigma, 
G = seq(from=-1, to=1, by = .00001),
Seed=c(123), Model = "Delta_T ~ Delta_S1 + Delta_S2", 
Show.Progress=FALSE)

Arguments

M

The number of multivariate ICA values ($R^2_{H}$) that should be sampled. Default M=500.

N

The sample size of the dataset.

Sigma

A matrix that specifies the variance-covariance matrix between $T_0$, $T_1$, $S_{10}$, $S_{11}$, $S_{20}$, $S_{21}$, ..., $S_{k0}$, and $S_{k1}$. The unidentifiable covariances should be defined as NA (see example below).

G

A vector of the values that should be considered for the unidentified correlations. Default G=seq(-1, 1, by=.00001), i.e., values ranging from $-1$ to $1$ in steps of $.00001$.

Seed

The seed that is used. Default Seed=123.

Model

The multivariate ICA ($R^2_{H}$) is essentially the coefficient of determination of a regression model in which $\Delta T$ is regressed on $\Delta S_1$, $\Delta S_2$, and so on. The Model= argument specifies the regression model to be used in the analysis. For example, for 2 surrogates, Model = "Delta_T ~ Delta_S1 + Delta_S2".

Show.Progress

Should progress of runs be graphically shown? (i.e., 1% done..., 2% done..., etc). Mainly useful when a large number of S have to be considered (to follow progress and estimate total run time).
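As a small convenience for the Model= argument described above, the formula string can be assembled programmatically when the number of surrogates varies. The sketch below is only an illustration: make_model is a hypothetical helper and not part of the package; it assumes the surrogate contrasts are named Delta_S1, Delta_S2, ..., as in the Usage section.

```r
# Hypothetical helper (not part of the package): build the Model string
# for k surrogates, following the naming used in the Usage section.
make_model <- function(k) {
  paste("Delta_T ~", paste0("Delta_S", seq_len(k), collapse = " + "))
}

model_3 <- make_model(3)
print(model_3)
```

The resulting string could then be passed as Model = model_3 in the function call.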

Value

An object of class ICA.ContCont.MultS_alt with components:

R2_H

The multiple-surrogate individual causal association value(s).

Corr.R2_H

The corrected multiple-surrogate individual causal association value(s).

Res_Err_Delta_T

The residual errors (prediction errors) for intercept-only models of $\Delta T$ (i.e., models that do not include $\Delta S_1$, $\Delta S_2$, etc as predictors).

Res_Err_Delta_T_Given_S

The residual errors (prediction errors) for models where $\Delta T$ is regressed on $\Delta S_1$, $\Delta S_2$, etc.

Lower.Dig.Corrs.All

A data.frame that contains the identifiable and unidentifiable correlations (lower diagonal elements of the correlation matrix) that were used to compute $R^2_{H}$ in each run.

Details

The multivariate ICA ($R^2_{H}$) is not identifiable because the individual causal treatment effects on $T$, $S_1$, ..., $S_k$ cannot be observed. A simulation-based sensitivity analysis is therefore conducted in which the multivariate ICA ($R^2_{H}$) is estimated across a set of plausible values for the unidentifiable correlations. To this end, consider the variance-covariance matrix of the potential outcomes $\boldsymbol{\Sigma}$ (0 and 1 subscripts refer to the control and experimental treatments, respectively):

$$\boldsymbol{\Sigma} = \left(\begin{array}{ccccccccc}
\sigma_{T_{0}T_{0}}\\
\sigma_{T_{0}T_{1}} & \sigma_{T_{1}T_{1}}\\
\sigma_{T_{0}S1_{0}} & \sigma_{T_{1}S1_{0}} & \sigma_{S1_{0}S1_{0}}\\
\sigma_{T_{0}S1_{1}} & \sigma_{T_{1}S1_{1}} & \sigma_{S1_{0}S1_{1}} & \sigma_{S1_{1}S1_{1}}\\
\sigma_{T_{0}S2_{0}} & \sigma_{T_{1}S2_{0}} & \sigma_{S1_{0}S2_{0}} & \sigma_{S1_{1}S2_{0}} & \sigma_{S2_{0}S2_{0}}\\
\sigma_{T_{0}S2_{1}} & \sigma_{T_{1}S2_{1}} & \sigma_{S1_{0}S2_{1}} & \sigma_{S1_{1}S2_{1}} & \sigma_{S2_{0}S2_{1}} & \sigma_{S2_{1}S2_{1}}\\
... & ... & ... & ... & ... & ... & \ddots\\
\sigma_{T_{0}Sk_{0}} & \sigma_{T_{1}Sk_{0}} & \sigma_{S1_{0}Sk_{0}} & \sigma_{S1_{1}Sk_{0}} & \sigma_{S2_{0}Sk_{0}} & \sigma_{S2_{1}Sk_{0}} & ... & \sigma_{Sk_{0}Sk_{0}}\\
\sigma_{T_{0}Sk_{1}} & \sigma_{T_{1}Sk_{1}} & \sigma_{S1_{0}Sk_{1}} & \sigma_{S1_{1}Sk_{1}} & \sigma_{S2_{0}Sk_{1}} & \sigma_{S2_{1}Sk_{1}} & ... & \sigma_{Sk_{0}Sk_{1}} & \sigma_{Sk_{1}Sk_{1}}
\end{array}\right).$$
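To make the dependence on the unidentifiable entries explicit, the following worked equations may help. They are a sketch under the usual definitions of the individual causal effects; the block symbols $\boldsymbol{\Sigma}_{\Delta T \Delta S}$ and $\boldsymbol{\Sigma}_{\Delta S \Delta S}$ are notation introduced here for the covariances among the contrasts and do not appear elsewhere on this page. With

$$\Delta T = T_1 - T_0, \qquad \Delta S_j = S_{j1} - S_{j0} \quad (j = 1, \ldots, k),$$

the variance of the individual causal effect on $T$ is

$$\sigma_{\Delta T \Delta T} = \sigma_{T_{0}T_{0}} + \sigma_{T_{1}T_{1}} - 2\,\sigma_{T_{0}T_{1}},$$

which involves the unobservable covariance $\sigma_{T_{0}T_{1}}$. The population coefficient of determination of the regression of $\Delta T$ on $\Delta S_1$, ..., $\Delta S_k$ can then be written as

$$R^2_{H} = \frac{\boldsymbol{\Sigma}_{\Delta T \Delta S}\, \boldsymbol{\Sigma}_{\Delta S \Delta S}^{-1}\, \boldsymbol{\Sigma}_{\Delta S \Delta T}}{\sigma_{\Delta T \Delta T}},$$

where every block is a linear combination of entries of $\boldsymbol{\Sigma}$, including entries that pair a 0 subscript with a 1 subscript and are therefore not identifiable.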

The ICA.ContCont.MultS_alt function requires the user to specify a distribution $G$ for the unidentified correlations. The identifiable correlations are fixed at their estimated values, and the unidentifiable correlations are independently and randomly sampled from $G$. In the function call, the unidentifiable correlations are marked by specifying NA in the Sigma matrix (see the Examples section below). The algorithm generates a large number of 'completed' matrices, and only those that are positive definite are retained (the number of positive definite matrices to be obtained is specified by the M= argument in the function call). Using the identifiable variances, these positive definite correlation matrices are converted to covariance matrices $\boldsymbol{\Sigma}$, from which the multiple-surrogate ICA is estimated.
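The complete-and-filter idea in the paragraph above can be sketched in a few lines of R. This is an illustration only and not the package's internal code: the 4 x 4 correlation matrix and the helper names complete_once, is_pd and to_cov are assumptions made for the example, and a coarser grid than the default G is used to keep it fast.

```r
# Illustrative sketch only; not the internal code of ICA.ContCont.MultS_alt.
set.seed(123)
G <- seq(from = -1, to = 1, by = 0.01)   # coarser grid than the default

# Made-up correlation matrix for (T_0, T_1, S_0, S_1); NA = unidentifiable.
R <- matrix(NA_real_, 4, 4)
diag(R) <- 1
R[3, 1] <- 0.6                            # corr(T_0, S_0), identifiable
R[4, 2] <- 0.7                            # corr(T_1, S_1), identifiable
R[upper.tri(R)] <- t(R)[upper.tri(R)]
vars <- c(450, 413.5, 174.2, 157.5)       # identifiable variances

# Fill every unidentified correlation with an independent draw from G.
complete_once <- function(R, G) {
  idx <- which(is.na(R) & lower.tri(R))
  R[idx] <- sample(G, length(idx), replace = TRUE)
  R[upper.tri(R)] <- t(R)[upper.tri(R)]   # mirror to keep symmetry
  R
}

# Positive definiteness check via the eigenvalues.
is_pd <- function(R) {
  all(eigen(R, symmetric = TRUE, only.values = TRUE)$values > 1e-10)
}

# Convert a correlation matrix to a covariance matrix.
to_cov <- function(R, vars) diag(sqrt(vars)) %*% R %*% diag(sqrt(vars))

draws <- replicate(2000, complete_once(R, G), simplify = FALSE)
kept <- Filter(is_pd, draws)              # retain positive definite matrices
acceptance_rate <- length(kept) / length(draws)
Sigma_list <- lapply(kept, to_cov, vars = vars)
print(acceptance_rate)
```

With only four unidentified correlations a fair share of the draws is retained; the acceptance rate drops quickly as more surrogates are added.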

An issue with this approach (i.e., substituting unidentified correlations by random and independent samples from $G$) is that the probability of obtaining a positive definite matrix becomes very low as the dimensionality of the matrix increases. One way to increase the efficiency of the algorithm is to build up the correlation matrix gradually. In particular, one can use the property that a symmetric $\left(k \times k\right)$ matrix is positive definite if and only if all its leading principal minors are positive (i.e., Sylvester's criterion). In other words, a $\left(k \times k\right)$ correlation matrix is positive definite when the upper-left $\left(2 \times 2\right)$, $\left(3 \times 3\right)$, ..., $\left(k \times k\right)$ submatrices all have a positive determinant (the $\left(1 \times 1\right)$ minor equals 1). Thus, when a positive definite $\left(k \times k\right)$ matrix has to be generated, one can start with the upper-left $\left(2 \times 2\right)$ submatrix and randomly sample a value for the unidentified correlation (here: $\rho_{T_0T_1}$) from $G$. When the determinant is positive (which is always the case for a $\left(2 \times 2\right)$ correlation matrix unless the sampled value is exactly $-1$ or $1$), the same procedure is used for the upper-left $\left(3 \times 3\right)$ submatrix, and so on. When a draw from $G$ for a particular submatrix does not give a positive determinant, new values are sampled for the unidentified correlations in that submatrix until a positive determinant is obtained. In this way, it is guaranteed that the final $\left(k \times k\right)$ matrix is positive definite. The latter approach is used in the current function.

This procedure is used to generate many positive definite matrices. These positive definite matrices are used to generate M datasets which contain $\Delta T$, $\Delta S_1$, $\Delta S_2$, ..., $\Delta S_k$. Finally, the multivariate ICA ($R^2_{H}$) is estimated by regressing $\Delta T$ on $\Delta S_1$, $\Delta S_2$, ..., $\Delta S_k$ and computing the multiple coefficient of determination.
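The gradual build-up can be sketched as follows. This is a minimal illustration under stated assumptions and not the function's actual implementation: build_pd is a hypothetical helper, the 4 x 4 correlation matrix is made up, the grid is coarser than the default G, and the data are simulated from a multivariate normal distribution via a Cholesky factor.

```r
# Illustrative sketch only; not the internal code of ICA.ContCont.MultS_alt.
set.seed(123)
G <- seq(from = -1, to = 1, by = 0.01)   # coarser grid than the default

# Made-up correlation matrix for (T_0, T_1, S_0, S_1); NA = unidentifiable.
R <- matrix(NA_real_, 4, 4)
diag(R) <- 1
R[3, 1] <- 0.6
R[4, 2] <- 0.7
R[upper.tri(R)] <- t(R)[upper.tri(R)]
vars <- c(450, 413.5, 174.2, 157.5)      # identifiable variances

# Hypothetical helper: fill the matrix row by row, resampling the free
# entries of row j until the leading j x j minor is positive.
build_pd <- function(R, G, max_tries = 1000, tol = 1e-8) {
  for (j in 2:nrow(R)) {
    free <- which(is.na(R[j, 1:(j - 1)]))
    for (attempt in seq_len(max_tries)) {
      cand <- R
      vals <- sample(G, length(free), replace = TRUE)
      cand[j, free] <- vals
      cand[free, j] <- vals
      if (det(cand[1:j, 1:j]) > tol) { R <- cand; break }
      if (attempt == max_tries) return(NULL)   # give up on this build
    }
  }
  R
}

R_full <- NULL
while (is.null(R_full)) R_full <- build_pd(R, G)  # restart if a build fails

# Simulate one dataset of size N and estimate R2_H by regression.
N <- 200
Sigma <- diag(sqrt(vars)) %*% R_full %*% diag(sqrt(vars))
Y <- matrix(rnorm(N * 4), N, 4) %*% chol(Sigma)   # rows ~ N(0, Sigma)
dat <- data.frame(Delta_T = Y[, 2] - Y[, 1], Delta_S1 = Y[, 4] - Y[, 3])
r2 <- summary(lm(Delta_T ~ Delta_S1, data = dat))$r.squared
print(r2)
```

Checking one leading minor at a time means a failed draw only costs a resample of the current row, rather than discarding the whole matrix.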

References

Van der Elst, W., Alonso, A. A., & Molenberghs, G. (2017). Univariate versus multivariate surrogate endpoints.

Examples


# NOT RUN {
 #time-consuming code parts
# Specify matrix Sigma (var-covar matrix T_0, T_1, S1_0, S1_1, ...)
# here for 1 true endpoint and 3 surrogates

s<-matrix(rep(NA, times=64),8)
s[1,1] <- 450; s[2,2] <- 413.5; s[3,3] <- 174.2; s[4,4] <- 157.5; 
s[5,5] <- 244.0; s[6,6] <- 229.99; s[7,7] <- 294.2; s[8,8] <- 302.5
s[3,1] <- 160.8; s[5,1] <- 208.5; s[7,1] <- 268.4 
s[4,2] <- 124.6; s[6,2] <- 212.3; s[8,2] <- 287.1
s[5,3] <- 160.3; s[7,3] <- 142.8 
s[6,4] <- 134.3; s[8,4] <- 130.4 
s[7,5] <- 209.3; 
s[8,6] <- 214.7 
s[upper.tri(s)] = t(s)[upper.tri(s)]

# Matrix looks like (NA indicates unidentified covariances):
#            T_0    T_1  S1_0  S1_1  S2_0   S2_1  S3_0  S3_1
#            [,1]  [,2]  [,3]  [,4]  [,5]   [,6]  [,7]  [,8]
# T_0  [1,] 450.0    NA 160.8    NA 208.5     NA 268.4    NA
# T_1  [2,]    NA 413.5    NA 124.6    NA 212.30    NA 287.1
# S1_0 [3,] 160.8    NA 174.2    NA 160.3     NA 142.8    NA
# S1_1 [4,]    NA 124.6    NA 157.5    NA 134.30    NA 130.4
# S2_0 [5,] 208.5    NA 160.3    NA 244.0     NA 209.3    NA
# S2_1 [6,]    NA 212.3    NA 134.3    NA 229.99    NA 214.7
# S3_0 [7,] 268.4    NA 142.8    NA 209.3     NA 294.2    NA
# S3_1 [8,]    NA 287.1    NA 130.4    NA 214.70    NA 302.5

# Conduct analysis
ICA <- ICA.ContCont.MultS_alt(M=100, N=200, Show.Progress = TRUE,
  Sigma=s, G = seq(from=-1, to=1, by = .00001), Seed=c(123), 
  Model = "Delta_T ~ Delta_S1 + Delta_S2 + Delta_S3")

# Explore results
summary(ICA)
plot(ICA)
# }
