FixedBinContIT: Fits (univariate) fixed-effect models to assess surrogacy in the case where the true endpoint is binary and the surrogate endpoint is continuous (based on the Information-Theoretic framework)

Description

The function FixedBinContIT uses the information-theoretic approach (Alonso & Molenberghs, 2007) to estimate trial- and individual-level surrogacy based on fixed-effect models when T is binary and S is continuous. The user can specify whether a (weighted or unweighted) full, semi-reduced, or reduced model should be fitted. See the Details section below.

Usage

FixedBinContIT(Dataset, Surr, True, Treat, Trial.ID, Pat.ID, 
Model=c("Full"), Weighted=TRUE, Min.Trial.Size=2, Alpha=.05, 
Number.Bootstraps=50, Seed=sample(1:1000, size=1))

Arguments

Dataset

A data.frame that should consist of one line per patient. Each line contains (at least) a surrogate value, a true endpoint value, a treatment indicator, a patient ID, and a trial ID.

Surr

The name of the variable in Dataset that contains the surrogate endpoint values.

True

The name of the variable in Dataset that contains the true endpoint values.

Treat

The name of the variable in Dataset that contains the treatment indicators. The treatment indicator should either be coded as $1$ for the experimental group and $-1$ for the control group, or as $1$ for the experimental group and $0$ for the control group.

Trial.ID

The name of the variable in Dataset that contains the trial ID to which the patient belongs.

Pat.ID

The name of the variable in Dataset that contains the patient's ID.

Model

The type of model that should be fitted, i.e., Model=c("Full"), Model=c("Reduced"), or Model=c("SemiReduced"). See the Details section below. Default Model=c("Full").

Weighted

Logical. In practice it is often the case that different trials (or other clustering units) have different sample sizes. Univariate models are used to assess surrogacy in the information-theoretic approach, so it can be useful to adjust for heterogeneity in information content between the trial-specific contributions (particularly when trial-level surrogacy measures are of primary interest and when the heterogeneity in sample sizes is large). If Weighted=TRUE, weighted regression models are fitted. If Weighted=FALSE, unweighted regression analyses are conducted. See the Details section below. Default TRUE.

Min.Trial.Size

The minimum number of patients that a trial should contain to be included in the analysis. If the number of patients in a trial is smaller than the value specified by Min.Trial.Size, the data of the trial are excluded from the analysis. Default $2$.

Alpha

The $\alpha$-level that is used to determine the confidence intervals around $R^2_{h}$ and $R^2_{ht}$. Default $0.05$.

Number.Bootstraps

The standard errors and confidence intervals for $R^2_{h}$ and $R^2_{h.ind}$ are determined based on a bootstrap procedure. Number.Bootstraps specifies the number of bootstrap samples that are used. Default $50$.

Seed

The seed to be used in the bootstrap procedure. Default sample(1:1000, size=1).

Value

An object of class FixedBinContIT with the following components:

Data.Analyze

Prior to conducting the surrogacy analysis, data of patients who have a missing value for the surrogate and/or the true endpoint are excluded. Trials are also excluded when (i) only one type of treatment was administered, (ii) either the surrogate or the true endpoint was constant (i.e., all patients within the trial had the same surrogate and/or true endpoint value), or (iii) the number of patients in the trial is smaller than the value specified by Min.Trial.Size. Data.Analyze is the dataset on which the surrogacy analysis was conducted.

Obs.Per.Trial

A data.frame that contains the total number of patients per trial and the number of patients who were administered the control treatment and the experimental treatment in each of the trials (in Data.Analyze).

Trial.Spec.Results

A data.frame that contains the trial-specific intercepts and treatment effects for the surrogate and the true endpoints (when a full or semi-reduced model is requested), or the trial-specific treatment effects for the surrogate and the true endpoints (when a reduced model is requested).

R2ht

A data.frame that contains the trial-level surrogacy estimate and its confidence interval.

R2h.ind

A data.frame that contains the individual-level surrogacy estimate $R^2_{h.ind}$ (single-trial based estimate) and its confidence interval.

R2h

A data.frame that contains the individual-level surrogacy estimate $R^2_{h}$ (cluster-based estimate) and its confidence interval (bootstrap-based).

R2b.ind

A data.frame that contains the individual-level surrogacy estimate $R^2_{b.ind}$ (single-trial based estimate accounting for upper bound) and its confidence interval (based on a bootstrap).

R2h.Ind.By.Trial

A data.frame that contains individual-level surrogacy estimates $R^2_{h}$ (cluster-based estimate) and their confidence intervals for each of the trials separately.
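
The exclusion rules described for Data.Analyze can be illustrated with a small base-R sketch. This is not the package's internal code: the helper name prepare_data and the toy data are hypothetical, and the column names simply mirror the arguments of the function.

```r
# Hypothetical helper (not part of the package): applies the exclusion rules
# described for Data.Analyze to a data.frame with columns
# Surr, True, Treat and Trial.ID.
prepare_data <- function(d, min_trial_size = 2) {
  # Drop patients with a missing surrogate and/or true endpoint
  d <- d[!is.na(d$Surr) & !is.na(d$True), , drop = FALSE]
  keep_trial <- vapply(split(d, d$Trial.ID), function(x) {
    length(unique(x$Treat)) > 1 &&   # both treatments administered
      length(unique(x$Surr)) > 1 &&  # surrogate is not constant
      length(unique(x$True)) > 1 &&  # true endpoint is not constant
      nrow(x) >= min_trial_size      # trial is large enough
  }, logical(1))
  d[d$Trial.ID %in% names(keep_trial)[keep_trial], , drop = FALSE]
}

# Toy data: trial 2 has a single treatment arm; trial 3 has a constant
# true endpoint once the patient with a missing surrogate is removed.
toy <- data.frame(
  Trial.ID = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  Treat    = c(0, 1, 0, 1, 1, 1, 0, 1, 1),
  Surr     = c(0.2, 1.1, -0.4, 0.9, 0.5, 0.7, 0.1, NA, 0.3),
  True     = c(0, 1, 0, 1, 1, 0, 1, 1, 1)
)
clean <- prepare_data(toy)
unique(clean$Trial.ID)
```

In this toy example only trial 1 satisfies all the conditions.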

Details

Individual-level surrogacy

The following univariate generalised linear models are fitted:

$$g_{T}(E(T_{ij}))=\mu_{Ti}+\beta_{i}Z_{ij},$$ $$g_{T}(E(T_{ij}|S_{ij}))=\gamma_{0i}+\gamma_{1i}Z_{ij}+\gamma_{2i}S_{ij},$$

where $i$ and $j$ are the trial and subject indicators, $g_{T}$ is an appropriate link function (i.e., a logit link for binary endpoints and an identity link for normally distributed continuous endpoints), $S_{ij}$ and $T_{ij}$ are the surrogate and true endpoint values of subject $j$ in trial $i$, and $Z_{ij}$ is the treatment indicator for subject $j$ in trial $i$. $\mu_{Ti}$ and $\beta_{i}$ are the trial-specific intercepts and treatment effects on the true endpoint in trial $i$. $\gamma_{0i}$ and $\gamma_{1i}$ are the trial-specific intercepts and treatment effects on the true endpoint in trial $i$ after accounting for the effect of the surrogate endpoint, and $\gamma_{2i}$ is the trial-specific effect of the surrogate on the true endpoint after accounting for treatment.

The $-2$ log likelihood values of the previous models in each trial $i$ (i.e., $L_{1i}$ and $L_{2i}$, respectively) are subsequently used to compute individual-level surrogacy based on the so-called Variance Reduction Factor (VRF; for details, see Alonso & Molenberghs, 2007):

$$R^2_{h}= 1 - \frac{1}{N} \sum_{i} \exp \left(-\frac{L_{1i}-L_{2i}}{n_{i}} \right),$$

where $N$ is the number of trials and $n_{i}$ is the number of patients within trial $i$.
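
As a rough illustration of this computation, the sketch below fits the two logistic models per trial with base R's glm on simulated data. The data-generating values, the helper m2ll, and the column names are assumptions made for the example, not the package's internals. The likelihood-ratio term is taken as the drop in $-2$ log likelihood when $S$ is added ($L_{1i}-L_{2i}$), so that it is non-negative.

```r
set.seed(1)
n_trials <- 5
n_per    <- 200

# Simulated data (assumed values): continuous surrogate, binary true endpoint
d <- data.frame(
  Trial.ID = rep(seq_len(n_trials), each = n_per),
  Treat    = rep(c(0, 1), times = n_trials * n_per / 2)
)
d$Surr <- 0.5 * d$Treat + rnorm(nrow(d))
d$True <- rbinom(nrow(d), 1, plogis(-0.3 + 0.4 * d$Treat + 1.2 * d$Surr))

# -2 log likelihood of a fitted model
m2ll <- function(fit) -2 * as.numeric(logLik(fit))

# Per-trial term exp(-(L1i - L2i) / n_i)
per_trial <- sapply(split(d, d$Trial.ID), function(x) {
  L1 <- m2ll(glm(True ~ Treat, family = binomial, data = x))         # without S
  L2 <- m2ll(glm(True ~ Treat + Surr, family = binomial, data = x))  # with S
  exp(-(L1 - L2) / nrow(x))
})

# Average over trials
R2h <- 1 - mean(per_trial)
R2h
```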

When it can be assumed (i) that the treatment-corrected association between the surrogate and the true endpoint is constant across trials, or (ii) when all data come from a single clinical trial (i.e., when $N=1$), the previous expression simplifies to:

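$$R^2_{h.ind}= 1 - \exp \left(-\frac{L_{1}-L_{2}}{n} \right),$$

where $L_{1}$ and $L_{2}$ are the $-2$ log likelihood values of the corresponding models fitted to the pooled data and $n$ is the total number of patients.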
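
For the single-trial case, a minimal sketch (simulated data and assumed parameter values, base R only) looks as follows. The likelihood-ratio statistic G2 is the drop in $-2$ log likelihood when $S$ is added to the model, and it is divided by the number of patients.

```r
set.seed(2)
n <- 500

# Simulated single trial (assumed values)
Treat <- rep(c(0, 1), length.out = n)
Surr  <- 0.5 * Treat + rnorm(n)
True  <- rbinom(n, 1, plogis(-0.3 + 0.4 * Treat + 1.2 * Surr))

fit1 <- glm(True ~ Treat, family = binomial)         # model without S
fit2 <- glm(True ~ Treat + Surr, family = binomial)  # model with S

# Likelihood-ratio statistic: (-2 logLik of fit1) - (-2 logLik of fit2)
G2 <- as.numeric(-2 * (logLik(fit1) - logLik(fit2)))

R2h_ind <- 1 - exp(-G2 / n)
R2h_ind
```

For ungrouped binary data the same statistic can be obtained as deviance(fit1) - deviance(fit2).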

When $T$ is binary, $R^2_{h.ind}$ cannot reach 1: its maximum is 0.75. Kent (1983) argues that 0.75 is a reasonable upper bound, so that $R^2_{h.ind}$ can usually be interpreted without special consideration for the discreteness of $T$. Alternatively, to address the upper bound problem, a scaled version of the mutual information can be used when both $S$ and $T$ are binary (Joe, 1989):

$$R^2_{b.ind}= \frac{I(T,S)}{\min[H(T), H(S)]},$$

where the entropy of $T$ and $S$ in the previous expression can be estimated using the log likelihood functions of the GLMs shown above.
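
The value 0.75 can be traced back to the entropy of a binary variable. The short derivation below assumes the usual link between the individual-level measure and the mutual information, $R^2_{h}=1-\exp(-2I(T,S))$, with entropies measured in natural-log units. Since $T$ is binary, its entropy is at most $\log 2$, and the mutual information cannot exceed the entropy of $T$:

$$I(T,S) \le H(T) \le \log 2 \quad \Rightarrow \quad R^2_{h} = 1-\exp\left(-2I(T,S)\right) \le 1-\exp\left(-2\log 2\right) = 1-\frac{1}{4} = 0.75.$$

Dividing $I(T,S)$ by its largest attainable value, as in $R^2_{b.ind}$, rescales the measure to the unit interval.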

Trial-level surrogacy

When a full or semi-reduced model is requested (by using the argument Model=c("Full") or Model=c("SemiReduced") in the function call), trial-level surrogacy is assessed by fitting the following univariate models:

$$S_{ij}=\mu_{Si}+\alpha_{i}Z_{ij}+\varepsilon_{Sij}, (1)$$ $$T_{ij}=\mu_{Ti}+\beta_{i}Z_{ij}+\varepsilon_{Tij}, (1)$$

where $i$ and $j$ are the trial and subject indicators, $S_{ij}$ and $T_{ij}$ are the surrogate and true endpoint values of subject $j$ in trial $i$, $Z_{ij}$ is the treatment indicator for subject $j$ in trial $i$, $\mu_{Si}$ and $\mu_{Ti}$ are the fixed trial-specific intercepts for S and T, and $\alpha_{i}$ and $\beta_{i}$ are the fixed trial-specific treatment effects on S and T, respectively. The error terms $\varepsilon_{Sij}$ and $\varepsilon_{Tij}$ are assumed to be independent.

When a reduced model is requested by the user (by using the argument Model=c("Reduced") in the function call), the following univariate models are fitted:

$$S_{ij}=\mu_{S}+\alpha_{i}Z_{ij}+\varepsilon_{Sij}, (2)$$ $$T_{ij}=\mu_{T}+\beta_{i}Z_{ij}+\varepsilon_{Tij}, (2)$$

where $\mu_{S}$ and $\mu_{T}$ are the common intercepts for S and T. The other parameters are the same as defined above, and $\varepsilon_{Sij}$ and $\varepsilon_{Tij}$ are again assumed to be independent.

When the user requested a full model approach (by using the argument Model=c("Full") in the function call, i.e., when models (1) were fitted), the following model is subsequently fitted:

$$\widehat{\beta}_{i}=\lambda_{0}+\lambda_{1}\widehat{\mu_{Si}}+\lambda_{2}\widehat{\alpha}_{i}+\varepsilon_{i}, (3)$$

where the parameter estimates for $\beta_i$, $\mu_{Si}$, and $\alpha_i$ are based on models (1) (see above). When a weighted model is requested (using the argument Weighted=TRUE in the function call), model (3) is a weighted regression model (with weights based on the number of observations in trial $i$). The $-2$ log likelihood value of the (weighted or unweighted) model (3) ($L_1$) is subsequently compared to the $-2$ log likelihood value of an intercept-only model ($\widehat{\beta}_{i}=\lambda_{3}$; $L_0$), and $R^2_{ht}$ is computed based on the Variance Reduction Factor (for details, see Alonso & Molenberghs, 2007):

$$R^2_{ht}= 1 - \exp \left(-\frac{L_0-L_1}{N} \right),$$

where $N$ is the number of trials.
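
The two-stage procedure for the full model can be sketched in base R as follows. This is an illustration on simulated data, not the package's implementation: the data-generating values and the object names (stage1, a_i, b_i) are assumptions, the stage-1 model for the binary $T$ is fitted here as a logistic regression, and the weights are simply the trial sizes. The likelihood-ratio statistic is the drop in $-2$ log likelihood from the intercept-only model to model (3).

```r
set.seed(3)
n_trials <- 30

# Simulated multi-trial data with unequal trial sizes (assumed values)
sizes <- sample(seq(60, 140, by = 2), n_trials, replace = TRUE)
trial <- rep(seq_len(n_trials), times = sizes)
Treat <- unlist(lapply(sizes, function(n) rep(c(0, 1), length.out = n)))
a_i   <- rnorm(n_trials, 0.5, 0.5)             # trial-specific effects on S
b_i   <- 0.8 * a_i + rnorm(n_trials, 0, 0.15)  # effects on T (logit scale)
Surr  <- a_i[trial] * Treat + rnorm(length(trial))
True  <- rbinom(length(trial), 1, plogis(b_i[trial] * Treat))
d <- data.frame(trial, Treat, Surr, True)

# Stage 1: trial-specific intercepts and treatment effects
stage1 <- do.call(rbind, lapply(split(d, d$trial), function(x) {
  fs <- lm(Surr ~ Treat, data = x)
  ft <- glm(True ~ Treat, family = binomial, data = x)
  data.frame(mu_S  = unname(coef(fs)[1]),
             alpha = unname(coef(fs)[2]),
             beta  = unname(coef(ft)[2]),
             n     = nrow(x))
}))

# Stage 2: model (3) versus the intercept-only model, weighted by trial size
fit1 <- lm(beta ~ mu_S + alpha, data = stage1, weights = n)
fit0 <- lm(beta ~ 1, data = stage1, weights = n)

G2   <- as.numeric(-2 * (logLik(fit0) - logLik(fit1)))
R2ht <- 1 - exp(-G2 / nrow(stage1))
R2ht
```

Because the second-stage model is an ordinary (weighted) linear regression, the resulting value coincides with the usual weighted $R^2$ of model (3), i.e., summary(fit1)$r.squared.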

When a semi-reduced or reduced model is requested (by using the argument Model=c("SemiReduced") or Model=c("Reduced") in the function call), the following model is fitted:

$$\widehat{\beta}_{i}=\lambda_{0}+\lambda_{1}\widehat{\alpha}_{i}+\varepsilon_{i},$$

where the parameter estimates for $\beta_i$ and $\alpha_i$ are based on models (1) when a semi-reduced model is fitted or on models (2) when a reduced model is fitted. The $-2$ log likelihood value of this (weighted or unweighted) model ($L_1$) is subsequently compared to the $-2$ log likelihood value of an intercept-only model ($\widehat{\beta}_{i}=\lambda_{3}$; $L_0$), and $R^2_{ht}$ is computed based on the reduction in the likelihood (as described above).
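
The difference between models (1) and (2) lies only in the intercept. As a sketch of the reduced first-stage model for the surrogate, a single regression with a common intercept and one treatment slope per trial can be fitted in base R. The data are simulated, and the formula below is one possible way to express model (2), not necessarily how the package fits it.

```r
set.seed(4)
n_trials <- 10
n_per    <- 50

# Simulated surrogate data (assumed values) with a common intercept of 1
trial <- factor(rep(seq_len(n_trials), each = n_per))
Treat <- rep(c(0, 1), times = n_trials * n_per / 2)
a_i   <- rnorm(n_trials, 0.5, 0.5)
Surr  <- 1 + a_i[as.integer(trial)] * Treat + rnorm(length(trial))

# Common intercept mu_S, trial-specific treatment effects alpha_i
fit_S <- lm(Surr ~ trial:Treat)

mu_S_hat  <- coef(fit_S)[1]   # estimate of the common intercept
alpha_hat <- coef(fit_S)[-1]  # one estimated treatment effect per trial
alpha_hat
```

The estimates in alpha_hat (together with the analogous estimates for $T$) would then enter the second-stage regression of $\widehat{\beta}_{i}$ on $\widehat{\alpha}_{i}$.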

References

Alonso, A., & Molenberghs, G. (2007). Surrogate marker evaluation from an information theory perspective. Biometrics, 63, 180-186.

Joe, H. (1989). Relative entropy measures of multivariate dependence. Journal of the American Statistical Association, 84, 157-164.

Kent, J. T. (1983). Information gain and a general measure of correlation. Biometrika, 70, 163-173.

Examples

# NOT RUN {
# Time consuming (>5sec) code part
library(Surrogate)

# Generate data with continuous Surr and True
# (the simulated data are stored as Data.Observed.MTS in the workspace)
Sim.Data.MTS(N.Total=2000, N.Trial=100, R.Trial.Target=.8, 
R.Indiv.Target=.8, Seed=123, Model="Full")

# Make T binary
Data.Observed.MTS$True_Bin <- Data.Observed.MTS$True
Data.Observed.MTS$True_Bin[Data.Observed.MTS$True>=0] <- 1
Data.Observed.MTS$True_Bin[Data.Observed.MTS$True<0] <- 0

# Analyze data
Fit <- FixedBinContIT(Dataset = Data.Observed.MTS, Surr = Surr, 
True = True_Bin, Treat = Treat, Trial.ID = Trial.ID, Pat.ID = Pat.ID, 
Model = "Full", Number.Bootstraps=50)

# Examine results
summary(Fit)
plot(Fit, Trial.Level = FALSE, Indiv.Level.By.Trial=TRUE)
plot(Fit, Trial.Level = TRUE, Indiv.Level.By.Trial=FALSE)
# }
