BBC_dichotom: Bootstrap-based Optimism Correction for Dichotomization

Description

Multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors.

Usage

BBC_dichotom(formula, data, ...)
optimism_dichotom(fom, X, data, R = 100L, ...)
coef_dichotom(fom, X., data)

Value

Function BBC_dichotom returns a coxph, glm or lm regression model, with attributes,

attr(,'optimism'): the returned object from optimism_dichotom
attr(,'apparent_cutoff'): a double vector, cutoff thresholds for the \(k\) predictors in the apparent model

Arguments

formula: formula, e.g., y~z~x or y~1~x. The response \(y\) may be double, logical or Surv. The predictors \(x\)'s to be dichotomized may be one or more numeric vectors and/or one matrix. Additional predictors \(z\)'s, if any, may be of any type.
data: data.frame
...: additional parameters, currently not in use
fom: formula, e.g., y~z or y~1, for helper functions, with the response \(y\) and additional predictors \(z\)'s, if any
X: numeric matrix of \(k\) columns, numeric predictors \(x_1,\cdots,x_k\) to be dichotomized
R: positive integer scalar, number of bootstrap replicates \(R\), default 100L
X.: logical matrix \(\tilde{X}\) of \(k\) columns, dichotomized predictors \(\tilde{x}_1,\cdots,\tilde{x}_k\)

Details on Helper Functions

Bootstrap-Based Optimism

Helper function optimism_dichotom computes the bootstrap-based optimism of the dichotomized predictors. Specifically,

\(R\) copies of bootstrap samples are generated. In the \(j\)-th bootstrap sample,
1. obtain the dichotomizing rules \(\mathcal{D}^{(j)}\) of predictors \(x_1^{(j)},\cdots,x_k^{(j)}\) based on response \(y^{(j)}\) (via m_rpartD)
2. fit the multivariable regression (with additional predictors \(z^{(j)}\), if any) on the dichotomized predictors \(\left(\tilde{x}_1^{(j)},\cdots,\tilde{x}_k^{(j)}\right) = \mathcal{D}^{(j)}\left(x_1^{(j)},\cdots,x_k^{(j)}\right)\) (via coef_dichotom). The coefficient estimates \(\mathbf{\hat{\beta}}^{(j)} = \left(\hat{\beta}_1^{(j)},\cdots,\hat{\beta}_k^{(j)}\right)^t\) are the bootstrap performance estimates.
Dichotomize \(x_1,\cdots,x_k\) in the entire data using each of the bootstrap rules \(\mathcal{D}^{(1)},\cdots,\mathcal{D}^{(R)}\). For the \(j\)-th rule, fit the multivariable regression (with additional predictors \(z\), if any) on the dichotomized predictors \(\left(\tilde{x}_1^{[j]},\cdots,\tilde{x}_k^{[j]}\right) = \mathcal{D}^{(j)}\left(x_1,\cdots,x_k\right)\) (via coef_dichotom). The coefficient estimates \(\mathbf{\hat{\beta}}^{[j]} = \left(\hat{\beta}_1^{[j]},\cdots,\hat{\beta}_k^{[j]}\right)^t\) are the test performance estimates.
The difference between the bootstrap and test performance estimates, an \(R\times k\) matrix of \(\left(\mathbf{\hat{\beta}}^{(1)},\cdots,\mathbf{\hat{\beta}}^{(R)}\right)\) minus another \(R\times k\) matrix of \(\left(\mathbf{\hat{\beta}}^{[1]},\cdots,\mathbf{\hat{\beta}}^{[R]}\right)\), is the bootstrap-based optimism.
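The steps above can be sketched in base R for a gaussian response. This is only an illustration under stated assumptions, not the package's implementation: find_cutoff is a hypothetical stand-in for m_rpartD (it picks, among a few quantiles, the cutoff with the largest gap in mean response), coef_dich is a hypothetical stand-in for coef_dichotom, and the data are simulated.

```r
# Hypothetical stand-in for m_rpartD: one response-based cutoff per column of X
find_cutoff <- function(X, y, probs = seq(.2, .8, by = .1)) {
  apply(X, MARGIN = 2L, FUN = function(x) {
    cand <- quantile(x, probs = probs, names = FALSE)
    gap <- vapply(cand, FUN = function(cc) {
      abs(mean(y[x > cc]) - mean(y[x <= cc]))
    }, FUN.VALUE = 0)
    cand[which.max(gap)]
  })
}

# Hypothetical stand-in for coef_dichotom (linear model only):
# coefficients of the dichotomized predictors, adjusted for z
coef_dich <- function(y, z, X, cutoff) {
  Xd <- sweep(X, MARGIN = 2L, STATS = cutoff, FUN = '>') * 1 # 0/1 indicators
  colnames(Xd) <- colnames(X)
  coef(lm(y ~ z + Xd))[paste0('Xd', colnames(X))]
}

set.seed(1)
n <- 200L # simulated sample size
R <- 50L  # number of bootstrap replicates
z <- rnorm(n)
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * z + 0.8 * (X[, 'x1'] > 0) + rnorm(n)

optimism <- t(vapply(seq_len(R), FUN = function(j) {
  id <- sample.int(n, replace = TRUE)                # j-th bootstrap sample
  cut_j <- find_cutoff(X[id, , drop = FALSE], y[id]) # rule D^(j)
  b_boot <- coef_dich(y[id], z[id], X[id, , drop = FALSE], cut_j) # bootstrap performance
  b_test <- coef_dich(y, z, X, cut_j)                # test performance, entire data
  b_boot - b_test
}, FUN.VALUE = numeric(ncol(X))))

dim(optimism) # R rows, k columns
```

Each row of optimism corresponds to one bootstrap replicate and each column to one predictor to be dichotomized, matching the \(R\times k\) layout described above.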

Multivariable Regression Coefficient Estimates of Dichotomized Predictors \(\tilde{x}\)'s

Helper function coef_dichotom fits a multivariable Cox proportional hazards (coxph) model for Surv response, logistic (glm) regression model for logical response, or linear (lm) regression model for gaussian response, with the dichotomized predictors \(\tilde{x}_1,\cdots,\tilde{x}_k\) as well as the additional predictors \(z\)'s.
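The choice of model by response type can be illustrated as follows. This is a minimal sketch, assuming a dispatch on the class of the response; fit_by_response is a hypothetical name, the data are simulated, and the package's own code may be organized differently.

```r
library(survival)

# Hypothetical helper (not part of the package): pick the fitter from the response
fit_by_response <- function(formula, data) {
  y <- model.response(model.frame(formula, data = data))
  if (inherits(y, what = 'Surv')) {
    return(coxph(formula, data = data)) # Cox proportional hazards
  }
  if (is.logical(y)) {
    return(glm(formula, family = binomial(link = 'logit'), data = data)) # logistic
  }
  lm(formula, data = data) # linear, gaussian response
}

# Simulated toy data
set.seed(1)
d <- data.frame(x = rnorm(50), time = rexp(50), event = rbinom(50, 1, .7) == 1)
d$y <- d$x + rnorm(50)
d$flag <- d$y > 0

m_surv <- fit_by_response(Surv(time, event) ~ x, data = d)
m_logi <- fit_by_response(flag ~ x, data = d)
m_gaus <- fit_by_response(y ~ x, data = d)
class(m_surv); class(m_logi); class(m_gaus)
```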

Duplicates among the dichotomized predictors \(\tilde{x}_1,\cdots,\tilde{x}_k\) are almost inevitable. In such a case, the multivariable model is fitted using only the unique \(\tilde{x}\)'s.

Returns of Helper Functions

Of helper function optimism_dichotom

Helper function optimism_dichotom returns an \(R\times k\) double matrix of bootstrap-based optimism, with attributes

attr(,'cutoff'): an \(R\times k\) double matrix, the \(R\) copies of bootstrap cutoff thresholds for the \(k\) predictors. See attribute 'cutoff' of function m_rpartD

Of helper function coef_dichotom

Helper function coef_dichotom returns a double vector of the regression coefficients of dichotomized predictors \(\tilde{x}\)'s, with attributes

attr(,'model'): the coxph, glm or lm regression model

In the case of duplicated \(\tilde{x}\)'s, the regression coefficient estimated for each unique \(\tilde{x}\) is repeated for all of its duplicates among the \(\tilde{x}\)'s.
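The handling of duplicated columns can be sketched as below. This is an illustration with toy data and a linear model, assuming that duplicates are detected column-wise; the variable names are hypothetical and this is not the package's code.

```r
# Toy logical matrix; columns 'x1' and 'x3' are identical after dichotomization
X. <- cbind(
  x1 = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  x2 = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  x3 = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
)
y <- c(2.1, 0.3, 1.8, 0.1, 2.4, -0.2) # toy gaussian response

key <- apply(X., MARGIN = 2L, FUN = paste, collapse = '') # one string per column
uniq <- !duplicated(key)             # keep the first copy of each distinct column
Xu <- X.[, uniq, drop = FALSE] * 1   # unique columns as 0/1 indicators

fit <- lm(y ~ Xu)                    # model fitted on the unique columns only
b_unique <- coef(fit)[-1L]           # drop the intercept

# Repeat each unique coefficient for every duplicate of its column
b <- setNames(b_unique[match(key, key[uniq])], nm = colnames(X.))
b
```

Here b has one entry per column of X., and the entries for x1 and x3 are equal because both columns share a single fitted coefficient.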

Details

Function BBC_dichotom obtains a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors. Specifically,

1. Obtain the dichotomizing rules \(\mathcal{D}\) of predictors \(x_1,\cdots,x_k\) based on response \(y\) (via m_rpartD). The multivariable regression (with additional predictors \(z\), if any) on the dichotomized predictors \(\left(\tilde{x}_1,\cdots,\tilde{x}_k\right) = \mathcal{D}\left(x_1,\cdots,x_k\right)\) (via helper function coef_dichotom) is the apparent performance.
2. Obtain the bootstrap-based optimism based on \(R\) copies of bootstrap samples (via helper function optimism_dichotom). The median of the bootstrap-based optimism over the \(R\) bootstrap copies is the optimism-correction of the dichotomized predictors \(\tilde{x}_1,\cdots,\tilde{x}_k\).
3. Subtract the optimism-correction (in Step 2) from the apparent performance estimates (in Step 1), only for \(\tilde{x}_1,\cdots,\tilde{x}_k\). The apparent performance estimates for the additional predictors \(z\)'s, if any, are not modified. Neither the variance-covariance (vcov) estimates nor the other regression diagnostics, e.g., residuals, log-likelihood, etc., of the apparent performance are modified for now. This coefficient-only, partially-modified regression model is the optimism-corrected performance.
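The correction step can be sketched in base R as follows. This is only an illustration under stated assumptions: the apparent model is a linear model on simulated data, the predictors x1d and x2d are already dichotomized, and the optimism matrix is filled with made-up toy values rather than output of optimism_dichotom.

```r
set.seed(1)
n <- 100L
d <- data.frame(
  z = rnorm(n),
  x1d = rbinom(n, size = 1L, prob = .5), # already-dichotomized predictors (0/1)
  x2d = rbinom(n, size = 1L, prob = .5)
)
d$y <- 1 + d$z + 0.7 * d$x1d + rnorm(n)

apparent <- lm(y ~ z + x1d + x2d, data = d) # apparent performance

# Toy R x k optimism matrix (made-up values, for illustration only)
R <- 100L
optimism <- matrix(rnorm(R * 2L, mean = .05, sd = .02), ncol = 2L,
                   dimnames = list(NULL, c('x1d', 'x2d')))

correction <- apply(optimism, MARGIN = 2L, FUN = median) # column-wise median

corrected <- apparent
nm <- names(correction)
corrected$coefficients[nm] <- apparent$coefficients[nm] - correction

cbind(apparent = coef(apparent), corrected = coef(corrected))
```

Only the coefficients of x1d and x2d change; the intercept, the coefficient of z, and vcov(corrected) are the same as in the apparent model, which mirrors the coefficient-only modification described above.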

References

For helper function optimism_dichotom

Ewout W. Steyerberg (2009) Clinical Prediction Models. doi:10.1007/978-0-387-77244-8

Frank E. Harrell Jr., Kerry L. Lee, Daniel B. Mark (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::aid-sim168>3.0.CO;2-4

Examples


library(survival)
data(flchain, package = 'survival') # see more details from ?survival::flchain
head(flchain2 <- within.data.frame(flchain, expr = {
  mgus = as.logical(mgus)
}))
dim(flchain3 <- subset(flchain2, futime > 0)) # required by ?rpart::rpart
dim(flchain_Circulatory <- subset(flchain3, chapter == 'Circulatory'))

m1 = BBC_dichotom(Surv(futime, death) ~ age + sex + mgus ~ kappa + lambda, 
 data = flchain_Circulatory)
summary(m1)
attr(attr(m1, 'optimism'), 'cutoff')
attr(m1, 'apparent_cutoff')
