Multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors.
BBC_dichotom(formula, data, ...)
optimism_dichotom(fom, X, data, R = 100L, ...)
coef_dichotom(fom, X., data)
Function BBC_dichotom returns a coxph, glm or lm regression model, with additional attributes, e.g., 'optimism' and 'apparent_cutoff' (both accessed in the example).
formula, e.g., y~z~x or y~1~x.
Response \(y\) may be double, logical or Surv.
Predictors \(x\)'s to be dichotomized may be one or more numeric vectors and/or one matrix.
Additional predictors \(z\)'s, if any, may be of any type.
additional parameters, currently not in use
formula, e.g., y~z or y~1, for helper functions, with the response \(y\) and additional predictors \(z\)'s, if any
numeric matrix of \(k\) columns, numeric predictors \(x_1,\cdots,x_k\) to be dichotomized
positive integer scalar, number of bootstrap replicates \(R\), default 100L
logical matrix \(\tilde{X}\) of \(k\) columns, dichotomized predictors \(\tilde{x}_1,\cdots,\tilde{x}_k\)
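The two-tilde formula y~z~x can be taken apart with base R alone, because R parses it as (y~z)~x. The following is a minimal sketch of that idea, not the package's own parsing code; the helper name split_formula is hypothetical.

```r
# Sketch only: split a two-tilde formula into the `fom` part (y ~ z) and the
# names of the predictors to be dichotomized. `split_formula` is a
# hypothetical helper, not a function of the package.
split_formula <- function(formula) {
  lhs <- formula[[2L]]  # the call `y ~ z` (or `y ~ 1`)
  rhs <- formula[[3L]]  # the expression naming the x's, e.g. `kappa + lambda`
  list(
    fom = eval(lhs, envir = environment(formula)),  # evaluating `y ~ z` yields a formula
    x_names = all.vars(rhs)                         # predictors to dichotomize
  )
}

s <- split_formula(y ~ age + sex ~ kappa + lambda)
s$fom      # y ~ age + sex
s$x_names  # "kappa" "lambda"
```

The first element plays the role of argument fom of the helper functions; the second names the columns that would form the numeric matrix X.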
Helper function optimism_dichotom computes the bootstrap-based optimism of the dichotomized predictors. Specifically,
\(R\) copies of bootstrap samples are generated. In the \(j\)-th bootstrap sample,
obtain the dichotomizing rules \(\mathbf{\mathcal{D}}^{(j)}\) of predictors \(x_1^{(j)},\cdots,x_k^{(j)}\) based on response \(y^{(j)}\) (via m_rpartD)
multivariable regression (with additional predictors \(z^{(j)}\), if any) coefficient estimates \(\mathbf{\hat{\beta}}^{(j)} = \left(\hat{\beta}_1^{(j)},\cdots,\hat{\beta}_k^{(j)}\right)^t\) of the dichotomized predictors \(\left(\tilde{x}_1^{(j)},\cdots,\tilde{x}_k^{(j)}\right) = \mathcal{D}^{(j)}\left(x_1^{(j)},\cdots,x_k^{(j)}\right)\) (via coef_dichotom) are the bootstrap performance estimate.
Dichotomize \(x_1,\cdots,x_k\) in the entire data using each of the bootstrap rules \(\mathcal{D}^{(1)},\cdots,\mathcal{D}^{(R)}\). Multivariable regression (with additional predictors \(z\), if any) coefficient estimates \(\mathbf{\hat{\beta}}^{[j]} = \left(\hat{\beta}_1^{[j]},\cdots,\hat{\beta}_k^{[j]}\right)^t\) of the dichotomized predictors \(\left(\tilde{x}_1^{[j]},\cdots,\tilde{x}_k^{[j]}\right) = \mathcal{D}^{(j)}\left(x_1,\cdots,x_k\right)\) (via coef_dichotom) are the test performance estimate.
The difference between the bootstrap and test performance estimates, i.e., the \(R\times k\) matrix of \(\left(\mathbf{\hat{\beta}}^{(1)},\cdots,\mathbf{\hat{\beta}}^{(R)}\right)\) minus the \(R\times k\) matrix of \(\left(\mathbf{\hat{\beta}}^{[1]},\cdots,\mathbf{\hat{\beta}}^{[R]}\right)\), is the bootstrap-based optimism.
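The steps above can be sketched in base R for a single numeric predictor (\(k=1\)) and a Gaussian response. This is an illustration under stated assumptions: find_cutoff is a hypothetical, deliberately simple stand-in for the rule-finding step (this page uses m_rpartD), and coef_dich is a hypothetical stand-in for coef_dichotom.

```r
# Sketch only, base R. Neither helper below is the package's code.
# Hypothetical rule finder: among a few candidate quantiles, pick the cutoff
# with the largest absolute difference in mean response between the two groups.
find_cutoff <- function(x, y) {
  cand <- quantile(x, probs = seq(0.2, 0.8, by = 0.1), names = FALSE)
  gap <- vapply(cand, function(ct) {
    abs(mean(y[x > ct]) - mean(y[x <= ct]))
  }, numeric(1L))
  cand[which.max(gap)]
}

# Coefficient of the dichotomized predictor in the linear model y ~ z + (x > cutoff)
coef_dich <- function(y, z, x, cutoff) {
  xd <- x > cutoff
  unname(coef(lm(y ~ z + xd))['xdTRUE'])
}

set.seed(1)
n <- 200L
z <- rnorm(n)
x <- rnorm(n)
y <- 0.5 * z + 0.8 * (x > 0.3) + rnorm(n)  # simulated data, for illustration

R <- 50L
optimism <- vapply(seq_len(R), function(j) {
  id <- sample.int(n, replace = TRUE)         # j-th bootstrap sample
  ct <- find_cutoff(x[id], y[id])             # rule learned on the bootstrap sample
  boot <- coef_dich(y[id], z[id], x[id], ct)  # bootstrap performance estimate
  test <- coef_dich(y, z, x, ct)              # same rule applied to the entire data
  boot - test                                 # optimism of the j-th replicate
}, numeric(1L))

length(optimism)  # one value per replicate; with k predictors, an R x k matrix
median(optimism)
```

The key point of the design is that each bootstrap rule is evaluated twice, once on the sample that produced it and once on the entire data, so the difference isolates how much the data-driven cutoff flatters the coefficient.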
Helper function coef_dichotom fits a multivariable Cox proportional hazards (coxph) model for Surv response, logistic (glm) regression model for logical response, or linear (lm) regression model for gaussian response, with the dichotomized predictors \(\tilde{x}_1,\cdots,\tilde{x}_k\) as well as the additional predictors \(z\)'s.
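The choice of model by response type can be sketched as below. The function name fit_by_response is hypothetical and the package's internals may differ; only coxph, glm and lm themselves are taken from the description above.

```r
# Sketch only: pick the regression model from the class of the response.
library(survival)

fit_by_response <- function(fom, data) {
  y <- model.response(model.frame(fom, data = data))
  if (inherits(y, 'Surv')) return(coxph(fom, data = data))
  if (is.logical(y)) return(glm(fom, family = binomial(link = 'logit'), data = data))
  lm(fom, data = data)
}

# Simulated data, for illustration
set.seed(2)
d <- data.frame(z = rnorm(100), xd = rnorm(100) > 0)
d$y_num <- d$z + d$xd + rnorm(100)
d$y_lgl <- d$y_num > 0.5
d$time <- rexp(100)
d$event <- rbinom(100, size = 1, prob = 0.7)

class(fit_by_response(y_num ~ z + xd, data = d))              # "lm"
class(fit_by_response(y_lgl ~ z + xd, data = d))              # "glm" "lm"
class(fit_by_response(Surv(time, event) ~ z + xd, data = d))  # "coxph"
```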
Duplicates among the dichotomized predictors \(\tilde{x}_1,\cdots,\tilde{x}_k\) are almost inevitable. In such a case, the multivariable model is fitted using only the unique \(\tilde{x}\)'s.
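One way to handle such duplicates is to fit the model on the unique columns and then copy each coefficient back to every duplicated column. The sketch below does this for a linear model only; coef_unique is a hypothetical name and this is not the package's code.

```r
# Sketch only: coefficients for a logical matrix Xd that may contain
# identical columns.
coef_unique <- function(y, Xd) {
  key <- apply(Xd, 2L, paste, collapse = '')  # one string per column
  first <- match(key, key)                    # index of the first identical column
  uid <- unique(first)                        # indices of the unique columns
  Xu <- Xd[, uid, drop = FALSE] * 1           # logical -> 0/1, unique columns only
  cf <- coef(lm(y ~ Xu))[-1L]                 # drop the intercept
  out <- unname(cf[match(first, uid)])        # expand back to all k columns
  names(out) <- colnames(Xd)
  out
}

# Simulated data, for illustration
set.seed(3)
x1 <- rnorm(100)
x2 <- rnorm(100)
Xd <- cbind(a = x1 > 0, b = x2 > 0, c = x1 > 0)  # column c duplicates column a
y <- 1 + 0.7 * Xd[, 'a'] - 0.4 * Xd[, 'b'] + rnorm(100)
coef_unique(y, Xd)  # entries a and c are identical
```

Fitting on the unique columns avoids the perfectly collinear design that the duplicated columns would otherwise create.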
Helper function optimism_dichotom returns an \(R\times k\) double matrix of bootstrap-based optimism, with attributes
attr(,'cutoff')
an \(R\times k\) double matrix, the \(R\) copies of bootstrap cutoff thresholds for the \(k\) predictors. See attribute 'cutoff' of function m_rpartD.
Helper function coef_dichotom returns a double vector of the regression coefficients of dichotomized predictors \(\tilde{x}\)'s, with attributes
In the case of duplicated \(\tilde{x}\)'s, the regression coefficient of each unique \(\tilde{x}\) is repeated for all of its duplicates.
Function BBC_dichotom obtains a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors. Specifically,
1. Obtain the dichotomizing rules \(\mathbf{\mathcal{D}}\) of predictors \(x_1,\cdots,x_k\) based on response \(y\) (via m_rpartD). The multivariable regression (with additional predictors \(z\), if any) with dichotomized predictors \(\left(\tilde{x}_1,\cdots,\tilde{x}_k\right) = \mathcal{D}\left(x_1,\cdots,x_k\right)\) (via helper function coef_dichotom) is the apparent performance.
2. Obtain the bootstrap-based optimism based on \(R\) copies of bootstrap samples (via helper function optimism_dichotom). The median of the bootstrap-based optimism over the \(R\) bootstrap copies is the optimism-correction of the dichotomized predictors \(\tilde{x}_1,\cdots,\tilde{x}_k\).
3. Subtract the optimism-correction (in Step 2) from the apparent performance estimates (in Step 1), only for \(\tilde{x}_1,\cdots,\tilde{x}_k\). The apparent performance estimates for additional predictors \(z\)'s, if any, are not modified. Neither the variance-covariance (vcov) estimates nor the other regression diagnostics, e.g., residuals, log-likelihood, etc., of the apparent performance are modified for now. This coefficient-only, partially-modified regression model is the optimism-corrected performance.
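The arithmetic of Steps 2 and 3 can be illustrated with made-up numbers (not output of the package): take the column-wise median of the optimism matrix and subtract it from the apparent coefficients of the dichotomized predictors only.

```r
# Sketch only, with made-up numbers for illustration.
apparent <- c(z = 0.50, x1 = 0.90, x2 = -0.60)  # apparent coefficient estimates
optimism <- cbind(                              # R = 5 bootstrap rows, k = 2 columns
  x1 = c(0.10, 0.05, 0.20, -0.02, 0.08),
  x2 = c(-0.04, -0.10, 0.01, -0.06, -0.03)
)
correction <- apply(optimism, 2L, median)       # x1: 0.08, x2: -0.04
corrected <- apparent
corrected[names(correction)] <- apparent[names(correction)] - correction
corrected  # z: 0.50 (unchanged), x1: 0.82, x2: -0.56
```

The coefficient of the additional predictor z is left as is, in line with Step 3.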
Ewout W. Steyerberg (2009) Clinical Prediction Models. doi:10.1007/978-0-387-77244-8
Frank E. Harrell Jr., Kerry L. Lee, Daniel B. Mark (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::aid-sim168>3.0.CO;2-4
library(survival)
data(flchain, package = 'survival') # see more details from ?survival::flchain
# recode the 0/1 indicator mgus as logical
head(flchain2 <- within.data.frame(flchain, expr = {
  mgus = as.logical(mgus)
}))
dim(flchain3 <- subset(flchain2, futime > 0)) # required by ?rpart::rpart
# keep subjects whose primary cause of death is circulatory
dim(flchain_Circulatory <- subset(flchain3, chapter == 'Circulatory'))
# y ~ z ~ x: response ~ additional predictors ~ predictors to be dichotomized
m1 = BBC_dichotom(Surv(futime, death) ~ age + sex + mgus ~ kappa + lambda,
  data = flchain_Circulatory)
summary(m1)
attr(attr(m1, 'optimism'), 'cutoff') # bootstrap cutoff thresholds, one row per replicate
attr(m1, 'apparent_cutoff')