ClinicalUtilityRecal (version 0.1.0)

cvRepWtTuning: Repeated Cross Validation for Weight Tuning Parameter Selection

Description

Calibration weights require specification of a tuning parameter, \(delta\) or \(lambda\). Since a single round of cross-validation can be noisy, cross-validation can be repeated multiple times with independent random partitions and the results averaged. This function implements repeated K-fold cross-validation in which the tuning parameter \(lambda\) or \(delta\) is selected by maximizing standardized net benefit (sNB) (i.e. a repeated cvWtTuning procedure).
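The independent random partitions mentioned above can be sketched in base R. This is only an illustration of the general idea, not the package's internal code: the helper name make_rep_folds and its arguments are hypothetical, chosen to mirror kFold, cvRep and int.seed.

```r
# Hypothetical helper (not part of ClinicalUtilityRecal): assign each of n
# observations to one of kFold folds, independently for each repetition.
make_rep_folds <- function(n, kFold = 5, cvRep = 25, seed = 11111) {
  set.seed(seed)
  # One column per repetition; each column is a fresh random partition
  # whose fold sizes differ by at most one.
  replicate(cvRep, sample(rep(seq_len(kFold), length.out = n)))
}

folds <- make_rep_folds(n = 103, kFold = 5, cvRep = 3)
dim(folds)         # rows are observations, columns are repetitions
table(folds[, 1])  # fold sizes in the first repetition
```

Each column can then drive one round of K-fold cross-validation, and the per-repetition results can be averaged.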

A "one-standard error" rule can be used for selecting tuning parameters. Under the "one-standard error" rule, the calibration weight tuning parameter (\(lambda\) or \(delta\)) is selected such that the corresponding cross-validated sNB is within one standard deviation of the maximum cross-validated sNB. This provides protection against overfitting the data and selecting a tuning parameter that is too extreme. If the "one-standard error" rule is not implemented, then the tuning parameter with the largest average cross-validated sNB (across folds and repetitions) is selected.
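A minimal sketch of how such a rule might be applied to a vector of averaged sNB values is given here. The helper select_one_se and the toy numbers are hypothetical. In particular, taking the smallest eligible value as the "least extreme" one is an assumption made for illustration; the package may break ties differently.

```r
# Hypothetical helper: compare the maximum rule with a one-standard-error rule.
# tuneSeq: candidate tuning values
# avg_snb: average cross-validated sNB for each candidate
# sd_snb:  estimated standard deviation of the cross-validated sNB
select_one_se <- function(tuneSeq, avg_snb, sd_snb) {
  best <- which.max(avg_snb)
  # Candidates whose average sNB is within one SD of the maximum
  eligible <- which(avg_snb >= avg_snb[best] - sd_snb)
  # ASSUMPTION: the smallest eligible value is treated as the least extreme
  pick <- eligible[which.min(tuneSeq[eligible])]
  list(max_rule = tuneSeq[best], one_se_rule = tuneSeq[pick])
}

# Toy values, made up for illustration only
tuneSeq <- c(0.5, 1, 2, 4, 8)
avg_snb <- c(0.30, 0.36, 0.39, 0.40, 0.38)
res <- select_one_se(tuneSeq, avg_snb, sd_snb = 0.03)
res
```

With these toy values the maximum rule picks 4, while the one-standard-error rule settles on 2, the smallest candidate whose sNB is still within 0.03 of the maximum.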

Usage

cvRepWtTuning(y,p,r,rl,ru,kFold=5,cvRep=25,cvParm,tuneSeq,stdErrRule=TRUE,int.seed=11111)

Arguments

y

Vector of binary outcomes, with 1 indicating event (cases) and 0 indicating no event (controls)

p

Vector of risk score values

r

Clinically relevant risk threshold

rl

Lower bound of clinically relevant region

ru

Upper bound of clinically relevant region

kFold

Number of folds for cross-validation

cvRep

Number of cross-validation repetitions

cvParm

Parameter to be selected via cross-validation. Can be either \(delta\), the weight assigned to observations outside the clinically relevant region [R_l,R_u], or \(lambda\), the tuning parameter controlling exponential decay within the clinically relevant region [R_l,R_u]

tuneSeq

Sequence of values of tuning parameters to perform cross-validation over

stdErrRule

Use the "one-standard error" rule when selecting the tuning parameter

int.seed

Initial seed set for random splitting of data into K folds

Value

cv.sNB

Standardized net benefit (sNB) of tuning parameter selected via cross-validation

cv.RAW

Corresponding RAW value given the tuning parameter selected via cross-validation

cv.lambda

\(lambda\) value selected via cross-validation if \(cvParm=lambda\), otherwise user specified \(lambda\) value

cv.delta

\(delta\) value selected via cross-validation if \(cvParm=delta\), otherwise user specified \(delta\) value

avgCV.res

Averaged (across-replications) cross-validated sNB for sequence of tuning parameters

W

Estimate of "within" repetition variance. Returned only if stdErrRule==TRUE

B

Estimate of "between" repetition variance. Returned only if stdErrRule==TRUE

fullList

List of cross-validation results for all folds and repetitions

Details

To estimate the standard deviation of the cross-validated sNB, the dependence between the different partitions of cross-validation needs to be accounted for. Gelman and Rubin (1992) give a variance estimator for the convergence diagnostic statistic used when Markov chain Monte Carlo is run with multiple chains. The variance estimator accounts for both the variability of the statistic "within" a single chain and the variance of the statistic across, or "between", chains. Analogously, we can use this framework to estimate the "within" repetition variance (i.e. variation in sNB from a single round of K-fold cross-validation) and the "between" repetition variance. We denote the "within" repetition variance as W and the "between" repetition variance as B. We augment the estimator slightly from that given in Gelman and Rubin (1992) to account for the fact that as the number of cross-validation repetitions increases, the between-repetition variability should decrease. See Mishra et al. (2020) for full expressions of B and W.
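The within/between idea can be illustrated with a plain variance decomposition in base R. This sketch uses simulated sNB values and the textbook definitions (the mean of the per-repetition variances, and the variance of the per-repetition means). It is not the adjusted estimator used by the package, whose exact expressions for W and B are given in the cited reference.

```r
# Simulated cross-validated sNB values (made up for illustration):
# one row per repetition, one column per fold.
set.seed(1)
cvRep <- 25
kFold <- 5
snb <- matrix(rnorm(cvRep * kFold, mean = 0.4, sd = 0.05),
              nrow = cvRep, ncol = kFold)

# Within-repetition variance: average over repetitions of the
# variance of sNB across the K folds of that repetition.
W <- mean(apply(snb, 1, var))

# Between-repetition variance: variance of the per-repetition mean sNB.
B <- var(rowMeans(snb))

c(W = W, B = B)
```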

References

Mishra, A. (2019). Methods for Risk Markers that Incorporate Clinical Utility (Doctoral dissertation). (Available Upon Request)

Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4), 457-472.

See Also

calWt, RAWgrid, nb, cvWtTuning

Examples

# NOT RUN {
library(ClinicalUtilityRecal)

### Load data ##
data(fakeData)

### Get grid of tuning parameters  ###
grid <- RAWgrid(r = 0.3,rl = -Inf,ru = Inf,p = fakeData$p,y = fakeData$y,
                cvParm = "lambda",rl.raw = 0.25,ru.raw = 0.35)

### Implement repeated k-fold cross validation
repCV <- cvRepWtTuning(y = fakeData$y,p = fakeData$p,rl = -Inf,ru = Inf,r = 0.3,
                       kFold = 5,cvRep = 25,cvParm = "lambda",tuneSeq = grid,stdErrRule = TRUE)

## cross-validation results
repCV$avgCV.res

## cross-validation selected lambda, RAW, and sNB
cv.lambda <- repCV$cv.lambda
cv.RAW <- repCV$cv.RAW
cv.sNB <- repCV$cv.sNB
# }
