bootstrapVarElimination_Bin: IDI/NRI-based backwards variable elimination with bootstrapping

Description

This function removes model terms that do not significantly improve the bootstrapped integrated discrimination improvement (IDI) or net reclassification improvement (NRI).

Usage

bootstrapVarElimination_Bin(object,
                            pvalue = 0.05,
                            Outcome = "Class",
                            data,
                            startOffset = 0,
                            type = c("LOGIT", "LM", "COX"),
                            selectionType = c("zIDI", "zNRI"),
                            loops = 250,
                            fraction = 1.0,
                            print = TRUE,
                            plots = TRUE,
                            adjsize = 1)

Arguments

object

An object of class lm, glm, or coxph containing the model to be analyzed

pvalue

The maximum p-value, associated with either IDI or NRI, allowed for a term in the model

Outcome

The name of the column in data that stores the variable to be predicted by the model

data

A data frame where all variables are stored in different columns

startOffset

Only terms whose position in the model is larger than the startOffset are candidates to be removed

type

Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")

selectionType

The type of index to be evaluated by the improveProb function (Hmisc package): z-score of IDI or of NRI

loops

The number of bootstrap loops

fraction

The fraction of data (sampled with replacement) to be used as the training set

print

Logical. If TRUE, information will be displayed

plots

Logical. If TRUE, plots are displayed

adjsize

The number of features to be used in the Benjamini-Hochberg (BH) false discovery rate (FDR) correction
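
How adjsize is applied internally is not described on this page. As a rough illustration only, the sketch below assumes it plays the role of the family size in a BH correction, and uses the base R function p.adjust with its n argument. The term names and p-values are hypothetical.

```r
# Hypothetical raw p-values for three model terms (illustrative values only)
pRaw <- c(termA = 0.001, termB = 0.02, termC = 0.04)

# BH adjustment when only the three tested terms form the family
pBH3 <- p.adjust(pRaw, method = "BH")

# BH adjustment assuming the terms were picked from 100 candidate features
# (this is the assumed meaning of adjsize, not a documented behaviour)
assumedAdjsize <- 100
pBH100 <- p.adjust(pRaw, method = "BH", n = assumedAdjsize)

print(rbind(pBH3, pBH100))
```

Under this reading, a larger adjsize makes the adjusted p-values larger, so more terms would fail a fixed pvalue threshold.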

Value

back.model: An object of the same class as object containing the reduced model
loops: The number of loops it took for the model to stabilize
reclas.info: A list with the NRI and IDI statistics of the reduced model, as given by the getVar.Bin function
bootCV: An object of class bootstrapValidation_Bin containing the results of the bootstrap validation in the reduced model
back.formula: An object of class formula with the formula used to fit the reduced model
lastRemoved: The name of the last term that was removed (-1 if all terms were removed)
beforeFSC.model: The model with the minimum bootstrap test error
beforeFSC.formula: The formula, as a string, of the model with the minimum bootstrap test error

Details

For each model term $x_i$, the IDI or NRI is computed for the full model and for the reduced model (the full model with the term $x_i$ removed). The term whose removal results in the smallest drop in bootstrapped improvement is selected. The hypothesis that the term adds classification improvement is then tested by checking the p-value of the average improvement. If $p(\text{IDI or NRI}) > \text{pvalue}$, the term is removed. In other words, only model terms that significantly aid in subject classification are kept. The procedure is repeated until no term fulfils the removal criterion.
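
One elimination step can be sketched in base R as follows. This is a simplified illustration, not the package's implementation. It assumes a logistic model, simulated data, the Pencina et al. definition of IDI (mean gain in predicted probability among events minus the mean gain among non-events), out-of-bag rows as the test set, and a rough z-score formed from the bootstrap mean and standard deviation. The helper names idi, bootIDI, and pFromBoot are hypothetical.

```r
set.seed(42)

# Simulated data (assumption): x1 is informative, x2 is pure noise
n <- 400
simData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
simData$Class <- rbinom(n, 1, plogis(1.5 * simData$x1))

# IDI of the full model over the reduced model, evaluated on test data
idi <- function(fullFormula, reducedFormula, train, test) {
  fullFit <- glm(fullFormula, family = binomial, data = train)
  redFit <- glm(reducedFormula, family = binomial, data = train)
  gain <- predict(fullFit, newdata = test, type = "response") -
    predict(redFit, newdata = test, type = "response")
  event <- test$Class == 1
  mean(gain[event]) - mean(gain[!event])
}

# Bootstrap: fit on a resample, evaluate on the out-of-bag rows
bootIDI <- function(fullFormula, reducedFormula, data, loops = 50) {
  replicate(loops, {
    idx <- sample(nrow(data), replace = TRUE)
    idi(fullFormula, reducedFormula, data[idx, ], data[-unique(idx), ])
  })
}

# Rough one-sided p-value from the bootstrap mean and standard deviation
pFromBoot <- function(v) pnorm(mean(v) / sd(v), lower.tail = FALSE)

# Improvement contributed by each term: full model versus model without it
idiX1 <- bootIDI(Class ~ x1 + x2, Class ~ x2, simData)
idiX2 <- bootIDI(Class ~ x1 + x2, Class ~ x1, simData)
pX1 <- pFromBoot(idiX1)
pX2 <- pFromBoot(idiX2)

# Candidate for removal: the term whose removal costs the least improvement
meanIDI <- c(x1 = mean(idiX1), x2 = mean(idiX2))
weakest <- names(which.min(meanIDI))

# The candidate is dropped only if its improvement is not significant
pvalue <- 0.05
removeTerm <- c(x1 = pX1, x2 = pX2)[[weakest]] > pvalue
print(list(meanIDI = meanIDI, weakest = weakest, removeTerm = removeTerm))
```

In a full backward elimination this step would be repeated on the reduced model until no remaining term meets the removal criterion.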

References

Pencina, M. J., D'Agostino, R. B., & Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27(2), 157-172.

Examples

	## Not run: 
# 	# Start the graphics device driver to save all plots in a pdf format
# 	pdf(file = "Example.pdf")
# 	# Get the stage C prostate cancer data from the rpart package
# 	library(rpart)
# 	data(stagec)
# 	# Split the stages into several columns
# 	dataCancer <- cbind(stagec[,c(1:3,5:6)],
# 	                    gleason4 = 1*(stagec[,7] == 4),
# 	                    gleason5 = 1*(stagec[,7] == 5),
# 	                    gleason6 = 1*(stagec[,7] == 6),
# 	                    gleason7 = 1*(stagec[,7] == 7),
# 	                    gleason8 = 1*(stagec[,7] == 8),
# 	                    gleason910 = 1*(stagec[,7] >= 9),
# 	                    eet = 1*(stagec[,4] == 2),
# 	                    diploid = 1*(stagec[,8] == "diploid"),
# 	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
# 	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
# 	# Remove the incomplete cases
# 	dataCancer <- dataCancer[complete.cases(dataCancer),]
# 	# Load a pre-established data frame with the names and descriptions of all variables
# 	data(cancerVarNames)
# 	# Get a Cox proportional hazards model using:
# 	# - A lax p-value
# 	# - 10 bootstrap loops
# 	# - Age as a covariate
# 	# - zIDI as the feature inclusion criterion
# 	# - First order interactions
# 	cancerModel <- ForwardSelection.Model.Bin(pvalue = 0.1,
# 	                                           loops = 10,
# 	                                           covariates = "1 + age",
# 	                                           Outcome = "pgstat",
# 	                                           variableList = cancerVarNames,
# 	                                           data = dataCancer,
# 	                                           type = "COX",
# 	                                           timeOutcome = "pgtime",
# 	                                           selectionType = "zIDI",
# 	                                           interaction = 2)
# 	# Remove non-significant variables from the previous model:
# 	# - Using a strict p-value
# 	# - Excluding the covariate as a candidate for feature removal
# 	# - Using zIDI as the feature removal criterion
# 	# - Using 50 bootstrap loops
# 	reducedCancerModel <- bootstrapVarElimination_Bin(object = cancerModel$final.model,
# 	                                              pvalue = 0.005,
# 	                                              Outcome = "pgstat",
# 	                                              data = dataCancer,
# 	                                              startOffset = 1,
# 	                                              type = "COX",
# 	                                              selectionType = "zIDI",
# 	                                              loops = 50)
# 	# Shut down the graphics device driver
# 	dev.off()
## End(Not run)
