bootstrapValidation_Res: Bootstrap validation of regression models

Description

This function bootstraps the model a given number of times (loops) to estimate, for each variable, the empirical bootstrapped distribution of the model coefficients and of the net residual improvement (NeRI). At each bootstrap loop, the non-observed data is predicted by the trained model, and the statistics of the test prediction are stored and reported.
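
The train/test split described above can be illustrated with a minimal base R sketch of a single bootstrap loop. It is not the package's implementation: the mtcars data set, the formula mpg ~ wt + hp, and all variable names are assumptions chosen only for illustration.

```r
# One bootstrap loop in base R (illustrative sketch, not the package code)
set.seed(42)
data(mtcars)
n <- nrow(mtcars)
fraction <- 1

# Sample row indices with replacement to form the train set
trainIdx <- sample(n, size = round(fraction * n), replace = TRUE)
# Rows that were never drawn form the non-observed (test) set
testIdx <- setdiff(seq_len(n), trainIdx)

# Fit on the train set, then predict the rows the model did not see
fit <- lm(mpg ~ wt + hp, data = mtcars[trainIdx, ])
testPrediction <- predict(fit, newdata = mtcars[testIdx, ])
testResiduals <- mtcars$mpg[testIdx] - testPrediction

# Root-mean-square error on the non-observed rows
testRMSE <- sqrt(mean(testResiduals^2))
```

With fraction = 1, roughly a third of the rows are typically left out of each bootstrap sample, which is what makes a test prediction possible in every loop.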

Usage

bootstrapValidation_Res(fraction = 1,
	                        loops = 200,
	                        model.formula,
	                        Outcome,
	                        data,
	                        type = c("LM", "LOGIT", "COX"),
	                        plots = TRUE)

Arguments

fraction

The fraction of the data (sampled with replacement) to be used as the train set

loops

The number of bootstrap loops

model.formula

An object of class formula with the formula to be used

Outcome

The name of the column in data that stores the variable to be predicted by the model

data

A data frame where all variables are stored in different columns

type

Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")

plots

Logical. If TRUE, density distribution plots are displayed

Value

data: The data frame used to bootstrap and validate the model
outcome: A vector with the predictions made by the model
boot.model: An object of class lm, glm, or coxph containing a model whose coefficients are the median of the coefficients of the bootstrapped models
NeRIs: A matrix with the NeRI for each model term, estimated using the bootstrap test sets
tStudent.pvalues: A matrix with the t-test p-value of the NeRI for each model term, estimated using the bootstrap train sets
wilcox.pvalues: A matrix with the Wilcoxon rank-sum test p-value of the NeRI for each model term, estimated using the bootstrap train sets
bin.pvlaues: A matrix with the binomial test p-value of the NeRI for each model term, estimated using the bootstrap train sets
F.pvlaues: A matrix with the F-test p-value of the NeRI for each model term, estimated using the bootstrap train sets
test.tStudent.pvalues: A matrix with the t-test p-value of the NeRI for each model term, estimated using the bootstrap test sets
test.wilcox.pvalues: A matrix with the Wilcoxon rank-sum test p-value of the NeRI for each model term, estimated using the bootstrap test sets
test.bin.pvlaues: A matrix with the binomial test p-value of the NeRI for each model term, estimated using the bootstrap test sets
test.F.pvlaues: A matrix with the F-test p-value of the NeRI for each model term, estimated using the bootstrap test sets
testPrediction: A vector that contains all the individual predictions used to validate the model in the bootstrap test sets
testOutcome: A vector that contains all the individual outcomes used to validate the model in the bootstrap test sets
testResiduals: A vector that contains all the residuals used to validate the model in the bootstrap test sets
trainPrediction: A vector that contains all the individual predictions used to validate the model in the bootstrap train sets
trainOutcome: A vector that contains all the individual outcomes used to validate the model in the bootstrap train sets
trainResiduals: A vector that contains all the residuals used to validate the model in the bootstrap train sets
testRMSE: The global RMSE, estimated using the bootstrap test sets
trainRMSE: The global RMSE, estimated using the bootstrap train sets
trainSampleRMSE: A vector with the RMSEs in the bootstrap train sets
testSampledRMSE: A vector with the RMSEs in the bootstrap test sets
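
The per-term NeRI and its p-values listed above can be pictured with the following sketch. It assumes that the NeRI of a term is the fraction of observations whose absolute residual shrinks when the term is included, minus the fraction whose absolute residual grows; this page does not state the exact formula, so treat that definition as an assumption. The helper neriSketch, the mtcars data, and the paired form of the Wilcoxon test are likewise hypothetical choices, not the package's code.

```r
# Hypothetical helper (not part of any package): assumed NeRI definition,
# (number improved - number worsened) / number of observations
neriSketch <- function(oldResiduals, newResiduals) {
  improved <- sum(abs(newResiduals) < abs(oldResiduals))
  worsened <- sum(abs(newResiduals) > abs(oldResiduals))
  (improved - worsened) / length(oldResiduals)
}

set.seed(1)
data(mtcars)
n <- nrow(mtcars)
trainIdx <- sample(n, replace = TRUE)
testIdx <- setdiff(seq_len(n), trainIdx)

# Full model, and the same model without the term under evaluation (wt)
full <- lm(mpg ~ wt + hp, data = mtcars[trainIdx, ])
reduced <- lm(mpg ~ hp, data = mtcars[trainIdx, ])

# Residuals of both models on the bootstrap test set
oldRes <- mtcars$mpg[testIdx] - predict(reduced, newdata = mtcars[testIdx, ])
newRes <- mtcars$mpg[testIdx] - predict(full, newdata = mtcars[testIdx, ])

neriWt <- neriSketch(oldRes, newRes)

# One of the four tests: are the old absolute residuals larger than the new ones?
wilcoxP <- suppressWarnings(
  wilcox.test(abs(oldRes), abs(newRes), paired = TRUE,
              alternative = "greater")$p.value
)
```

Under this assumed definition the NeRI ranges from -1 (every residual worsens) to 1 (every residual improves).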

Details

The bootstrap validation will estimate the confidence interval of the model coefficients and the NeRI. It will also compute the train and blind test root-mean-square error (RMSE), as well as the distribution of the NeRI p-values.
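
As a rough illustration of these quantities, the sketch below repeats the bootstrap loop in base R and summarises the results. The percentile method for the confidence interval is an assumption (this page does not say which interval is used), and the data set, formula, and variable names are illustrative only.

```r
# Bootstrap summary in base R (illustrative sketch, not the package code)
set.seed(7)
data(mtcars)
n <- nrow(mtcars)
loops <- 200

coefs <- matrix(NA_real_, nrow = loops, ncol = 3,
                dimnames = list(NULL, c("(Intercept)", "wt", "hp")))
trainLoopRMSE <- numeric(loops)
testLoopRMSE <- numeric(loops)

for (i in seq_len(loops)) {
  trainIdx <- sample(n, replace = TRUE)
  testIdx <- setdiff(seq_len(n), trainIdx)
  fit <- lm(mpg ~ wt + hp, data = mtcars[trainIdx, ])
  coefs[i, ] <- coef(fit)
  # RMSE on the rows used for fitting and on the left-out rows
  trainLoopRMSE[i] <- sqrt(mean(residuals(fit)^2))
  testPred <- predict(fit, newdata = mtcars[testIdx, ])
  testLoopRMSE[i] <- sqrt(mean((mtcars$mpg[testIdx] - testPred)^2))
}

# Median coefficients across loops (compare with boot.model in Value)
medianCoef <- apply(coefs, 2, median)
# Assumed 95% percentile confidence interval for each coefficient
coefCI <- apply(coefs, 2, quantile, probs = c(0.025, 0.975))
```

Comparing the distributions of trainLoopRMSE and testLoopRMSE gives a quick view of how much the model's error grows on data it has not seen.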

Examples

	## Not run: 
# 	# Start the graphics device driver to save all plots in a pdf format
# 	pdf(file = "Example.pdf")
# 	# Get the stage C prostate cancer data from the rpart package
# 	library(rpart)
# 	data(stagec)
# 	# Split the stages into several columns
# 	dataCancer <- cbind(stagec[,c(1:3,5:6)],
# 	                    gleason4 = 1*(stagec[,7] == 4),
# 	                    gleason5 = 1*(stagec[,7] == 5),
# 	                    gleason6 = 1*(stagec[,7] == 6),
# 	                    gleason7 = 1*(stagec[,7] == 7),
# 	                    gleason8 = 1*(stagec[,7] == 8),
# 	                    gleason910 = 1*(stagec[,7] >= 9),
# 	                    eet = 1*(stagec[,4] == 2),
# 	                    diploid = 1*(stagec[,8] == "diploid"),
# 	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
# 	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
# 	# Remove the incomplete cases
# 	dataCancer <- dataCancer[complete.cases(dataCancer),]
# 	# Load a pre-established data frame with the names and descriptions of all variables
# 	data(cancerVarNames)
# 	# Get a Cox proportional hazards model using:
# 	# - 10 bootstrap loops
# 	# - Age as a covariate
# 	# - The Wilcoxon rank-sum test as the feature inclusion criterion
# 	cancerModel <- ForwardSelection.Model.Res(loops = 10,
# 	                                    covariates = "1 + age",
# 	                                    Outcome = "pgstat",
# 	                                    variableList = cancerVarNames,
# 	                                    data = dataCancer,
# 	                                    type = "COX",
# 	                                    testType= "Wilcox",
# 	                                    timeOutcome = "pgtime")
# 	# Validate the previous model:
# 	# - Using 50 bootstrap loops
# 	bootCancerModel <- bootstrapValidation_Res(loops = 50,
# 	                                           model.formula = cancerModel$formula,
# 	                                           Outcome = "pgstat",
# 	                                           data = dataCancer,
# 	                                           type = "COX")
# 	# Shut down the graphics device driver
# 	dev.off()
	## End(Not run)