getVar.Res: Analysis of the effect of each term of a linear regression model by analyzing its residuals

Description

This function analyzes the effect of each model term by comparing the residuals of the Full model with those of the model without that term. The model is fitted on the train data set, but residual improvement is assessed on both the train and test data sets. Residuals are compared with a paired t-test, a paired Wilcoxon signed-rank test, a binomial sign test, and an F-test on the residual variance. The net residual improvement (NeRI) of each model term is also reported.
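
As a rough illustration of the comparison described above, the sketch below fits a full linear model and a reduced model without one term, then compares their residuals using base R tests from the stats package. It is not the FRESA.CAD implementation: the simulated data, the variable names (`x1`, `x2`, `y`), and the direction chosen for each one-sided alternative are assumptions made for illustration.

```r
# Illustrative sketch only: base R (stats), simulated data, hypothetical names.
set.seed(42)
n <- 200
simData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
simData$y <- 1 + 2 * simData$x1 + 0.5 * simData$x2 + rnorm(n)

# Full model and the model without the term x2
fullModel <- lm(y ~ x1 + x2, data = simData)
reducedModel <- lm(y ~ x1, data = simData)

# Absolute residuals of each model on the same observations
fullRes <- abs(residuals(fullModel))
reducedRes <- abs(residuals(reducedModel))

# One-sided paired tests: are absolute residuals smaller with the full model?
tP <- t.test(reducedRes, fullRes, paired = TRUE,
             alternative = "greater")$p.value
wilcoxP <- wilcox.test(reducedRes, fullRes, paired = TRUE,
                       alternative = "greater")$p.value

# Binomial sign test on the number of observations whose residual improved
improved <- sum(fullRes < reducedRes)
worsened <- sum(fullRes > reducedRes)
binP <- binom.test(improved, improved + worsened, p = 0.5,
                   alternative = "greater")$p.value

# One-sided F-test on the residual variances
fP <- var.test(residuals(reducedModel), residuals(fullModel),
               alternative = "greater")$p.value

pValues <- c(tP = tP, WilcoxP = wilcoxP, BinP = binP, FP = fP)
print(pValues)
```

Small p-values in this sketch would suggest that removing `x2` makes the residuals worse, which is the kind of evidence the function reports for each term.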

Usage

getVar.Res(object,
           data,
           Outcome = "Class",
           type = c("LM", "LOGIT", "COX"),
           testData = NULL,
           callCpp = TRUE)

Arguments

object

An object of class lm, glm, or coxph containing the model to be analyzed

data

A data frame where all variables are stored in different columns

Outcome

The name of the column in data that stores the variable to be predicted by the model

type

Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")

testData

A data frame structured like data, containing a data set to be tested independently. If NULL, data is used.

callCpp

If set to TRUE, the C++ implementation of residual improvement is used.

Value

tP.value: A vector in which each element represents the one-sided p-value of the paired t-test comparing the absolute values of the residuals obtained with the Full model and the model without one term
BinP.value: A vector in which each element represents the p-value associated with a significant improvement in residuals according to the binomial sign test
WilcoxP.value: A vector in which each element represents the one-sided p-value of the paired Wilcoxon signed-rank test comparing the absolute values of the residuals obtained with the Full model and the model without one term
FP.value: A vector in which each element represents the one-sided p-value of the F-test comparing the variances of the residuals obtained with the Full model and the model without one term
NeRIs: A vector in which each element represents the net residual improvement between the Full model and the model without one term
testData.tP.value: A vector similar to tP.value, where values were estimated in testData
testData.BinP.value: A vector similar to BinP.value, where values were estimated in testData
testData.WilcoxP.value: A vector similar to WilcoxP.value, where values were estimated in testData
testData.FP.value: A vector similar to FP.value, where values were estimated in testData
testData.NeRIs: A vector similar to NeRIs, where values were estimated in testData
unitestMSS: A vector with the univariate residual mean sum of squares of each model variable on the test data
unitrainMSS: A vector with the univariate residual mean sum of squares of each model variable on the train data
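
The NeRIs entries can be read as a signed proportion. The following is a minimal sketch, assuming NeRI is the fraction of observations whose absolute residual decreases with the Full model minus the fraction whose absolute residual increases. Both this definition and the helper name `netResidualImprovement` are assumptions for illustration and are not taken from the package source.

```r
# Hypothetical helper; the NeRI definition used here is an assumption.
netResidualImprovement <- function(reducedResiduals, fullResiduals) {
  stopifnot(length(reducedResiduals) == length(fullResiduals))
  delta <- abs(reducedResiduals) - abs(fullResiduals)
  improved <- sum(delta > 0)  # residual shrank under the full model
  worsened <- sum(delta < 0)  # residual grew under the full model
  (improved - worsened) / length(delta)
}

# Toy residuals: three observations improve, one worsens, one is unchanged
reducedRes <- c(1.0, -2.0, 0.5, 1.5, -1.0)
fullRes    <- c(0.5, -1.0, 0.7, 1.0, -1.0)
neri <- netResidualImprovement(reducedRes, fullRes)
print(neri)  # (3 - 1) / 5 = 0.4
```

Under this reading, a value near 1 means the term improves almost every residual, a value near 0 means improvements and deteriorations balance out, and a negative value means the term makes more residuals worse than better.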

Examples

	## Not run: 
# 	# Start the graphics device driver to save all plots in a pdf format
# 	pdf(file = "Example.pdf")
# 	# Load FRESA.CAD, then get the stage C prostate cancer data from the rpart package
# 	library(FRESA.CAD)
# 	library(rpart)
# 	data(stagec)
# 	# Split the stages into several columns
# 	dataCancer <- cbind(stagec[,c(1:3,5:6)],
# 	                    gleason4 = 1*(stagec[,7] == 4),
# 	                    gleason5 = 1*(stagec[,7] == 5),
# 	                    gleason6 = 1*(stagec[,7] == 6),
# 	                    gleason7 = 1*(stagec[,7] == 7),
# 	                    gleason8 = 1*(stagec[,7] == 8),
# 	                    gleason910 = 1*(stagec[,7] >= 9),
# 	                    eet = 1*(stagec[,4] == 2),
# 	                    diploid = 1*(stagec[,8] == "diploid"),
# 	                    tetraploid = 1*(stagec[,8] == "tetraploid"),
# 	                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
# 	# Remove the incomplete cases
# 	dataCancer <- dataCancer[complete.cases(dataCancer),]
# 	# Load a pre-established data frame with the names and descriptions of all variables
# 	data(cancerVarNames)
# 	# Split the data set into train and test samples
# 	trainDataCancer <- dataCancer[1:(nrow(dataCancer)/2),]
# 	testDataCancer <- dataCancer[(nrow(dataCancer)/2+1):nrow(dataCancer),]
# 	# Get a Cox proportional hazards model using:
# 	# - 10 bootstrap loops
# 	# - Train data
# 	# - Age as a covariate
# 	# - The Wilcoxon rank-sum test as the feature inclusion criterion
# 	cancerModel <- ForwardSelection.Model.Res(loops = 10,
# 	                                    covariates = "1 + age",
# 	                                    Outcome = "pgstat",
# 	                                    variableList = cancerVarNames,
# 	                                    data = trainDataCancer,
# 	                                    type = "COX",
# 	                                    testType= "Wilcox",
# 	                                    timeOutcome = "pgtime")
# 	# Get the NeRI of each model term in the train data set and in the independent data set
# 	cancerModelNeRI <- getVar.Res(object = cancerModel$final.model,
# 	                              data = trainDataCancer,
# 	                              Outcome = "pgstat",
# 	                              type = "COX",
# 	                              testData = testDataCancer)
# 	# Shut down the graphics device driver
# 	dev.off()
## End(Not run)
