FRESA.CAD (version 2.2.0)

uniRankVar: Univariate analysis of features (additional values returned)

Description

This function reports the mean and standard deviation of each feature in a model and ranks the features according to a user-specified score. It also runs a Kolmogorov-Smirnov (KS) test on the raw and z-standardized data, and reports the raw and z-standardized t-test score, the p-value of the Wilcoxon rank-sum test, the integrated discrimination improvement (IDI), the net reclassification improvement (NRI), the net residual improvement (NeRI), and the area under the ROC curve (AUC). Furthermore, it reports the z-value of the variable significance on the fitted model. Besides returning an ordered data frame, this function returns all arguments as values, so that the results can be updated with update.uniRankVar if needed.
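As a rough illustration of the kind of per-feature statistics listed above, the following base-R sketch computes group means and standard deviations, a t-test score, a Wilcoxon rank-sum p-value, and an AUC for one feature against a binary outcome. It is not the package's implementation: the data frame `toy` and all variable names are hypothetical, and the AUC is derived from the Mann-Whitney statistic rather than from any FRESA.CAD routine.

```r
# Toy data: one feature and a binary outcome (both hypothetical)
set.seed(1)
toy <- data.frame(outcome = rep(c(0, 1), each = 50),
                  feature = c(rnorm(50, mean = 0), rnorm(50, mean = 1)))

cases    <- toy$feature[toy$outcome == 1]
controls <- toy$feature[toy$outcome == 0]

# Per-group mean and standard deviation
groupStats <- c(caseMean = mean(cases), caseSD = sd(cases),
                controlMean = mean(controls), controlSD = sd(controls))

# Welch t-test score and Wilcoxon rank-sum p-value between the two groups
tStat   <- unname(t.test(cases, controls)$statistic)
wilcoxP <- wilcox.test(cases, controls)$p.value

# AUC from the Mann-Whitney statistic: W / (n1 * n2)
W   <- unname(wilcox.test(cases, controls)$statistic)
auc <- W / (length(cases) * length(controls))

print(groupStats)
print(c(tStat = tStat, wilcoxP = wilcoxP, auc = auc))
```

uniRankVar gathers statistics of this kind for every candidate variable and sorts them by the chosen rankingTest.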

Usage

uniRankVar(variableList,
           formula,
           Outcome,
           data,
           categorizationType = c("Raw", "Categorical", "ZCategorical",
                                  "RawZCategorical", "RawTail", "RawZTail"),
           type = c("LOGIT", "LM", "COX"),
           rankingTest = c("zIDI", "zNRI", "IDI", "NRI", "NeRI",
                           "Ztest", "AUC", "CStat", "Kendall"),
           cateGroups = c(0.1, 0.9),
           raw.dataFrame = NULL,
           description = ".",
           uniType = c("Binary", "Regression"),
           FullAnalysis = TRUE)

Arguments

variableList
A data frame with two columns. The first one must have the names of the candidate variables and the other one the description of such variables
formula
An object of class formula with the formula to be fitted
Outcome
The name of the column in data that stores an optional binary outcome that may be used to show the stratified analysis
data
A data frame where all variables are stored in different columns
categorizationType
How variables will be analysed: As given in data ("Raw"); broken into the p-value categories given by cateGroups ("Categorical"); broken into the p-value categories given by cateGroups, and weighted by the z-score ("ZCategorical"); broken into the p-value categories given by cateGroups, weighted by the z-score, plus the raw values ("RawZCategorical"); raw values, plus the tails ("RawTail"); or raw values, weighted by the z-score, plus the tails ("RawZTail")
type
Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")
rankingTest
Variables will be ranked based on: The z-score of the IDI ("zIDI"), the z-score of the NRI ("zNRI"), the IDI ("IDI"), the NRI ("NRI"), the NeRI ("NeRI"), the z-score of the model fit ("Ztest"), the AUC ("AUC"), the Somers' rank correlation ("CStat"), or the Kendall rank correlation ("Kendall")
cateGroups
A vector of percentiles to be used for the categorization procedure
raw.dataFrame
A data frame similar to data, but with unadjusted data, used to get the means and variances of the unadjusted data
description
The name of the column in variableList that stores the variable description
uniType
Type of univariate analysis: Binary classification ("Binary") or regression ("Regression")
FullAnalysis
If FALSE, it will only order the features according to their z-statistics from the linear model
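As a sketch of the input layout described for variableList and description, a two-column data frame could be built as follows. The column name `Name` and the three example rows are assumptions made for illustration; only the two-column layout (variable names first, descriptions second) comes from the argument description above.

```r
# Hypothetical candidate variables and their descriptions.
# First column: variable names as they appear in `data`.
# Second column: a human-readable description of each variable.
variableList <- data.frame(
  Name = c("age", "g2", "grade"),
  Description = c("Age in years",
                  "Percent of cells in G2 phase",
                  "Tumor grade"),
  stringsAsFactors = FALSE
)

print(variableList)
```

With this layout, the description argument would be set to "Description", the name of the column that holds the descriptions.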

Value

orderframe
A sorted list of model variables stored in a data frame
variableList
The argument variableList
formula
The argument formula
Outcome
The argument Outcome
data
The argument data
categorizationType
The argument categorizationType
type
The argument type
rankingTest
The argument rankingTest
cateGroups
The argument cateGroups
raw.dataFrame
The argument raw.dataFrame
description
The argument description
uniType
The argument uniType

Details

This function will create valid dummy categorical variables if, and only if, data has been z-standardized. The p-values provided in cateGroups are converted to their corresponding z-scores, which are then used as cutoffs to create the categories. If data that has not been z-standardized is used, the categorization analysis will return wrong results.
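The conversion described above can be sketched in base R. This is an illustration under stated assumptions, not the package's code: it assumes the cutoffs are the standard normal quantiles of cateGroups (via qnorm), standardizes against the whole sample, and uses a hypothetical feature `x` and hypothetical category labels.

```r
set.seed(42)
x <- rnorm(200, mean = 50, sd = 10)   # hypothetical raw feature
z <- (x - mean(x)) / sd(x)            # z-standardized version of x

cateGroups <- c(0.1, 0.9)
zCuts <- qnorm(cateGroups)            # p-values converted to z-score cutoffs

# Categories on z-standardized data: lower tail, middle, upper tail
zCategory <- cut(z, breaks = c(-Inf, zCuts, Inf),
                 labels = c("low", "mid", "high"))
print(table(zCategory))

# The same cutoffs applied to the raw values are meaningless:
# every observation lands above the upper cutoff
rawCategory <- cut(x, breaks = c(-Inf, zCuts, Inf),
                   labels = c("low", "mid", "high"))
print(table(rawCategory))
```

The second table shows why raw data gives wrong categories: cutoffs near -1.28 and 1.28 only make sense on a z-score scale.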

References

Pencina, M. J., D'Agostino, R. B., & Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27(2), 157-172.

See Also

update.uniRankVar, univariateRankVariables

Examples

## Not run:
# Load the package that provides uniRankVar and cancerVarNames
library(FRESA.CAD)
# Start the graphics device driver to save all plots in a pdf format
pdf(file = "Example.pdf")
# Get the stage C prostate cancer data from the rpart package
library(rpart)
data(stagec)
# Split the stages into several columns
dataCancer <- cbind(stagec[, c(1:3, 5:6)],
                    gleason4 = 1 * (stagec[, 7] == 4),
                    gleason5 = 1 * (stagec[, 7] == 5),
                    gleason6 = 1 * (stagec[, 7] == 6),
                    gleason7 = 1 * (stagec[, 7] == 7),
                    gleason8 = 1 * (stagec[, 7] == 8),
                    gleason910 = 1 * (stagec[, 7] >= 9),
                    eet = 1 * (stagec[, 4] == 2),
                    diploid = 1 * (stagec[, 8] == "diploid"),
                    tetraploid = 1 * (stagec[, 8] == "tetraploid"),
                    notAneuploid = 1 - 1 * (stagec[, 8] == "aneuploid"))
# Remove the incomplete cases
dataCancer <- dataCancer[complete.cases(dataCancer), ]
# Load a pre-established data frame with the names and descriptions of all variables
data(cancerVarNames)
# Rank the variables:
# - Analyzing the raw data
# - According to the zIDI
rankedDataCancer <- uniRankVar(variableList = cancerVarNames,
                               formula = "Surv(pgtime, pgstat) ~ 1",
                               Outcome = "pgstat",
                               data = dataCancer,
                               categorizationType = "Raw",
                               type = "COX",
                               rankingTest = "zIDI",
                               description = "Description")
# Shut down the graphics device driver
dev.off()
## End(Not run)