This function reports the mean and standard deviation of each feature in a model and ranks the features according to a user-specified score. It also performs a Kolmogorov-Smirnov (KS) test on the raw and z-standardized data, and reports the t-test scores on the raw and z-standardized data, the p-value of the Wilcoxon rank-sum test, the integrated discrimination improvement (IDI), the net reclassification improvement (NRI), the net residual improvement (NeRI), and the area under the ROC curve (AUC). Finally, it reports the z-value of the variable's significance in the fitted model.
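To make the KS and two-sample statistics above more concrete, the following base R sketch runs them on simulated data. It is not the package's implementation: the simulated data, the helper names `ks_normal` and `z_from_p`, and the two-sided conversion from p-value to z-value are assumptions made for illustration only.

```r
# Simulated data; every object name here is illustrative, not part of the package.
set.seed(42)
outcome <- rep(c(0, 1), each = 100)
x <- rnorm(200, mean = 10 + 2 * outcome, sd = 3)
cases <- x[outcome == 1]
controls <- x[outcome == 0]

# KS test of one group against a normal distribution with that group's mean and sd
ks_normal <- function(v) {
  res <- ks.test(v, "pnorm", mean = mean(v), sd = sd(v))
  c(D = unname(res$statistic), p = res$p.value)
}
ks_cases <- ks_normal(cases)
ks_controls <- ks_normal(controls)

# Assumed conversion: two-sided p-value to a non-negative z-value
z_from_p <- function(p) qnorm(p / 2, lower.tail = FALSE)

# Two-sample comparisons of cases against controls
t_z <- z_from_p(t.test(cases, controls)$p.value)
wilcox_z <- z_from_p(wilcox.test(cases, controls)$p.value)
```

The function itself may use a signed or one-sided conversion; the sketch only shows the general idea of mapping a test p-value onto the normal scale.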
univariateRankVariables(variableList,
                        formula,
                        Outcome,
                        data,
                        categorizationType = c("Raw",
                                               "Categorical",
                                               "ZCategorical",
                                               "RawZCategorical",
                                               "RawTail",
                                               "RawZTail",
                                               "Tail",
                                               "RawRaw"),
                        type = c("LOGIT", "LM", "COX"),
                        rankingTest = c("zIDI",
                                        "zNRI",
                                        "IDI",
                                        "NRI",
                                        "NeRI",
                                        "Ztest",
                                        "AUC",
                                        "CStat",
                                        "Kendall"),
                        cateGroups = c(0.1, 0.9),
                        raw.dataFrame = NULL,
                        description = ".",
                        uniType = c("Binary", "Regression"),
                        FullAnalysis = TRUE,
                        acovariates = NULL,
                        timeOutcome = NULL
                        )
variableList: A data frame with the candidate variables to be ranked
formula: An object of class formula with the formula to be fitted
Outcome: The name of the column in data that stores the variable to be predicted by the model
data: A data frame where all variables are stored in different columns
categorizationType: How variables will be analyzed: as given in data ("Raw"); broken into the p-value categories given by cateGroups ("Categorical"); broken into the p-value categories given by cateGroups and weighted by the z-score ("ZCategorical"); broken into the p-value categories given by cateGroups, weighted by the z-score, plus the raw values ("RawZCategorical"); raw values, plus the tails ("RawTail"); or raw values, weighted by the z-score, plus the tails ("RawZTail")
type: Fit type: logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")
rankingTest: Variables will be ranked based on: the z-score of the IDI ("zIDI"), the z-score of the NRI ("zNRI"), the IDI ("IDI"), the NRI ("NRI"), the NeRI ("NeRI"), the z-score of the model fit ("Ztest"), the AUC ("AUC"), the Somers' rank correlation ("CStat"), or the Kendall rank correlation ("Kendall")
cateGroups: A vector of percentiles to be used for the categorization procedure
raw.dataFrame: A data frame similar to data, but with unadjusted data, used to get the means and variances of the unadjusted data
description: The name of the column in variableList that stores the variable description
uniType: Type of univariate analysis: binary classification ("Binary") or regression ("Regression")
FullAnalysis: If FALSE, the function will only order the features according to their z-statistics from the linear model
acovariates: The list of covariates
timeOutcome: The name of the time-to-event feature
A sorted data frame. In the case of a binary classification analysis, the data frame will have the following columns:
Name of the raw variable or of the dummy variable if the data has been categorized
Name of the raw variable from which the dummy variable was created
Description of the parent variable, as defined in description
Mean value of the variable
Standard deviation of the variable
D statistic of the KS test when comparing a normal distribution and the distribution of the variable
p-value associated with cohortKSD
Mean value of cases (subjects with Outcome equal to 1)
Standard deviation of cases
D statistic of the KS test when comparing a normal distribution and the distribution of the variable only for cases
p-value associated with caseKSD
D statistic of the KS test when comparing a normal distribution and the distribution of the z-standardized variable only for cases
p-value associated with caseZKSD
Mean value of controls (subjects with Outcome equal to 0)
Standard deviation of controls
D statistic of the KS test when comparing a normal distribution and the distribution of the variable only for controls
p-value associated with controlsKSD
D statistic of the KS test when comparing a normal distribution and the distribution of the z-standardized variable only for controls
p-value associated with controlsZKSD
Normal inverse p-value (z-value) of the t-test performed on raw.dataFrame
z-value of the t-test performed on data
z-value of the Wilcoxon rank-sum test performed on data
z-value returned by the lm, glm, or coxph functions for the z-standardized variable
z-value returned by the improveProb function (Hmisc package) when evaluating the NRI
z-value returned by the improveProb function (Hmisc package) when evaluating the IDI
z-value returned by the improvedResiduals function when evaluating the NeRI
Area under the ROC curve returned by the roc function (pROC package)
c index of Somers' rank correlation returned by the rcorr.cens function (Hmisc package)
NRI returned by the improveProb function (Hmisc package)
IDI returned by the improveProb function (Hmisc package)
NeRI returned by the improvedResiduals function
Kendall \(\tau\) rank correlation coefficient between the variable and the binary outcome
p-value associated with kendall.r
p-value of the improvement in residuals, as evaluated by the paired t-test
p-value of the improvement in residuals, as evaluated by the paired Wilcoxon signed-rank test
p-value of the improvement in residual variance, as evaluated by the F-test
Number of cases in the low tail
Number of cases in the top tail
Number of controls in the low tail
Number of controls in the top tail
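As a rough illustration of two of the columns above, the Kendall correlation with a binary outcome and the AUC can both be reproduced in base R. This is a sketch on simulated data, not the package's code: the documentation states that the AUC comes from the roc function of the pROC package, whereas the version below uses the equivalent Mann-Whitney formulation so that it runs without extra packages. All object names are illustrative.

```r
# Simulated binary outcome and one candidate variable (illustrative names)
set.seed(3)
outcome <- rbinom(200, 1, 0.4)
x <- rnorm(200, mean = outcome)
cases <- x[outcome == 1]
controls <- x[outcome == 0]

# Kendall tau between the variable and the binary outcome, with its p-value
kt <- cor.test(x, outcome, method = "kendall")
kendall_r <- unname(kt$estimate)
kendall_p <- kt$p.value

# AUC as the Mann-Whitney statistic divided by the number of case-control pairs
w <- unname(wilcox.test(cases, controls)$statistic)
auc <- w / (length(cases) * length(controls))

# Same quantity computed directly: fraction of pairs where the case exceeds the control
auc_direct <- mean(outer(cases, controls, ">"))
```

The two AUC values agree here because the simulated variable is continuous and has no ties.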
In the case of a regression analysis, the data frame will have the following columns:
Name of the raw variable or of the dummy variable if the data has been categorized
Name of the raw variable from which the dummy variable was created
Description of the parent variable, as defined in description
Mean value of the variable
Standard deviation of the variable
D statistic of the KS test when comparing a normal distribution and the distribution of the variable
p-value associated with cohortKSD
D statistic of the KS test when comparing a normal distribution and the distribution of the z-standardized variable
p-value associated with cohortZKSD
z-value returned by the glm or Cox procedure for the z-standardized variable
z-value returned by the improveProb function (Hmisc package) when evaluating the NRI
NeRI returned by the improvedResiduals function
c index of Somers' rank correlation returned by the rcorr.cens function (Hmisc package)
Spearman \(\rho\) rank correlation coefficient between the variable and the outcome
Pearson r product-moment correlation coefficient between the variable and the outcome
Kendall \(\tau\) rank correlation coefficient between the variable and the outcome
p-value associated with kendall.r
p-value of the improvement in residuals, as evaluated by the paired t-test
p-value of the improvement in residuals, as evaluated by the paired Wilcoxon signed-rank test
p-value of the improvement in residual variance, as evaluated by the F-test
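The three residual-improvement p-values listed above can be approximated in base R by comparing the residuals of a baseline model with those of a model that adds the candidate variable. The sketch below is an assumption about the general approach, not the code of improvedResiduals: the simulated data, the variable names, the use of absolute residuals, and the one-sided alternatives are all illustrative choices.

```r
# Simulated regression data (illustrative names)
set.seed(123)
n <- 150
age <- rnorm(n, mean = 60, sd = 8)
marker <- rnorm(n)
y <- 0.05 * age + 0.8 * marker + rnorm(n)
d <- data.frame(y = y, age = age, marker = marker)

# Baseline model and model extended with the candidate variable
base_fit <- lm(y ~ age, data = d)
full_fit <- lm(y ~ age + marker, data = d)

abs_base <- abs(resid(base_fit))
abs_full <- abs(resid(full_fit))

# Paired tests: are the baseline absolute residuals larger than the extended ones?
t_p <- t.test(abs_base, abs_full, paired = TRUE,
              alternative = "greater")$p.value
w_p <- wilcox.test(abs_base, abs_full, paired = TRUE,
                   alternative = "greater")$p.value

# F-test: is the baseline residual variance larger than the extended one?
f_p <- var.test(resid(base_fit), resid(full_fit),
                alternative = "greater")$p.value
```

Small p-values suggest that adding the candidate variable shrinks the residuals; the exact definitions used by the package may differ.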
This function will create valid dummy categorical variables if, and only if, data has been z-standardized.
The p-values provided in cateGroups will be converted to their corresponding z-scores, which will then be used to create the categories.
If data that has not been z-standardized is used, the categorization analysis will return wrong results.
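The note above can be illustrated with a short base R sketch. It assumes that the values in cateGroups are mapped to z-score cut points with the standard normal quantile function and that tail dummies are built by thresholding; the variable names and the thresholding rule are illustrative assumptions, not the package's code.

```r
# Default percentiles from the usage section, converted to z-score cut points
cateGroups <- c(0.1, 0.9)
z_cuts <- qnorm(cateGroups)  # roughly -1.28 and 1.28

# A raw variable and its z-standardized version (simulated, illustrative)
set.seed(1)
raw_x <- rnorm(500, mean = 50, sd = 10)
z_x <- as.numeric(scale(raw_x))

# Dummy variables for the low and high tails of the z-standardized data
low_tail <- as.integer(z_x < z_cuts[1])
high_tail <- as.integer(z_x > z_cuts[2])

# Applying the same cut points to raw data puts almost everything in the high tail
raw_high_tail <- as.integer(raw_x > z_cuts[2])
```

With z-standardized input, each tail holds roughly 10% of the observations. With raw input, the cut points no longer correspond to the requested percentiles, which is the failure mode the note warns about.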
Pencina, M. J., D'Agostino, R. B., & Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27(2), 157-172.