This function uses bootstrap sampling to rank variables by how frequently they are selected as statistically improving the models through reduced residuals. After the frequency ranking, the function applies a forward selection procedure to build a final model in which every term makes a significant contribution to the net residual improvement (NeRI).
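The two stages described above can be pictured with a minimal base R sketch. It is not the function's actual implementation: the helper names (`improvement_pvalue`, `forward_select`), the simulated data, and the use of a paired one-sided Wilcoxon test on absolute residuals as the improvement criterion are all assumptions made for illustration.

```r
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)   # only x1 and x2 carry signal
candidates <- c("x1", "x2", "x3", "x4")

# Hypothetical helper: p-value that adding `term` shrinks the absolute residuals
improvement_pvalue <- function(base_terms, term, d) {
  f0 <- reformulate(base_terms, response = "y")
  f1 <- reformulate(c(base_terms, term), response = "y")
  r0 <- abs(residuals(lm(f0, data = d)))
  r1 <- abs(residuals(lm(f1, data = d)))
  wilcox.test(r0, r1, paired = TRUE, alternative = "greater")$p.value
}

# Hypothetical helper: greedy forward selection; stops when no remaining
# candidate improves the residuals at the requested p-value
forward_select <- function(d, vars, pvalue = 0.05, max_size = 20) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(vars, selected)
    if (length(remaining) == 0 || length(selected) >= max_size) break
    pvals <- vapply(remaining,
                    function(v) improvement_pvalue(c("1", selected), v, d),
                    numeric(1))
    if (min(pvals) >= pvalue) break
    selected <- c(selected, remaining[which.min(pvals)])
  }
  selected
}

# Stage 1: bootstrap loops, counting how often each variable is selected
loops <- 50
picks <- unlist(lapply(seq_len(loops), function(i) {
  boot <- dat[sample(n, replace = TRUE), ]
  forward_select(boot, candidates)
}))
freq <- sort(table(factor(picks, levels = candidates)), decreasing = TRUE)
ranked <- names(freq)

# Stage 2: forward selection over the frequency-ranked variables, full data
final_terms <- forward_select(dat, ranked)
print(freq)
print(final_terms)
```

Ranking by bootstrap frequency before the final selection means that variables chosen in only a few resamples are considered last.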
ForwardSelection.Model.Res(size = 100,
                           fraction = 1,
                           pvalue = 0.05,
                           loops = 100,
                           covariates = "1",
                           Outcome,
                           variableList,
                           data,
                           maxTrainModelSize = 20,
                           type = c("LM", "LOGIT", "COX"),
                           testType = c("Binomial", "Wilcox", "tStudent", "Ftest"),
                           timeOutcome = "Time",
                           cores = 6,
                           randsize = 0,
                           featureSize = 0)
The number of candidate variables to be tested (the first size variables from variableList)
The fraction of data (sampled with replacement) to be used as the training set
The maximum p-value, associated with the NeRI, allowed for a term in the model (controls the false selection rate)
The number of bootstrap loops
A string of the type "1 + var1 + var2" that defines which variables will always be included in the models (as covariates)
The name of the column in data that stores the variable to be predicted by the model
A data frame with two columns. The first one must have the names of the candidate variables and the other one the description of such variables
A data frame where all variables are stored in different columns
Maximum number of terms that can be included in the model
Fit type: Logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX")
Type of statistical test to be evaluated by the improvedResiduals function: Binomial test ("Binomial"), Wilcoxon rank-sum test ("Wilcox"), Student's t-test ("tStudent"), or F-test ("Ftest")
The name of the column in data that stores the time to event (needed only for a Cox proportional hazards regression model fitting)
The number of cores to be used for parallel processing
The model size of a random outcome. If randsize is less than zero, the function will estimate the size.
The original number of features to be explored in the data frame.
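To make the test type options more concrete, the following base R sketch compares the residuals of a base model and an extended model using one test of each kind. It is an illustration only and may not match what improvedResiduals computes internally. The paired forms of the Wilcoxon and t-tests, the simulated data, and the reading of NeRI as the proportion of improved residuals minus the proportion of worsened residuals are assumptions.

```r
set.seed(1)
n <- 300
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + 2 * x + 1.5 * z + rnorm(n)

base_fit <- lm(y ~ x)        # model without the candidate term
full_fit <- lm(y ~ x + z)    # model with the candidate term z

r0 <- abs(residuals(base_fit))
r1 <- abs(residuals(full_fit))
improved <- sum(r1 < r0)     # observations whose residual shrank
worsened <- sum(r1 > r0)     # observations whose residual grew

p_values <- c(
  # Sign-style test: are improvements more common than deteriorations?
  Binomial = binom.test(improved, improved + worsened, p = 0.5,
                        alternative = "greater")$p.value,
  # Paired comparisons of the absolute residuals (assumed form)
  Wilcox   = wilcox.test(r0, r1, paired = TRUE,
                         alternative = "greater")$p.value,
  tStudent = t.test(r0, r1, paired = TRUE,
                    alternative = "greater")$p.value,
  # Classical nested-model F-test on the residual sums of squares
  Ftest    = anova(base_fit, full_fit)[2, "Pr(>F)"]
)

# Assumed definition of the net residual improvement
neri <- (improved - worsened) / n

print(p_values)
print(neri)
```

Under this reading, a candidate term would be kept only when the p-value of the chosen test falls below pvalue.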
An object of class lm, glm, or coxph containing the final model
A vector with the names of the features that were included in the final model
An object of class formula with the formula used to fit the final model
An array with the ranked frequencies of the features
A list containing objects of class formula with the formulas used to fit the models found at each cycle
A list of variables used in the forward selection
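A call might look like the sketch below. It assumes the function is provided by the FRESA.CAD package, which is why the call is guarded. The simulated data frame `cohort`, its column names, and the column names of `variableList` are hypothetical and chosen only for illustration.

```r
set.seed(7)
n <- 120
cohort <- data.frame(
  age     = rnorm(n, mean = 60, sd = 10),
  bmi     = rnorm(n, mean = 27, sd = 4),
  marker1 = rnorm(n),
  marker2 = rnorm(n)
)
cohort$score <- 0.05 * cohort$age + 1.2 * cohort$marker1 + rnorm(n)

# Two columns: candidate variable names first, then their descriptions
variableList <- data.frame(
  Name = c("bmi", "marker1", "marker2"),
  Description = c("Body mass index",
                  "Hypothetical marker 1",
                  "Hypothetical marker 2"),
  stringsAsFactors = FALSE
)

# Run only when the package (assumed to be FRESA.CAD) is installed
if (requireNamespace("FRESA.CAD", quietly = TRUE)) {
  fit <- FRESA.CAD::ForwardSelection.Model.Res(
    size = nrow(variableList),   # test every listed candidate
    fraction = 1,
    pvalue = 0.05,
    loops = 20,                  # fewer bootstrap loops to keep the run short
    covariates = "1 + age",      # age is always included in the models
    Outcome = "score",
    variableList = variableList,
    data = cohort,
    maxTrainModelSize = 5,
    type = "LM",
    testType = "Ftest",
    cores = 1
  )
  str(fit, max.level = 1)        # inspect the returned components
}
```

Here timeOutcome is left at its default because it is needed only for Cox models.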