This function performs a Monte Carlo cross-validation (CV) for each Random Forest model grown on
every k-combination of the n input variables of the original dataset, with k ranging from 2 to n.
It returns the best-performing Random Forest model in terms of the area under the ROC curve (AUC),
together with the most relevant input variables (metabolites or bins) associated with it.
dataset
an n x p data frame used to build the models. The first two columns
must contain, respectively, the sample names and the class label of each sample
parameters
a list including the following parameters:
ntree the number of trees of each Random Forest model
nsplits the number of random splits of the original dataset into training and test sets
test_prop the proportion (expressed as a real number, e.g. 1/3) of the observations of the original dataset to be included in each test set
kmax the maximum number of inputs to combine.
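
As a rough illustration of how nsplits and test_prop interact, the sketch below draws repeated random train/test partitions in base R. The helper mc_splits is hypothetical and is not part of the documented function; it only assumes that each split holds out round(n_obs * test_prop) observations chosen uniformly at random.

```r
# Hypothetical helper (an assumption, not the package's code): draw `nsplits`
# random train/test partitions of `n_obs` observations.
mc_splits <- function(n_obs, nsplits, test_prop) {
  n_test <- round(n_obs * test_prop)          # size of each test set
  lapply(seq_len(nsplits), function(i) {
    test_idx <- sample.int(n_obs, n_test)     # sample test rows without replacement
    list(train = setdiff(seq_len(n_obs), test_idx), test = test_idx)
  })
}

set.seed(1)  # for reproducibility of the illustration only
splits <- mc_splits(n_obs = 30, nsplits = 10, test_prop = 1/3)
length(splits)             # number of partitions drawn
length(splits[[1]]$test)   # observations held out in the first split
```

Unlike k-fold CV, the test sets of different splits may overlap, which is why the number of splits can be chosen independently of the test proportion.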
Value
a list containing the best-performing Random Forest model
Examples
## data(cachexiaData)
## params <- list(ntree = 100, nsplits = 10, test_prop = 1/3)
## res <- combinatorialRFMCCV(dataset = cachexiaData[,1:10], parameters = params)
## This task may take a long time depending on the
## dimension of the dataset and on the parameters provided
Details
The function computes all the k-combinations of the n input variables, with k ranging from 2 to n.
Each combination corresponds to a dataset on which the function will grow a Random Forest model,
performing a Monte Carlo CV. Then it will provide the best performing model in terms of the AUC of the ROC curve
and the most relevant variables associated with it.
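
The enumeration and scoring steps described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: the variable names in vars are made up, and auc_rank is a hypothetical helper that computes the AUC through the rank-sum (Mann-Whitney) identity rather than through any particular ROC package.

```r
# Illustrative input variables (made-up names, not from any dataset).
vars <- c("m1", "m2", "m3", "m4")

# Enumerate every k-combination of the variables for k = 2..n.
combos <- unlist(
  lapply(2:length(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)
length(combos)  # 6 + 4 + 1 = 11 candidate variable subsets

# Hypothetical helper: AUC from predicted scores via the rank-sum identity.
# `labels` is a logical vector, TRUE marking the positive class.
auc_rank <- function(scores, labels) {
  r <- rank(scores)                  # average ranks handle tied scores
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(c(0.1, 0.4, 0.35, 0.8), c(FALSE, FALSE, TRUE, TRUE))  # 0.75
```

The number of subsets grows as 2^n - n - 1, so the running time increases quickly with the number of input variables.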