combinatorialRFMCCV: Combinatorial Monte Carlo CV

Description

This function performs a Monte Carlo cross-validation (CV) of a Random Forest model grown on each k-combination of the input variables of the original dataset, with k ranging from 2 to kmax. It identifies the best-performing Random Forest model in terms of the area under the ROC curve (AUC) and returns the most relevant input variables (metabolites or bins) associated with it.
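The number of models to grow rises quickly with the number of input variables. The base R sketch below enumerates the combinations with `combn()` and counts them. The helper name `enumerate_combinations` is hypothetical and not part of the package. Capping k at `kmax` is an assumption based on the `kmax` parameter.

```r
# Hypothetical helper (not part of the package): list every k-combination
# of the input variable names for k = 2, ..., kmax, with kmax capped at
# the number of available variables.
enumerate_combinations <- function(var_names, kmax) {
  stopifnot(length(var_names) >= 2)
  kmax <- min(kmax, length(var_names))
  # combn() gives one list of character vectors per k; flatten one level
  unlist(lapply(2:kmax, function(k) combn(var_names, k, simplify = FALSE)),
         recursive = FALSE)
}

vars <- paste0("V", 1:8)   # eight toy input variables
combos <- enumerate_combinations(vars, kmax = 5)
length(combos)             # 28 + 56 + 70 + 56 = 210
sum(choose(8, 2:5))        # same count in closed form: 210
combos[[1]]                # "V1" "V2"
```

If one forest is grown per split, `nsplits = 100` would already mean 210 x 100 = 21,000 forests for only eight variables.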

Usage

combinatorialRFMCCV(dataset, parameters = list(ntrees = 500, nsplits = 100, test_prop = 1/3, kmax = 5))

Arguments

dataset

an n x p data frame used to build the models. The first two columns must contain, respectively, the sample names and the class labels of each sample; the remaining columns are the input variables.

parameters

a list with the following elements:

ntrees the number of trees of each Random Forest model
nsplits the number of random splits of the original dataset into training and test sets
test_prop the proportion of observations of the original dataset assigned to the test set in each split, expressed as a real number (e.g. 1/3)
kmax the maximum number of input variables combined in a single model.
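The sketch below shows how `nsplits` and `test_prop` might drive the Monte Carlo splitting. It draws `nsplits` independent random partitions, each with about `test_prop` of the observations in the test set. The helper `mc_splits` is hypothetical. Simple unstratified sampling is an assumption; the package may stratify by class.

```r
# Hypothetical helper (not part of the package): draw `nsplits` random
# train/test partitions of n observations. Each test set holds
# round(test_prop * n) observations drawn without replacement; splits
# are unstratified, which is an assumption of this sketch.
mc_splits <- function(n, nsplits, test_prop) {
  stopifnot(test_prop > 0, test_prop < 1)
  n_test <- round(test_prop * n)
  lapply(seq_len(nsplits), function(i) {
    test <- sort(sample.int(n, n_test))
    list(train = setdiff(seq_len(n), test), test = test)
  })
}

set.seed(1)
splits <- mc_splits(n = 30, nsplits = 100, test_prop = 1/3)
length(splits)             # 100 splits
length(splits[[1]]$test)   # 10 test observations
length(splits[[1]]$train)  # 20 training observations
```

Unlike k-fold CV, the test sets of different Monte Carlo splits are drawn independently and may overlap.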

Value

a list containing the best-performing Random Forest model.

Examples

## data(cachexiaData)
## params <- list(ntrees = 100, nsplits = 10, test_prop = 1/3)
## res <- combinatorialRFMCCV(dataset = cachexiaData[, 1:10], parameters = params)
## This task may take a long time, depending on the size of the
## dataset and on the parameters provided.

Details

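The function computes all the k-combinations of the input variables, with k ranging from 2 to kmax. Each combination defines a reduced dataset, made of the selected variables together with the class labels, on which the function grows a Random Forest model and evaluates it by Monte Carlo CV. The dataset is randomly split nsplits times into a training set and a test set, the latter containing a fraction test_prop of the observations. The function then returns the best-performing model in terms of the AUC of the ROC curve, together with the most relevant variables associated with it.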
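The procedure above can be sketched end to end in base R. To keep the sketch dependency-free, logistic regression (`glm()`) stands in for the Random Forest. The AUC is computed from the Mann-Whitney rank statistic.

The following are all assumptions made for illustration, not the package's actual implementation:

- the function names `auc_mw`, `mccv_auc` and `best_combination`;
- the `label` column name;
- averaging the test AUCs over the splits;
- unstratified splitting.

```r
# AUC of the ROC curve via the Mann-Whitney statistic: the fraction of
# (positive, negative) pairs in which the positive scores higher
# (ties count 1/2, via average ranks).
auc_mw <- function(scores, is_pos) {
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(scores)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical: Monte Carlo CV of one combination of variables, with
# glm() standing in for the Random Forest. Returns the mean test AUC.
mccv_auc <- function(data, vars, positive, nsplits, test_prop) {
  y <- as.integer(data$label == positive)
  n <- nrow(data)
  aucs <- vapply(seq_len(nsplits), function(i) {
    test <- sample.int(n, round(test_prop * n))
    train <- data.frame(y = y[-test], data[-test, vars, drop = FALSE])
    fit <- glm(reformulate(vars, response = "y"), data = train,
               family = binomial)
    scores <- predict(fit, newdata = data[test, vars, drop = FALSE])
    auc_mw(scores, y[test] == 1)  # NaN if a test set holds one class only
  }, numeric(1))
  mean(aucs, na.rm = TRUE)
}

# Hypothetical: evaluate every k-combination (k = 2, ..., kmax) and
# keep the one with the highest mean test AUC.
best_combination <- function(data, var_names, positive, kmax,
                             nsplits, test_prop) {
  kmax <- min(kmax, length(var_names))
  combos <- unlist(lapply(2:kmax, function(k)
    combn(var_names, k, simplify = FALSE)), recursive = FALSE)
  aucs <- vapply(combos, function(v)
    mccv_auc(data, v, positive, nsplits, test_prop), numeric(1))
  list(vars = combos[[which.max(aucs)]], auc = max(aucs))
}

# Toy data: V1 and V2 are shifted in the "case" class; V3 and V4 are noise.
set.seed(42)
n <- 80
label <- factor(rep(c("case", "control"), each = n / 2))
shift <- 1.5 * (label == "case")
toy <- data.frame(sample = paste0("S", seq_len(n)), label = label,
                  V1 = rnorm(n) + shift, V2 = rnorm(n) + shift,
                  V3 = rnorm(n), V4 = rnorm(n))
res <- best_combination(toy, c("V1", "V2", "V3", "V4"), positive = "case",
                        kmax = 3, nsplits = 20, test_prop = 1/3)
res$vars  # variables of the best combination
res$auc   # its mean test AUC
```

Swapping `glm()` for a call to an actual Random Forest implementation, and the rank-based AUC for a dedicated ROC package, would bring the sketch closer to the described method.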