
RFmarkerDetector (version 1.0.1)

getBestRFModel: Extracting the best performing Random Forest model

Description

This function finds the best-performing Random Forest model among those built on each k-combination of the input variables.

Usage

getBestRFModel(combinations, data, params)

Arguments

combinations
a k x n matrix in which n is the number of combinations of the input variables and k is the size of each combination
data
an n x p data frame of n observations and p-2 predictors. The first two columns must contain the sample names and the class associated with each sample
params
a list of parameters used to perform a Monte Carlo cross-validation. It should contain the following elements:
  • ntrees the number of trees of each random forest model
  • nsplits the number of random splittings of the original dataset into training and test data sets
  • test_prop the proportion of the observations of the original dataset (expressed as a real number, e.g. 1/3) to be included in each test set
  • ref_level the assumed reference class label
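
To make the roles of nsplits and test_prop concrete, here is a minimal base-R sketch of how random train/test partitions for a Monte Carlo cross-validation could be drawn. The helper make_mc_splits is hypothetical and is not part of RFmarkerDetector. It uses plain unstratified sampling, which is an assumption, since this page does not say how the package draws its splits.

```r
# Hypothetical helper (not part of RFmarkerDetector): draw `nsplits` random
# train/test partitions of n observations, with a test fraction of `test_prop`.
make_mc_splits <- function(n, nsplits, test_prop) {
  n_test <- round(n * test_prop)
  lapply(seq_len(nsplits), function(i) {
    test_idx <- sort(sample.int(n, n_test))
    list(train = setdiff(seq_len(n), test_idx), test = test_idx)
  })
}

set.seed(1)
params <- list(ntrees = 500, nsplits = 100, test_prop = 1/3)

# n = 60 is an arbitrary illustrative sample size
splits <- make_mc_splits(n = 60,
                         nsplits = params$nsplits,
                         test_prop = params$test_prop)

length(splits)             # one partition per random split
length(splits[[1]]$test)   # round(n * test_prop) observations in each test set
```

Unlike k-fold cross-validation, the test sets of different splits may overlap, because each split is drawn independently.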

Value

a list of the following elements:
  • best_model_set the set of best performing Random Forest models in terms of AUC
  • max_auc the maximum value of AUC corresponding to those models
  • biomarker_set the set of metabolites (or bins) corresponding to the best performing Random Forest

Details

The k-combinations of the input variables are represented as a k x n matrix, in which k is the size of each combination and n is the number of combinations of the input variables of the original dataset. Each column of the combinations matrix contains the indexes of the input variables in the original dataset. For each column, getBestRFModel extracts the corresponding dataset from the original one and builds a Random Forest model, evaluated with a Monte Carlo cross-validation. The cross-validated models are compared on the AUC of their averaged ROC curves. The function returns the best models, the maximum AUC value, and the associated input variables.
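
The procedure described above can be sketched roughly as follows. This is an illustration on synthetic data, not the package's actual implementation. The helpers auc_rank and mc_cv_auc and the toy data frame are hypothetical. The sketch assumes the randomForest package is installed. For simplicity it averages the per-split AUC values instead of computing the AUC of an averaged ROC curve, so its numbers would not necessarily match those of getBestRFModel.

```r
library(randomForest)

# Rank-based AUC (Mann-Whitney statistic): the probability that a randomly
# chosen positive sample gets a higher score than a randomly chosen negative.
auc_rank <- function(scores, is_positive) {
  r <- rank(scores)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: mean test-set AUC of a Random Forest over random splits.
mc_cv_auc <- function(x, y, ntrees, nsplits, test_prop, ref_level) {
  n_test <- round(nrow(x) * test_prop)
  aucs <- vapply(seq_len(nsplits), function(i) {
    test_idx <- sample.int(nrow(x), n_test)
    fit <- randomForest(x = x[-test_idx, , drop = FALSE],
                        y = y[-test_idx], ntree = ntrees)
    prob <- predict(fit, newdata = x[test_idx, , drop = FALSE],
                    type = "prob")[, ref_level]
    auc_rank(prob, y[test_idx] == ref_level)
  }, numeric(1))
  mean(aucs, na.rm = TRUE)
}

# Synthetic data in the expected layout: sample names, classes, then predictors
set.seed(42)
n <- 60
toy <- data.frame(
  sample = paste0("s", seq_len(n)),
  class  = factor(rep(c("case", "control"), each = n / 2)),
  v1 = rnorm(n), v2 = rnorm(n), v3 = rnorm(n), v4 = rnorm(n)
)
toy$v1 <- toy$v1 + ifelse(toy$class == "case", 2, 0)  # make v1 informative

combinations <- combn(x = 3:6, m = 2)  # a 2 x 6 matrix of column indexes

# One cross-validated AUC per column of the combinations matrix
aucs <- apply(combinations, 2, function(idx) {
  mc_cv_auc(x = toy[, idx, drop = FALSE], y = toy$class,
            ntrees = 50, nsplits = 10, test_prop = 1/3, ref_level = "case")
})

max_auc <- max(aucs)
best <- which(aucs == max_auc)  # ties are kept, hence a set of best models
biomarker_set <- lapply(best, function(j) names(toy)[combinations[, j]])
```

Small values of ntrees and nsplits keep the sketch fast. The number of models grows with the number of combinations, so run time on real data increases quickly with the number of candidate variables.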

Examples

## data(cachexiaData)
## dataset <- cachexiaData[, 1:15]   # sample names, classes and 13 input variables
## indexes <- 3:15                   # column indexes of the input variables
## combinations <- combn(x = indexes, m = 5)  # a 5 x n_of_combinations matrix
## test_params <- list(ntrees = 500, nsplits = 100, test_prop = 1/3)
## res <- getBestRFModel(combinations, dataset, test_params)
