MicrobTiSDA (version 0.1.0)

Rf.biomarkers: Select Biomarkers Based on Random Forest Cross-Validation Results

Description

This function extracts the top biomarkers from a random forest classification result, using the cross-validation results and a user-specified number of microbial features. It updates the cross-validation plot by adding a vertical dashed line at the specified number of features, and then selects the top features (biomarkers) by importance ranking.

Usage

Rf.biomarkers(rf = rf_results, feature_select_num)

Value

An object of class RfBiomarker with two elements:

OTU_importance

A data frame of the selected biomarkers (transposed feature table).

cross_validation_fig

A ggplot object of the cross-validation plot with a vertical dashed line indicating the feature selection cutoff.
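
The dashed cutoff line on the returned plot can be reproduced on any ggplot object. The following is a minimal sketch, assuming ggplot2 is installed; `cv_curve` and its values are made up for illustration and are not output of Data.rf.classifier, and this is not necessarily how the package draws the line internally.

```r
library(ggplot2)

# Hypothetical cross-validation curve: error rate vs. number of features.
# These values are illustrative only.
cv_curve <- data.frame(
  n_features = c(1, 2, 5, 10, 20),
  cv_error   = c(0.42, 0.35, 0.21, 0.20, 0.22)
)

cv_plot <- ggplot(cv_curve, aes(x = n_features, y = cv_error)) +
  geom_line() +
  geom_point()

# Mark the chosen number of features with a vertical dashed line.
feature_select_num <- 5
marked_plot <- cv_plot +
  geom_vline(xintercept = feature_select_num, linetype = "dashed")
```

Because the result is an ordinary ggplot object, further layers or themes can be added to it before printing.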

Arguments

rf

A list containing the results of the random forest classification, usually the object returned by Data.rf.classifier. Defaults to rf_results.

feature_select_num

A numeric value specifying the number of top features (biomarkers) to select. Typically, the user chooses this number based on the cross-validation result plot produced by Data.rf.classifier.

Author

Shijia Li

Details

The function takes an object (usually the output from Data.rf.classifier, which includes a cross-validation plot, an OTU importance table, and the original input data) and a user-specified number of features to select. It then updates the cross-validation plot by adding a vertical dashed line at the position corresponding to the number of selected features. Next, it extracts the top features from the OTU importance table (ordered by Mean Decrease Accuracy) and creates a table of these features from the original microbial feature table. The function returns an object of class RfBiomarker that includes both the transposed biomarker table and the modified cross-validation plot.
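
The selection step described above can be sketched in base R. This is an illustration under stated assumptions, not the package's actual code: `select_top_features`, `importance`, and `feature_table` are hypothetical names, the importance values are made up, and the importance table is assumed to have a `MeanDecreaseAccuracy` column with OTU names as row names.

```r
# Toy stand-ins (hypothetical): a feature table (OTUs x samples) and an
# importance table with a MeanDecreaseAccuracy column.
set.seed(1)
feature_table <- matrix(
  sample(0:100, 60, replace = TRUE), nrow = 6,
  dimnames = list(paste0("OTU", 1:6), paste0("Sample", 1:10))
)
importance <- data.frame(
  MeanDecreaseAccuracy = c(0.8, 2.5, 0.1, 1.9, 3.2, 0.4),
  row.names = rownames(feature_table)
)

select_top_features <- function(importance, feature_table, n) {
  stopifnot(n >= 1, n <= nrow(importance))
  # Rank features from most to least important.
  ranked <- importance[order(importance$MeanDecreaseAccuracy,
                             decreasing = TRUE), , drop = FALSE]
  top_ids <- rownames(ranked)[seq_len(n)]
  # Subset the feature table and transpose so that samples become rows.
  t(feature_table[top_ids, , drop = FALSE])
}

biomarker_table <- select_top_features(importance, feature_table, n = 3)
colnames(biomarker_table)  # "OTU5" "OTU2" "OTU4"
```

The transposed result has one row per sample and one column per selected feature, which matches the "transposed feature table" layout described for OTU_importance.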

Examples

# \donttest{
# Example OTU count data (20 OTUs x 10 samples)
set.seed(123)
otu_data <- matrix(sample(0:100, 200, replace = TRUE), nrow = 20)
colnames(otu_data) <- paste0("Sample", 1:10)
rownames(otu_data) <- paste0("OTU", 1:20)

# Example metadata with group labels
metadata <- data.frame(Group = rep(c("Control", "Treatment"), each = 5))

# Run the classifier
rf_result <- Data.rf.classifier(raw_data = otu_data,
                                metadata = metadata,
                                train_p = 0.7,
                                Group = "Group",
                                OTU_counts_filter_value = 50)
# If you wish to select the top 5 features:
result <- Rf.biomarkers(rf = rf_result, feature_select_num = 5)
# View the biomarker table
print(result$OTU_importance)
# View the updated cross-validation plot
print(result$cross_validation_fig)
# }