Learn R Programming

funbarRF (version 1.0.2)

predict_train_funbarRF: Prediction of species labels for the out-of-bag (OOB) reference barcode sequence using Random Forest.

Description

Generally, training or reference dataset is used to train the model and not for prediction purpose. However, since Random Forest method is used here, prediction for the OOB instances is made. The OOB instances are the observations that are not participated in constructing tree-based classifiers.

Usage

predict_train_funbarRF (object, m_try = 10, n_tree = 500)

Arguments

object

An object created by the function seq_funbarRF or seq_funbarRF_manual .

m_try

This parameter is required for randomForest. It represents the number of variables to be randomly sampled at each split. Default value is 10.

n_tree

This is also a parameter for randomForest. It denotes the number of tree-based classifiers to be built. This should not be set to too small a number, to ensure that every instance gets predicted at least a few times. Default is 500.

Value

result_train

A dataframe consisting of species labels, number of species labels observed and correctly predicted.

Details

The user has to supply the reference sequence dataset to assess the accuracy of the developed prediction approach. Here, the prediction for the species label is made for the OOB instances and are then aggregated over all the classifiers for final prediction based on majority voting strategy.

References

  1. Liaw A., and Wiener M. (2002). Classification and Regression by randomForest. R News, 2(3), 18-22.

  2. Meher P.K., Sahu T.K., and Rao A.R. (2016). Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier. Gene, 592(2), 316-324.

See Also

randomForest, predict_test_funbarRF

Examples

Run this code
# NOT RUN {
#######################
data (fun_dat) 
kk <- read_seq_txt (fun_dat$seq)[1:5]
zz <- as.factor(as.character (fun_dat$seq_name)[1:5])
train <- seq_funbarRF (reference_seq=kk, seq_id=zz)
res <- predict_train_funbarRF (object=train, m_try=10, n_tree=20) 
# kindly use large number of n_tree
print(res)

######################
# }
# NOT RUN {
data (data_barcode)
tr_ss <- seq_funbarRF_manual (manual_seq=data_barcode$Fish$train[1:100])
prd1 <- predict_train_funbarRF (object=tr_ss, m_try=10, n_tree=500)
# kindly use large number of n_tree
print(prd1)

# }

Run the code above in your browser using DataLab