abcrf constructs a random forest from a reference table in order to perform an ABC model choice. The reference table (i.e. the dataset processed by this package) contains one column with the index of the models to be compared and additional columns holding the values of the simulated summary statistics.
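The following base R sketch builds a tiny synthetic reference table with the layout described above: one factor column for the model index and one numeric column per summary statistic. The column names (modindex, mean_stat, var_stat) and the toy simulation are hypothetical stand-ins for the output of a real simulator, such as the snp data used in the examples.

```r
# Synthetic reference table (all names and distributions are hypothetical):
# one factor column holding the model index, plus summary statistic columns.
set.seed(1)
n_per_model <- 100
modindex <- factor(rep(c("1", "2", "3"), each = n_per_model))

# Each simulated dataset is reduced to two summary statistics; their
# distributions depend on the model so that the models are distinguishable.
shift <- as.numeric(as.character(modindex))
ref_table <- data.frame(
  modindex = modindex,
  mean_stat = rnorm(length(modindex), mean = shift),
  var_stat = rexp(length(modindex), rate = shift)
)

str(ref_table)
```

A data frame with this layout could, in principle, be passed as data together with the formula modindex ~ .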
# S3 method for formula
abcrf(formula, data, group=list(), lda=TRUE, ntree=500, sampsize=min(1e5, nrow(data)),
paral=FALSE, ncores= if(paral) max(detectCores()-1,1) else 1, ...)
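The defaults for sampsize and ncores in the usage line above are plain R expressions. The sketch below re-expresses them as two small helpers so their behaviour is easy to check; default_sampsize and default_ncores are hypothetical names, not functions exported by the package, and the NA fallback follows the documented rule that a single core is used when detectCores cannot determine the number of cores.

```r
library(parallel)  # detectCores() is provided by the base 'parallel' package

# Hypothetical helper mirroring the documented sampsize default:
# the smaller of 100,000 and the number of rows of the reference table.
default_sampsize <- function(data) {
  min(1e5, nrow(data))
}

# Hypothetical helper mirroring the documented ncores default:
# one core when paral = FALSE; otherwise all detected cores minus one
# (but at least one), falling back to one core if detectCores() gives NA.
default_ncores <- function(paral) {
  if (!paral) return(1L)
  n <- detectCores()
  if (is.na(n)) return(1L)
  as.integer(max(n - 1, 1))
}

default_sampsize(data.frame(x = seq_len(500)))     # 500 rows: all of them
default_sampsize(data.frame(x = seq_len(250000)))  # capped at 100,000
default_ncores(FALSE)                              # 1 core without parallelism
```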
An object of class abcrf, which is a list with the following components:
the original call to abcrf,
a boolean indicating whether LDA scores have been added to the list of summary statistics,
the formula used to construct the classification random forest,
a list containing the groups of model(s) used; this list is empty if no grouping has been performed,
an object of class randomForest containing the trained forest with the reference table,
an object of class lda containing the Linear Discriminant Analysis based on the reference table,
the prior error rates of model selection on the reference table, estimated with the "out-of-bag" error of the forest.
formula: a formula; left of ~, the variable representing the model index; right of ~, the summary statistics of the reference table.
data: a data frame containing the reference table.
group: a list containing groups (at least 2) of model(s) on which the model choice will be performed. This is not necessarily a partition: one or more models can be excluded from the elements of the list. By default no grouping is done.
lda: should LDA scores be added to the list of summary statistics?
ntree: number of trees to grow in the forest, 500 by default.
sampsize: size of the sample drawn from the reference table to grow each tree of the classification forest; by default the minimum of the number of rows of the reference table and 100,000.
paral: a boolean indicating whether the calculations of the classification random forest (the forest used to assign a model to the observed dataset) should be parallelized.
ncores: the number of CPU cores to use. If paral=TRUE and ncores is not specified, the number of CPU cores minus one (at least one) is used. If ncores is not specified and detectCores fails to detect the number of CPU cores, then 1 core is used.
...: additional arguments passed on to ranger, which is used to construct the classification random forest that predicts the selected model.
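To illustrate what lda = TRUE means, the sketch below fits a Linear Discriminant Analysis of the model index on the summary statistics and appends the resulting discriminant scores as extra columns. It assumes MASS::lda (consistent with the documented lda class of the returned object) and a hypothetical synthetic table; it is an illustration of the idea, not the package's internal code.

```r
library(MASS)  # lda() comes from the recommended MASS package

set.seed(2)
# Hypothetical reference table: three models, three summary statistics.
modindex <- factor(rep(c("1", "2", "3"), each = 50))
shift <- as.numeric(as.character(modindex))
ref_table <- data.frame(
  modindex = modindex,
  s1 = rnorm(150, mean = shift),
  s2 = rnorm(150, mean = -shift),
  s3 = rnorm(150)
)

# Fit an LDA of the model index on all summary statistics.
fit_lda <- lda(modindex ~ ., data = ref_table)

# With K = 3 models there are K - 1 = 2 discriminant axes (LD1, LD2);
# their scores are appended as additional summary statistics.
scores <- predict(fit_lda, ref_table)$x
augmented <- cbind(ref_table, scores)
colnames(augmented)
```

The LDA axes summarise the directions that best separate the models, which is why adding them as extra covariates can help the forest.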
Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert C. P. (2016) Reliable ABC model choice via random forests. Bioinformatics. doi:10.1093/bioinformatics/btv684
Estoup A., Raynal L., Verdu P. and Marin J.-M. (2018) Model choice using Approximate Bayesian Computation and Random Forests: analyses based on model grouping to make inferences about the genetic history of Pygmy human populations. Journal de la Société Française de Statistique. http://journal-sfds.fr/article/view/709
plot.abcrf, predict.abcrf, err.abcrf, ranger
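library(abcrf)
data(snp)
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500, ]
data1 <- data.frame(modindex, sumsta)
# Model choice among all models of the reference table
model.rf1 <- abcrf(modindex ~ ., data = data1, ntree = 100)
model.rf1
# Model choice between two groups: models 1 and 2 versus model 3
model.rf2 <- abcrf(modindex ~ ., data = data1, group = list(c("1", "2"), "3"), ntree = 100)
model.rf2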
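The group argument used in the second example can be pictured as a relabelling of the model index: each list element becomes one class, and models listed in no element are left out. The sketch below expresses that reading in base R; regroup_models and its "g1", "g2" labels are hypothetical and do not reproduce how abcrf names or stores groups internally.

```r
# Hypothetical helper illustrating the documented semantics of 'group':
# every element of the list becomes one class, and models that appear in
# no element are marked NA (i.e. excluded from the model choice).
regroup_models <- function(modindex, group) {
  modindex <- as.character(modindex)
  new_label <- rep(NA_character_, length(modindex))
  for (g in seq_along(group)) {
    # Arbitrary labels "g1", "g2", ... for the groups.
    new_label[modindex %in% group[[g]]] <- paste0("g", g)
  }
  factor(new_label)
}

# Four models; model 4 belongs to no group, so its rows are dropped.
modindex <- factor(c("1", "2", "3", "4", "1", "3"))
regrouped <- regroup_models(modindex, group = list(c("1", "2"), "3"))
keep <- !is.na(regrouped)
data.frame(original = modindex[keep], group = regrouped[keep])
```

Because the list need not be a partition, dropping the unlisted models (rather than lumping them into a residual class) matches the description of the group argument.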