Gets the "best" models using top-ranked genes and a forward-selection strategy.
# S3 method for BigBang
forwardSelectionModels(.O,
filter="none",
subset=TRUE,
geneIndexSet=NULL,
starti=NULL,
endi=NULL,
fitnessFunc=if (!is.function(.O$data$modelSelectionFunc)) .O$galgo$fitnessFunc
else .O$data$modelSelectionFunc,
minFitness=NULL,
plot=TRUE,
plot.preview=TRUE,
decision=c("overall", "average"),
plot.type=c("lines", "boxplot"),
approach=c("fitness", "error"),
pch=20,
result=c("all", "models", "fitness"),
threshold=0.99,
main=.O$main,
mord=min(ncol(.O$data$data), 50),
mcol=8,
rcol=(if (mcol < 2) c(rep(1, mord), 0)
else c(cut(1:mord, breaks = mcol, labels = FALSE), 0)),
classFunc=.O$data$classFunc,
compute.classes=is.function(classFunc),
cex=1,
cex.axis=0.66,
set=c(0,1),
...)
filter: The BigBang object can save information about solutions that did not reach the goalFitness. filter=="solutions" ensures that only chromosomes that reached the goalFitness are considered. filter=="none" takes all chromosomes. filter=="nosolutions" considers only no-solutions (for comparative purposes).
subset: Second level of filtering. subset can be a vector specifying which filtered chromosomes are used. It can be a logical vector or a numeric vector (indexes in the order given by the $bestChromosomes variable of the BigBang object). If it is a numeric vector of length one, a positive value takes that many top chromosomes sorted by fitness, and a negative value takes that many from the bottom.
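
The subset rules above can be illustrated with a small base-R sketch. It does not call the package: the fitness vector and the helper pick_chromosomes() are hypothetical stand-ins, and the real method may resolve edge cases differently.

```r
# Toy stand-in for the filtered chromosomes: one fitness value per chromosome.
# (Hypothetical data; a real BigBang object stores these internally.)
fitness <- c(a = 0.91, b = 0.78, c = 0.97, d = 0.85, e = 0.88)

# Hypothetical helper mirroring the documented meanings of `subset`.
pick_chromosomes <- function(fitness, subset = TRUE) {
  idx <- seq_along(fitness)
  if (is.logical(subset)) {
    return(idx[rep_len(subset, length(idx))])    # logical mask, recycled
  }
  if (length(subset) == 1) {
    ranked <- order(fitness, decreasing = TRUE)  # best chromosome first
    n <- min(abs(subset), length(idx))
    if (subset > 0) return(head(ranked, n))      # top n by fitness
    return(tail(ranked, n))                      # bottom n by fitness
  }
  subset                                         # explicit indexes
}

names(fitness)[pick_chromosomes(fitness, 2)]         # "c" "a"
names(fitness)[pick_chromosomes(fitness, -2)]        # "d" "b"
names(fitness)[pick_chromosomes(fitness, c(1, 4))]   # "a" "d"
```

Note that under these rules a single number such as 3 means "the top three", not "the third chromosome".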
geneIndexSet: The gene indexes to use (ignoring filter and subset). If this is not specified, the indexes are computed using filter and subset.
starti: Vector of initial index positions of models to test. If specified, it should be the same length as endi. If omitted, the default is 1 repeated to the same length as endi.
endi: Vector of final index positions of models to test.
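
A minimal sketch of how starti and endi might delimit candidate models, assuming that model i consists of the ranked genes in positions starti[i] to endi[i]. The gene names and the helper build_models() are hypothetical, and the slicing rule is an assumption rather than the package's confirmed behaviour. With the documented default (starti all 1), the candidates are nested, which is the usual shape of a forward selection.

```r
# Genes already sorted by rank, best first (hypothetical names).
ranked_genes <- c("g7", "g2", "g9", "g4", "g1")

# Hypothetical helper: build one candidate model per (starti, endi) pair.
build_models <- function(ranked_genes, endi, starti = NULL) {
  if (is.null(starti)) starti <- rep(1, length(endi))  # documented default
  stopifnot(length(starti) == length(endi))
  # Assumption: model i uses ranked positions starti[i]..endi[i].
  Map(function(s, e) ranked_genes[s:e], starti, endi)
}

models <- build_models(ranked_genes, endi = 2:4)
models   # three nested models of 2, 3 and 4 genes
```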
fitnessFunc: The function that evaluates the performance (fitness) of every model (chromosome). The actual measure is the mean of the values returned for each chromosome. Thus fitnessFunc can return a single numeric value (as in $galgo$fitnessFunc) or a numeric vector (as in $data$modelSelectionFunc). The default is $data$modelSelectionFunc, unless it is NULL, in which case $galgo$fitnessFunc is used.
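
To make the "mean of the returned values" point concrete, here is a small sketch. Both fitness functions and the wrapper model_fitness() are hypothetical; they only show that a scalar and a vector return value reduce to one number per model.

```r
# Hypothetical fitness functions taking a chromosome (vector of gene indexes).
single_value <- function(chr) 0.9                     # one number
per_split    <- function(chr) c(0.8, 1.0, 0.9, 0.7)   # one number per test split

# Hypothetical wrapper: the reported measure is the mean of whatever comes back.
model_fitness <- function(fitnessFunc, chr) mean(fitnessFunc(chr))

model_fitness(single_value, c(3, 8, 15))   # 0.9
model_fitness(per_split, c(3, 8, 15))      # 0.85
```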
minFitness: The minimum fitness requested. All models with mean fitness above this value will be reported. NULL specifies using the maximum fitness from the results. "se*sp" uses the maximum value computed by multiplying the sensitivity and specificity when compute.classes==TRUE.
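
For reference, the product of sensitivity and specificity mentioned for "se*sp" can be computed as below for a two-class problem. The labels and predictions are toy data, and how the package derives these quantities (in particular for more than two classes) is not shown here.

```r
# Toy two-class labels and predictions (hypothetical data).
truth <- c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg")
pred  <- c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "neg")

sensitivity <- mean(pred[truth == "pos"] == "pos")   # true positive rate: 3/4
specificity <- mean(pred[truth == "neg"] == "neg")   # true negative rate: 5/6
se_sp <- sensitivity * specificity
se_sp   # 0.625
```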
decision: Specifies how to select the model. "overall" selects the model based on the accuracy over all samples, whereas "average" selects the model based on the average accuracy per class. If the number of samples per class is exactly the same, both results are equal. The default is "overall". If classFunc is not specified or compute.classes==FALSE, decision is forced to "overall".
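
The difference between the two decision rules shows up only with unbalanced classes. The following base-R sketch uses toy data (hypothetical) to compute both quantities.

```r
# Toy unbalanced problem: 8 samples of class A, 2 of class B (hypothetical).
truth   <- c(rep("A", 8), rep("B", 2))
correct <- c(rep(TRUE, 8), TRUE, FALSE)   # was each sample classified correctly?

overall <- mean(correct)                         # all samples pooled: 0.9
average <- mean(tapply(correct, truth, mean))    # mean of per-class accuracies: 0.75

# With equal class sizes the two measures coincide.
truth_bal   <- rep(c("A", "B"), each = 4)
correct_bal <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
mean(correct_bal) == mean(tapply(correct_bal, truth_bal, mean))   # TRUE
```

Here class B is small and poorly predicted, so "average" penalises the model more than "overall" does.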
plot: Logical value indicating whether the result should be displayed.
plot.type: "lines" draws a line joining points. "boxplot" adds a boxplot when fitnessFunc returns more than one value.
approach: "fitness" draws fitness. "error" draws error (1-fitness).
result: Specifies the desired output. "models" reports only the models above minFitness. "fitness" reports only the fitness of the models above minFitness. "all" (default) reports both models and fitness in a list that includes all computed fitnesses and class prediction accuracies (if compute.classes==TRUE).
threshold: Specifies the proportion of minFitness used for selecting models (default 0.99).
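
One plausible reading of how minFitness and threshold interact is sketched below. The helper select_models() and the fitness values are hypothetical, and the exact comparison (here, mean fitness >= threshold * minFitness) is an assumption, not the package's confirmed rule.

```r
# Mean fitness of each candidate model (hypothetical values).
mean_fitness <- c(m1 = 0.80, m2 = 0.93, m3 = 0.95, m4 = 0.945, m5 = 0.91)

# Hypothetical helper; the comparison used internally is an assumption.
select_models <- function(mean_fitness, minFitness = NULL, threshold = 0.99) {
  if (is.null(minFitness)) minFitness <- max(mean_fitness)  # documented NULL behaviour
  names(mean_fitness)[mean_fitness >= threshold * minFitness]
}

select_models(mean_fitness)                      # cutoff 0.99 * 0.95: "m3" "m4"
select_models(mean_fitness, minFitness = 0.9)    # cutoff 0.99 * 0.90: "m2" "m3" "m4" "m5"
```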
mord: Specifies the number of top-ranked genes (see *plot() and others). Defaults to min(ncol(.O$data$data), 50). It should not be less than the maximum endi.
mcol: Specifies the number of sections for top-rank colouring (see *plot() and others).
rcol: Specifies the colours of the sections (see *plot() and others).
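
The default value of rcol in the usage above can be evaluated on its own to see how mord and mcol interact. The sketch below copies that default expression and uses small illustrative values for mord and mcol; the role of the trailing 0 is not documented here, so it is left uninterpreted.

```r
mord <- 12   # number of top-ranked genes (illustrative value)
mcol <- 4    # number of colour sections (illustrative value)

# Default expression for rcol, copied from the usage section.
rcol <- if (mcol < 2) c(rep(1, mord), 0) else
  c(cut(1:mord, breaks = mcol, labels = FALSE), 0)

rcol   # 1 1 1 2 2 2 3 3 3 4 4 4 0
```

Each of the mord ranks is assigned to one of mcol equal-width sections, so neighbouring ranks share a colour code.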
classFunc: Function that predicts the class. The default is $data$classFunc.
compute.classes: Specifies that class accuracies are desired (and plotted). In non-classification problems, it should be FALSE.
Plot parameters.
...: Other parameters used for plot, fitnessFunc and classFunc.
The returned value depends on result.
It is expected that fitnessFunc computes the overall fitness (the proportion of correctly classified samples regardless of their classes). However, this value could be slightly different from the curve marked as "(avg)", which is the average fitness per class. This difference is due to the different number of samples per class and the number of times specific samples were used as part of the test set in both the fitness function and the class function.
Goldberg, David E. 1989 Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co. ISBN: 0201157675
For more information see BigBang.
*plot(), *heatmapModels(), *pcaModels().
# NOT RUN {
# bb is a BigBang object
fsm <- forwardSelectionModels(bb)
fsm
names(fsm)
heatmapModels(fsm, subset=1)
# Request a minimum fitness of 0.9 and evaluate with the galgo fitness function
fsm <- forwardSelectionModels(bb, minFitness=0.9,
                              fitnessFunc=bb$galgo$fitnessFunc)
heatmapModels(fsm, subset=1)
pcaModels(fsm, subset=1)
fitnessSplits(bb, chromosomes=list(fsm$models[[1]]))
# }