forwardSelectionModels.BigBang: Gets the "best" models using top-ranked genes and a forward-selection strategy

Description

Gets the "best" models using top-ranked genes and a forward-selection strategy.

Usage

# S3 method for BigBang
forwardSelectionModels(.O,
	filter="none",
	subset=TRUE,
	geneIndexSet=NULL,
	starti=NULL,
	endi=NULL,
	fitnessFunc=if (!is.function(.O$data$modelSelectionFunc)) .O$galgo$fitnessFunc 
				else .O$data$modelSelectionFunc,
	minFitness=NULL,
	plot=TRUE,
	plot.preview=TRUE,
	decision=c("overall", "average"),
	plot.type=c("lines", "boxplot"),
	approach=c("fitness", "error"),
	pch=20,
	result=c("all", "models", "fitness"),
	threshold=0.99,
	main=.O$main,
	mord=min(ncol(.O$data$data), 50),
	mcol=8,
	rcol=(if (mcol < 2) c(rep(1, mord), 0) 
		else c(cut(1:mord, breaks = mcol, labels = FALSE), 0)),
	classFunc=.O$data$classFunc,
	compute.classes=is.function(classFunc),
	cex=1,
	cex.axis=0.66,
	set=c(0,1),
	...)

Arguments

filter

The BigBang object can save information about solutions that did not reach the goalFitness. filter=="solutions" ensures that only chromosomes that reached the goalFitness are considered. filter=="none" takes all chromosomes. filter=="nosolutions" considers only non-solutions (for comparative purposes).

subset

Second level of filtering. subset can be a vector specifying which of the filtered chromosomes are used. It can be a logical vector or a numeric vector (indexes in the order given by $bestChromosomes in the BigBang object). If it is a numeric vector of length one, a positive value takes that many top chromosomes sorted by fitness, and a negative value takes that many from the bottom.
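
The subset rules above can be sketched in base R. This is only an illustration under stated assumptions: pick_chromosomes is a hypothetical helper, not a galgo function, and the fitness values are made up.

```r
# Hypothetical helper, not part of galgo: mimics the documented subset rules.
pick_chromosomes <- function(fitness, subset = TRUE) {
  idx <- seq_along(fitness)
  if (is.logical(subset)) {
    return(idx[rep_len(subset, length(idx))])     # logical mask (recycled)
  }
  if (length(subset) == 1) {
    ranked <- order(fitness, decreasing = TRUE)   # best chromosome first
    n <- min(abs(subset), length(idx))
    if (subset > 0) return(head(ranked, n))       # top n by fitness
    return(tail(ranked, n))                       # bottom n by fitness
  }
  idx[subset]                                     # explicit indexes
}

fitness <- c(0.91, 0.85, 0.97, 0.88, 0.93)  # made-up fitness values
pick_chromosomes(fitness, TRUE)      # all chromosomes
pick_chromosomes(fitness, 2)         # the two fittest
pick_chromosomes(fitness, -2)        # the two least fit
pick_chromosomes(fitness, c(1, 4))   # chromosomes 1 and 4
```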

geneIndexSet

The gene indexes to use (ignoring filter and subset). If this is not specified, the indexes are computed using filter and subset.

starti

Vector of initial index positions of the models to test. If specified, it should be the same length as endi. If omitted, it defaults to 1 repeated to the length of endi.

endi

Vector of final index positions of models to test.
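
One way to read starti and endi, assuming that model i is made of the genes ranked from position starti[i] to position endi[i] (this reading is an assumption, and the gene indexes below are made up):

```r
# Made-up ranking of gene indexes, top-ranked gene first (illustrative only).
ranked_genes <- c(412, 87, 1033, 256, 9, 700)

endi   <- c(2, 4, 6)             # final rank position of each model
starti <- rep(1, length(endi))   # documented default when starti is omitted

# Assumption: model i uses the genes ranked starti[i] through endi[i].
models <- Map(function(s, e) ranked_genes[s:e], starti, endi)
models
```

With the default starti, each model is a growing prefix of the ranking, which is the usual shape of a forward-selection search.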

fitnessFunc

The function that evaluates the performance (fitness) of every model (chromosome). The actual measure is the mean of the values returned for each chromosome. Thus fitnessFunc can return a single numeric value (as in $galgo$fitnessFunc) or a numeric vector (as in $data$modelSelectionFunc). The default is $data$modelSelectionFunc; if that is not a function, $galgo$fitnessFunc is used.
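
A minimal sketch of the "mean of the returned values" rule. The two fitness functions and their one-argument signature are hypothetical stand-ins, not the signatures galgo uses:

```r
# Hypothetical fitness functions; the signatures are illustrative, not galgo's.
single_value <- function(model) 0.9                      # one number per model
split_values <- function(model) c(0.85, 0.95, 0.90, 1)   # one number per test split

# Assumption: the score of a model is the mean of whatever the function returns.
model_score <- function(f, model) mean(f(model))

score_single <- model_score(single_value, c(1, 2, 3))
score_splits <- model_score(split_values, c(1, 2, 3))
```

Because the mean of a single number is that number, both kinds of function can be handled in the same way.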

minFitness

The minimum fitness requested. All models with a mean fitness above this value are reported. NULL uses the maximum fitness found in the results. "se*sp" uses the maximum value of the product of sensitivity and specificity, which requires compute.classes==TRUE.

decision

Specifies how the model is selected. "overall" selects the model based on the accuracy over all samples, whereas "average" selects it based on the average accuracy per class. If every class has exactly the same number of samples, both results are equal. The default is "overall". If classFunc is not specified or compute.classes==FALSE, decision is forced to "overall".
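
The difference between the two criteria shows up when classes are unbalanced. The following base R sketch uses made-up labels and predictions; it does not call galgo:

```r
# Made-up example: 8 samples of class A, 2 of class B.
truth     <- factor(c(rep("A", 8), rep("B", 2)))
predicted <- factor(c(rep("A", 6), rep("B", 2), rep("B", 2)),
                    levels = levels(truth))

correct   <- truth == predicted
overall   <- mean(correct)                 # accuracy over all samples
per_class <- tapply(correct, truth, mean)  # accuracy within each class
average   <- mean(per_class)               # average accuracy per class

overall
per_class
average
```

Here two of the eight A samples are misclassified, so the overall accuracy is 0.8 while the per-class average is (0.75 + 1) / 2 = 0.875.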

plot

Logical value indicating whether the result should be displayed.

plot.type

"lines" draws a line joining the points. "boxplot" adds a boxplot when fitnessFunc returns more than one value.

approach

"fitness" draws fitness. "error" draws error (1-fitness).

result

Specifies the desired output. "models" reports only the models above minFitness. "fitness" reports only the fitness of the models above minFitness. "all" (default) reports both models and fitness in a list that includes all computed fitnesses and class prediction accuracies (if compute.classes==TRUE).

threshold

Specifies the fraction of minFitness used for selecting models. Defaults to 0.99.
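
A sketch of how threshold and minFitness could interact, assuming that a model is kept when its mean fitness is at least threshold * minFitness (the exact comparison used internally is an assumption, and the fitness values are made up):

```r
mean_fitness <- c(0.80, 0.91, 0.955, 0.96, 0.94)  # made-up, one value per model
threshold    <- 0.99

# minFitness = NULL is documented as "use the maximum fitness from the results".
min_fitness <- max(mean_fitness)

# Assumption: models within threshold * minFitness are selected.
cutoff   <- threshold * min_fitness
selected <- which(mean_fitness >= cutoff)
selected
```

With these numbers the cutoff is 0.99 * 0.96 = 0.9504, so models 3 and 4 are kept.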

mord

Specifies the number of top-ranked genes. Defaults to min(ncol(.O$data$data), 50). It should not be less than the maximum endi.

mcol

Specifies the number of sections used for top-rank colouring.

rcol

Specifies the colours of the sections.
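
The default for rcol shown under Usage can be evaluated on its own to see what it produces. The values of mord and mcol below are the documented defaults for a data set with at least 50 genes:

```r
mord <- 50
mcol <- 8

# Same expression as the default in Usage: one section id per ranked gene,
# followed by a trailing 0.
rcol <- if (mcol < 2) c(rep(1, mord), 0) else
  c(cut(1:mord, breaks = mcol, labels = FALSE), 0)

length(rcol)   # mord + 1
table(rcol)    # how many genes fall in each section
```

Each of the mord ranked genes gets a section number from 1 to mcol, so consecutive ranks share a colour.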

classFunc

Function that predicts the class. The default is $data$classFunc.

compute.classes

Specifies whether class accuracies are computed (and plotted). In non-classification problems, it should be FALSE.

pch,main,cex,cex.axis

Plot parameters.

...

Other parameters used for plot, fitnessFunc and classFunc.

Value

Depends on result.

Details

It is expected that fitnessFunc computes the overall fitness (the proportion of correctly classified samples regardless of their classes). However, this value can differ slightly from the curve marked "(avg)", which is the average fitness per class. The difference arises from the different number of samples per class and from the number of times specific samples were used as part of the test set in both the fitness function and the class function.
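
For readers unfamiliar with the forward-selection strategy named in the Description, the following base R sketch shows the general idea on the built-in iris data. It is not galgo code: the columns of iris stand in for genes, the ranking is made up, and the fitness is a simple resubstitution accuracy of a nearest-centroid classifier rather than the test-split fitness a BigBang object would use.

```r
# Illustrative forward selection in base R; none of this is galgo code.
x <- as.matrix(iris[, 1:4])   # columns stand in for genes
y <- iris$Species
ranked <- c(3, 4, 1, 2)       # hypothetical "top-ranked" feature order

# Hypothetical fitness: resubstitution accuracy of a nearest-centroid classifier.
centroid_accuracy <- function(cols) {
  xs <- x[, cols, drop = FALSE]
  # Class centroids: per-class column sums divided by class sizes.
  centroids <- rowsum(xs, y) / as.vector(table(y))
  # Squared Euclidean distance from every sample to every centroid.
  dists <- sapply(seq_len(nrow(centroids)), function(k) {
    rowSums(sweep(xs, 2, centroids[k, ])^2)
  })
  predicted <- levels(y)[max.col(-dists, ties.method = "first")]
  mean(predicted == y)
}

# Grow the model one ranked feature at a time and record its fitness.
fitness <- sapply(seq_along(ranked), function(k) centroid_accuracy(ranked[1:k]))
best_size <- which.max(fitness)   # smallest model reaching the best fitness
ranked[1:best_size]
```

The vector fitness plays the role of the curve that forwardSelectionModels plots, and the model with the highest value corresponds to the selection made when minFitness is NULL.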

References

Goldberg, David E. 1989 Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co. ISBN: 0201157675

Examples

# NOT RUN {
   #bb is a BigBang object
   fsm <- forwardSelectionModels(bb)
   fsm
   names(fsm)
   heatmapModels(fsm, subset=1)
   fsm <- forwardSelectionModels(bb, minFitness=0.9,
      fitnessFunc=bb$galgo$fitnessFunc)
   heatmapModels(fsm, subset=1)
   pcaModels(fsm, subset=1)
   fitnessSplits(bb, chromosomes=list(fsm$models[[1]]))
   
# }