Plots the information collected in a BigBang object. See the arguments for details.
# S3 method for BigBang
plot(x,
y=NULL,
...,
type=c("genefrequency","generank","generankstability",
"geneoverlap","geneoverlaphor",
"fitness","fitnessboxes","generations",
"rankindex","genefrequencydist","topgenenumber","rankindexcol",
"confusion","confusionbar","confusionbox",
"splits","splitsmap","splitsfitness","fitnesssplits","fitnesssplitsbox",
"genecoverage","confusionpamr",
"genesintop","genenetwork","genevalues","genevalueslines",
"genevaluesbox","geneprofiles",
"sampleprofiles","rankfitness")[c(1,3,8)],
filter=c("none","solutions","nosolutions"),
subset=TRUE,
mcol=8,
mord=min(ncol(o$data$data),50),
rcol=(if(mcol < 2) c(rep(1,mord),0)
else c(cut(1:mord,breaks=mcol,labels=FALSE),0)),
new.dev=FALSE,
sort.chr=4,
freq.col=rgb(.4,.4,.4),
freq.all.labels=FALSE,
rank.lwd=5,
rank.order=c("rank","reverse","random"),
gene.names=TRUE,
rankindex.log=NULL,
coverage.log="x",
classFunc=NULL,
classes=NULL,
confusion.all=TRUE,
contrast=0.15,
coverage=c(0.25,0.5,0.75,1),
samples=NULL,
samples.cex=0.75,
pch=20,
main=o$main,
nbf=1,
net.method=c("isoMDS","cmdscale","sammon"),
net.th=2,
node.size=6,
node.name=c("index","rownames"),
node.namecol=NULL,
xlim=NULL,
ylim=NULL,
xlab="",
ylab="",
cex=1,
exp.freq=TRUE
)
Optional additional data for the plot type. Some plot types can make use of this parameter.
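Specifies the types of plots. The default selects elements 1, 3 and 8 of the list, that is, c("genefrequency","generankstability","generations").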
Plots the frequency of genes computed from the chromosomes selected by filter and subset. Peaks reveal high-frequency genes, which are potentially ``important'' genes. ``Top-ranked'' genes are coloured according to their rank (see mord, mcol and rcol). Labels are optional (see freq.all.labels).
Similar to "genefrequency" but draws only the ``top-ranked'' genes, sorted by rank.
Because the process is stochastic, it is difficult to decide how many solutions are required to stabilize the gene ranks and thus avoid random fluctuations. "generankstability" shows visually how the rank of the current ``top-ranked'' genes has changed over the course of the process. Many colour changes reveal rank instability, whereas few or no changes show stability. Commonly, the top 10 to 20 genes are the quickest to stabilize. One can decide to "stop" the process or "start" the analysis once at least 10 or 20 genes have been "stable" for 100 or 200 solutions.
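The idea behind "generankstability" can be sketched in base R. The sketch assumes that each solution is stored as a vector of gene indexes. The helpers top_genes_after() and stable_run() are hypothetical and not part of galgo. Ties in frequency are broken by gene index so the result is deterministic.

```r
# Hypothetical input: one vector of gene indexes per solution, in the order
# the solutions were found. This is only a sketch, not galgo's implementation.
chrs <- list(c(1, 2, 3), c(1, 4, 5), c(4, 6, 7), c(4, 1, 8))

# Top-k genes by frequency among the first n solutions.
top_genes_after <- function(chrs, n, k) {
  freq <- table(unlist(chrs[seq_len(n)]))
  ids <- as.integer(names(freq))
  # Decreasing frequency; ties broken by gene index for a deterministic order.
  ranked <- ids[order(-as.vector(freq), ids)]
  ranked[seq_len(min(k, length(ranked)))]
}

# Number of most recent solutions over which the top-k genes (and their
# order) have stayed the same.
stable_run <- function(chrs, k) {
  n <- length(chrs)
  current <- top_genes_after(chrs, n, k)
  run <- 0
  for (m in rev(seq_len(n))) {
    if (!identical(top_genes_after(chrs, m, k), current)) break
    run <- run + 1
  }
  run
}

stable_run(chrs, k = 2)
```

Here the top two genes (1 and 4) have held for the last two solutions only. On real data, a run of 100 or 200 solutions would be the signal described above.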
Overview of how the chromosomes are ``overlapped'' and ``represented'' by the top-ranked genes (see sort.chr).
Horizontal version of "geneoverlap".
Shows a histogram of the number of top-ranked genes included in the models.
Shows the evolution of the maximum fitness for each solution, including descriptive confidence intervals (the average over all solutions and the average over the worst). The point where the highest interval intersects goalFitness is the ``average'' number of generations needed to reach that fitness value. This can help in choosing the number of generations and the goal fitness value.
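The way the crossing point is read from the "fitness" plot can be sketched with hypothetical data. This simplified version uses only the average curve over all solutions, not galgo's intervals. The matrix fit and its values are made up for illustration.

```r
# Hypothetical data: rows are generations, columns are solutions; each entry
# is the best fitness that solution had reached by that generation.
fit <- cbind(c(0.40, 0.60, 0.70, 0.80),
             c(0.50, 0.70, 0.80, 0.90),
             c(0.30, 0.50, 0.60, 0.85))
goalFitness <- 0.75

# Average maximum fitness per generation over all solutions.
mean_curve <- rowMeans(fit)
# First generation at which the average curve reaches goalFitness.
crossing <- which(mean_curve >= goalFitness)[1]
crossing
```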
Similar to "fitness" but using boxplots, which is useful for "statistical" intervals.
Distribution of the final generation reached by each galgo. A large peak at minGenerations indicates ``premature'' convergence or an ``easy'' goalFitness; increasing goalFitness may be worthwhile. A trend toward maxGenerations may indicate a very high goalFitness or a low maxGenerations (this may be normal when onlySolutions == FALSE).
Shows rank versus index. A vertical line indicates many genes sharing the same rank, probably caused by randomness, unstable ranks or insufficient solutions.
Shows the distribution of gene frequencies.
Shows the number of genes whose frequency is higher than specific values. It tries to answer questions like ``how many genes appear in X chromosomes?''. It helps to decide how many ``top'' genes to include in plots. Genes with low frequency may be associated with random fluctuations.
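The counting behind this plot amounts to a threshold count over gene frequencies. Here is a minimal base-R sketch with made-up frequencies (not galgo's code).

```r
# Hypothetical gene frequencies: number of chromosomes each gene appears in.
freq <- c(g1 = 40, g2 = 25, g3 = 12, g4 = 5, g5 = 1)
thresholds <- c(1, 10, 20, 30)

# For each threshold X, count the genes whose frequency is higher than X.
n_genes <- sapply(thresholds, function(x) sum(freq > x))
names(n_genes) <- thresholds
n_genes
```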
For classification problems, shows the confusion matrix and the probability of every sample for each class. It needs a classFunc specification (unless $data$classFunc exists in the BigBang object) or y=classPredictionMatrix. An NA ``class'' is added to the predicted-class axis (vertical) for classification methods that cannot produce a class prediction in all cases. By default, the bar size represents the ``probability'' that the sample belongs to that class. The sensitivity and specificity of every class are given on the horizontal axis (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), where TP = true positives, TN = true negatives, FP = false positives and FN = false negatives).
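The per-class sensitivity and specificity formulas above can be checked on a small confusion matrix. This base-R sketch uses hypothetical labels and a hypothetical helper sens_spec(). It ignores the extra NA class.

```r
# Hypothetical true and predicted classes for eight samples.
truth <- factor(c("A", "A", "A", "B", "B", "C", "C", "C"))
pred  <- factor(c("A", "A", "B", "B", "B", "C", "A", "C"),
                levels = levels(truth))
cm <- table(truth, pred)   # rows: true class, columns: predicted class

# One-vs-rest counts for each class k, then the two rates.
sens_spec <- function(cm) {
  total <- sum(cm)
  t(sapply(rownames(cm), function(k) {
    TP <- cm[k, k]
    FN <- sum(cm[k, ]) - TP
    FP <- sum(cm[, k]) - TP
    TN <- total - TP - FN - FP
    c(sensitivity = TP / (TP + FN), specificity = TN / (TN + FP))
  }))
}

res <- sens_spec(cm)
res
```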
Similar to "confusion" but showing distribution boxes for each class.
Similar to "confusion", in the style of the pamr package.
Gives an overview of how the splits were built. Perhaps of little use.
Gives a clustering overview of how the splits were built (to detect biased splits). Perhaps of little use.
Plots boxplots of the chromosome evaluations in different splits. Perhaps of little use.
Plots the distribution of fitness evaluated in different splits, to check whether the chromosomes are ``split-dependent''.
Plots boxplots of the chromosome evaluations in different splits. Perhaps of little use.
Plots the number of top-ranked genes (horizontal axis) versus the percentage of all genes present in the chromosomes that they cover (vertical axis). It tries to answer questions like "how many top-ranked genes N are required so that these N genes cover at least 50% of all genes in the chromosomes?". To answer it, plot with type="genecoverage", look for 0.5 (50%) on the vertical axis (or use coverage=0.5), then project that point of the curve onto the horizontal axis.
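The coverage curve can be sketched as the cumulative share of gene occurrences covered by the N most frequent genes. This computation is an assumption about how "percentage of total genes" is defined, and the frequencies are made up. The targets mirror the default coverage=c(0.25,0.5,0.75,1).

```r
# Hypothetical frequencies of each gene across all chromosomes.
freq <- c(g1 = 30, g2 = 20, g3 = 20, g4 = 15, g5 = 10, g6 = 5)

# Sort genes by decreasing frequency and accumulate their share of all
# gene occurrences in the chromosomes.
cover <- cumsum(sort(freq, decreasing = TRUE)) / sum(freq)

# Smallest number N of top-ranked genes reaching each target coverage.
targets <- c(0.25, 0.5, 0.75, 1)
n_needed <- sapply(targets, function(p) which(cover >= p)[1])
unname(n_needed)
```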
Plots the ``dependency'' between genes as a network. The distance measures in how many chromosomes two genes appear together, normalized by the total number of interactions. The thickness of each connection reflects its strength relative to the other connections shown.
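The co-occurrence counts that underlie such a network can be computed from an incidence matrix. This base-R sketch counts how many chromosomes contain each pair of genes. The normalization shown is one possible choice and not necessarily the distance galgo uses.

```r
# Hypothetical chromosomes, each a vector of gene indexes out of 5 genes.
chrs <- list(c(1, 2, 3), c(1, 2, 4), c(2, 3, 5), c(1, 2, 5))
n_genes <- 5

# Incidence matrix: one row per chromosome, one column per gene.
inc <- t(sapply(chrs, function(ch) as.integer(seq_len(n_genes) %in% ch)))

# Entry [i, j]: number of chromosomes containing both gene i and gene j.
together <- crossprod(inc)
diag(together) <- 0

# One possible normalization (an assumption): each pair's share of all
# pairwise co-occurrences.
strength <- together / sum(together[upper.tri(together)])
together
```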
The BigBang object can save information about solutions that did not reach goalFitness. filter=="solutions" ensures that only chromosomes that reached goalFitness are considered. filter=="none" takes all chromosomes. filter=="nosolutions" considers only non-solutions (for comparative purposes).
Second level of filtering. subset can be a vector specifying which of the filtered chromosomes are used. It can be a logical vector or a numeric vector of indexes (in the order given by $bestChromosomes in the BigBang object). If it is a numeric vector of length one, a positive value takes that many top chromosomes sorted by fitness, and a negative value takes that many from the bottom.
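The selection rules described for subset can be sketched as a small function. select_chromosomes() is hypothetical and takes a made-up fitness vector in $bestChromosomes order. It assumes that a single positive value n means "the n fittest chromosomes".

```r
# Hypothetical fitness of the filtered chromosomes, in $bestChromosomes order.
fitness <- c(0.80, 0.95, 0.70, 0.90)

select_chromosomes <- function(fitness, subset = TRUE) {
  idx <- seq_along(fitness)
  # Logical vector: recycled over the chromosomes (TRUE keeps all).
  if (is.logical(subset)) return(idx[rep_len(subset, length(idx))])
  # Single number: n > 0 takes the n fittest, n < 0 the |n| least fit.
  if (length(subset) == 1) {
    ord <- order(fitness, decreasing = TRUE)
    n <- min(abs(subset), length(fitness))
    if (subset > 0) return(ord[seq_len(n)])
    return(rev(ord)[seq_len(n)])
  }
  # Numeric vector: explicit indexes.
  idx[subset]
}

select_chromosomes(fitness, 2)                            # 2 4
select_chromosomes(fitness, -1)                           # 3
select_chromosomes(fitness, c(TRUE, FALSE, TRUE, FALSE))  # 1 3
```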
The number of ``top-ranked'' genes to highlight.
The number of colours (or sections) used to highlight ranked genes.
The specific colour for every ``top-ranked'' gene. If specified, its length should be mord+1.
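The default rcol in the usage splits the mord top-ranked genes into mcol colour sections with cut() and appends a 0 for the remaining genes. Below is that same default expression evaluated for small values. mord=10 and mcol=3 are chosen only for illustration.

```r
# Values chosen only for illustration.
mord <- 10   # number of top-ranked genes to highlight
mcol <- 3    # number of colour sections

# Default expression from the usage: section number for each of the
# mord top-ranked genes, followed by 0 for all remaining genes.
rcol <- if (mcol < 2) c(rep(1, mord), 0) else
  c(cut(1:mord, breaks = mcol, labels = FALSE), 0)
rcol
# 1 1 1 1 2 2 2 3 3 3 0
```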
When type is a vector of length greater than 1, TRUE creates two new plot windows.
For type=="geneoverlap", sort.chr can be used to sort the chromosomes. sort.chr==0 sorts the chromosomes according to their fitness, which could reveal trends in gene fitness. sort.chr < 0 applies no sorting; the chromosomes are shown in the order they were obtained. sort.chr > 0 sorts the chromosomes by the presence of ``top-ranked'' genes and sets the recursion level (the higher, the slower).
For type=="genefrequency", freq.col is the colour for genes that are not ``top-ranked''.
For type=="genefrequency", freq.all.labels plots the names of all ``top-ranked'' genes.
For type=="generank" (and others), rank.lwd is the line width (see lwd).
For type=="generank" (and others), rank.order controls the order of ranked genes.
TRUE plots gene names (taken from the BigBang object). FALSE uses gene indexes instead. A character vector supplies user-specified names.
Changes the log parameter of the plot for type=="rankindex".
Changes the log parameter of the plot for type=="genecoverage".
Specifies the classification function when type=="confusion" and a confusion matrix is needed.
Specifies the classes (overriding the BigBang default) when type=="confusion" and a confusion matrix is needed.
TRUE draws mean probability values for all combinations in the confusion plot.
Contrast factor for genes in the same colour section of the ranks. 0 means all genes in the same section get exactly the same colour; 1 is the ``maximum'' contrast.
For type="genecoverage", coverage specifies the points for comparison. For instance, 0.5 means the number of top-ranked genes needed to cover 50% of all genes present in the chromosomes.
Specifies the sample names (overriding the BigBang default).
Specifies the character size for plotting the sample names.
If type="fitnessboxes", nbf specifies the divisor of the number of boxes in the plot. Defaults to 1.
If type="genenetwork", net.th specifies which connections to plot. net.th < 1 plots connections whose distance <= net.th. net.th >= 1 plots the highest net.th connections for each node. Default is 2.
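The two net.th modes can be sketched on a hypothetical distance matrix. select_edges() is not galgo code. It assumes that the "highest" connections of a node are those with the smallest distance (the strongest links), and that a link kept by either endpoint is drawn.

```r
# Hypothetical symmetric distance matrix between four genes (smaller = closer).
d <- matrix(c(0.0, 0.2, 0.9, 0.5,
              0.2, 0.0, 0.4, 0.8,
              0.9, 0.4, 0.0, 0.3,
              0.5, 0.8, 0.3, 0.0), nrow = 4, byrow = TRUE)

select_edges <- function(d, net.th = 2) {
  n <- nrow(d)
  keep <- matrix(FALSE, n, n)
  if (net.th < 1) {
    # Threshold mode: keep every pair whose distance is at most net.th.
    keep <- d <= net.th
  } else {
    # Per-node mode: keep the net.th closest neighbours of each node.
    for (i in seq_len(n)) {
      others <- setdiff(seq_len(n), i)
      nearest <- others[order(d[i, others])][seq_len(min(net.th, length(others)))]
      keep[i, nearest] <- TRUE
    }
    keep <- keep | t(keep)   # a link kept for either endpoint is drawn
  }
  diag(keep) <- FALSE
  # One row per undirected edge, with columns "row" and "col".
  which(keep & upper.tri(keep), arr.ind = TRUE)
}

select_edges(d, 0.35)   # edges 1-2 and 3-4
select_edges(d, 2)      # edges 1-2, 1-4, 2-3 and 3-4
```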
If type="genenetwork", net.method specifies the method used to compute the node coordinates: one of c("isoMDS","cmdscale","sammon").
If type="genenetwork", node.size specifies the size of the nodes.
If type="genenetwork", node.name specifies the naming scheme: one of c("index","rownames").
If type="genenetwork", node.namecol specifies the colour of the node names.
Common plot parameters with BigBang defaults; supplying them overrides the default value.
Other plot parameters (not always passed to subsequent routines).
Returns nothing.
Goldberg, David E. 1989 Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co. ISBN: 0201157675
For more information, see BigBang.
# NOT RUN {
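library(galgo)
# Chromosome template made of 5 genes
cr <- Chromosome(genes=newCollection(Gene(shape1=1, shape2=100),5))
# Niche of 10 random chromosomes, and a world of 2 niches
ni <- Niche(chromosomes=newRandomCollection(cr, 10))
wo <- World(niches=newRandomCollection(ni,2))
# Genetic algorithm with a toy fitness function
ga <- Galgo(populations=newRandomCollection(wo,1), goalFitness = 0.75,
            callBackFunc=plot,
            fitnessFunc=function(chr, parent) 5/sd(as.numeric(chr)))
#evolve(ga) ## not needed here
# Collect up to 10 solutions (maxSolutions) within at most 10 runs (maxBigBangs)
bb <- BigBang(galgo=ga, maxSolutions=10, maxBigBangs=10, saveGeneBreaks=1:100)
blast(bb)
# Default plots: "genefrequency", "generankstability" and "generations"
plot(bb)
plot(bb, type=c("fitness","genefrequency"))
plot(bb, type="generations")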
# }