plot.BigBang: Plots about the collected information in a BigBang object

Description

Plots about the collected information in a BigBang object. See arguments for details.

Usage

# S3 method for BigBang
plot(x, 
	y=NULL, 
	...,
	type=c("genefrequency","generank","generankstability",
	"geneoverlap","geneoverlaphor",
	"fitness","fitnessboxes","generations",
	"rankindex","genefrequencydist","topgenenumber","rankindexcol",
	"confusion","confusionbar","confusionbox",
	"splits","splitsmap","splitsfitness","fitnesssplits","fitnesssplitsbox",
	"genecoverage","confusionpamr",
	"genesintop","genenetwork","genevalues","genevalueslines",
	"genevaluesbox","geneprofiles",
	"sampleprofiles","rankfitness")[c(1,3,8)],
	filter=c("none","solutions","nosolutions"),
	subset=TRUE,
	mcol=8, 
	mord=min(ncol(o$data$data),50), 
	rcol=(if(mcol < 2) c(rep(1,mord),0) 
	     else c(cut(1:mord,breaks=mcol,labels=FALSE),0)), 
	new.dev=FALSE, 
	sort.chr=4,
	freq.col=rgb(.4,.4,.4),
	freq.all.labels=FALSE,
	rank.lwd=5,
	rank.order=c("rank","reverse","random"),
	gene.names=TRUE,
	rankindex.log=NULL,
	coverage.log="x",
	classFunc=NULL,
	classes=NULL,
	confusion.all=TRUE,
	contrast=0.15,
	coverage=c(0.25,0.5,0.75,1),
	samples=NULL,
	samples.cex=0.75,
	pch=20,
	main=o$main, 
	nbf=1,
	net.method=c("isoMDS","cmdscale","sammon"),
	net.th=2,
	node.size=6,
	node.name=c("index","rownames"),
	node.namecol=NULL,
	xlim=NULL,
	ylim=NULL,
	xlab="",
	ylab="",
	cex=1,
    exp.freq=TRUE
	)

Arguments

x

A BigBang object.

y

Optional additional data related to the plot type. Some plot types may make use of this parameter.

type

Specifies the type(s) of plot to draw. A character vector with several names requests several plots.
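The default value of type in Usage is the full vector of names indexed by [c(1,3,8)]. As a minimal base R sketch, the snippet below copies the first eight names from Usage and shows which ones that index selects; the variable names are illustrative only.

```r
# First eight plot type names, in the order listed under Usage
types <- c("genefrequency", "generank", "generankstability",
           "geneoverlap", "geneoverlaphor",
           "fitness", "fitnessboxes", "generations")

# The default index used in the function signature
default_types <- types[c(1, 3, 8)]
print(default_types)
```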

type="genefrequency"

Plots the frequency of genes computed from the chromosomes selected by the filter (see filter and subset). Peaks reveal high-frequency genes, which are potentially "important" genes. "Top-ranked" genes are coloured according to their rank (see mord, mcol and rcol). Labels are optional (see freq.all.labels).

type="generank"

Similar to "genefrequency" but draws only "top-ranked" genes, sorted by rank.

type="generankstability"

Because the process is stochastic, it is difficult to decide how many solutions are required for the gene ranks to stabilize and thus avoid random fluctuations. "generankstability" shows visually how the rank of the current "top-ranked" genes has changed over the course of the run. Many colour changes reveal rank instability, whereas few or no changes indicate stability. Commonly, the top 10 to 20 genes are the quickest to stabilize. One can decide to "stop" the process or "start" the analysis when at least 10 or 20 genes have been "stable" for 100 or 200 solutions.

type="geneoverlap"

Overview of how the chromosomes are "overlapped" and "represented" by the top-ranked genes (see sort.chr).

type="geneoverlaphor"

Horizontal version of "geneoverlap".

type="genesintop"

Shows the histogram of the number of top-genes included in models.

type="fitness"

The evolution of the maximum fitness for each solution. It includes descriptive confidence intervals (average among all and average among the worst). The point where the highest interval intersects the goalFitness is the "average" number of generations needed to reach that fitness value. It can be useful for deciding the number of generations and the goal fitness value.

type="fitnessboxes"

Similar to "fitness" but using boxplots. Useful for "statistical" intervals.

type="generations"

Distribution of the final generation from each galgo. A large peak at minGenerations means "premature" convergence or an "easy" goalFitness; increasing the goalFitness may be worthwhile. A trend towards maxGenerations may indicate a very high goalFitness or a low maxGenerations (this may be normal when onlySolutions == FALSE).

type="rankindex"

Shows the rank versus index. A vertical line indicates many genes with the same rank, probably due to chance, instability, or an insufficient number of solutions.

type="genefrequencydist"

Shows the distribution of the gene frequency.

type="topgenenumber"

Shows the number of genes whose frequency is higher than specific values. It tries to answer questions like "how many genes appear in X chromosomes?". It helps to decide how many "top genes" to include in plots. Genes with low frequency may be associated with random fluctuations.
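The counting behind this kind of plot can be sketched in base R. The data below are hypothetical (four chromosomes of three gene indexes each, with no gene repeated within a chromosome), and the "at least X" comparison is an assumption; the sketch does not reproduce the package's internal code.

```r
# Hypothetical data: each row is one chromosome, each entry a gene index
chromosomes <- rbind(c(1, 2, 3),
                     c(1, 2, 4),
                     c(1, 5, 6),
                     c(1, 2, 7))

# Frequency of each gene = number of chromosomes containing it
gene_freq <- table(as.vector(chromosomes))

# For each threshold X, count the genes that appear in at least X chromosomes
thresholds <- 1:4
n_genes <- sapply(thresholds, function(x) sum(gene_freq >= x))
names(n_genes) <- thresholds
print(n_genes)
```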

type="confusion"

For classification problems, shows the confusion matrix and the probability for all samples in each class. It needs a classFunc specification (unless $data$classFunc exists in the BigBang object) or y=classPredictionMatrix. An NA "class" is added to the predicted class axis (vertical) for classification methods that cannot produce a class prediction in all cases. By default, the bar size represents the "probability" that the sample belongs to that class. The sensitivity and specificity for all classes are given on the horizontal axis (sensitivity=TP/(TP+FN), specificity=TN/(TN+FP), where TP=True Positives, TN=True Negatives, FP=False Positives, FN=False Negatives).
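The sensitivity and specificity formulas can be checked by hand with a small base R sketch. The confusion matrix below is hypothetical, uses rows for the predicted class and columns for the true class, and omits the NA row; it illustrates the formulas only, not how the plot method computes them.

```r
# Hypothetical confusion matrix: rows = predicted class, columns = true class
conf <- matrix(c(8, 2,
                 1, 9),
               nrow = 2, byrow = TRUE,
               dimnames = list(predicted = c("A", "B"),
                               true = c("A", "B")))

total <- sum(conf)
tp <- diag(conf)             # samples predicted as their true class
fn <- colSums(conf) - tp     # true members predicted as another class
fp <- rowSums(conf) - tp     # non-members predicted as this class
tn <- total - tp - fn - fp   # everything else

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
print(round(rbind(sensitivity, specificity), 3))
```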

type="confusionbox"

Similar to "confusion" but showing distribution boxes for each class.

type="confusionpamr"

Similar to "confusion", in a style similar to the pamr package.

type="splits"

Gives an overview of how the splits were built. Perhaps useless.

type="splitsmap"

Gives a clustering overview of how the splits were built (to detect biased splits). Perhaps useless.

type="splitsfitness"

Plots a boxplot of the evaluation of chromosomes in different splits. Perhaps useless.

type="fitnesssplits"

Plots the distribution of fitness evaluated in different splits, to check whether the chromosomes are "split-dependent".

type="fitnesssplitsbox"

Plots a boxplot of the evaluation of chromosomes in different splits. Perhaps useless.

type="genecoverage"

Plots the number of possible top-ranked genes (horizontal axis) versus the percentage of total genes present in chromosomes (vertical axis). It tries to answer questions like "how many top genes N are required so that these N genes cover at least 50% of all genes in chromosomes?". To answer: plot with type="genecoverage", look for 0.5 (50%) on the vertical axis (or use coverage=0.5), then project that point of the curve onto the horizontal axis.
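One way to read the coverage question is as a cumulative sum over genes ranked by frequency. The base R sketch below works under that assumption with hypothetical chromosomes; the exact definition used by the plot method may differ.

```r
# Hypothetical chromosomes, one per row; entries are gene indexes
chromosomes <- rbind(c(1, 2, 3),
                     c(1, 2, 4),
                     c(1, 5, 6),
                     c(1, 2, 7))

# Gene frequencies sorted from most to least frequent (rank order)
gene_freq <- sort(table(as.vector(chromosomes)), decreasing = TRUE)

# Fraction of all gene slots in the chromosomes covered by the top N genes
covered <- cumsum(as.vector(gene_freq)) / sum(gene_freq)

# Smallest N whose coverage reaches each requested level
coverage <- c(0.25, 0.5, 0.75, 1)
n_needed <- sapply(coverage, function(p) which(covered >= p)[1])
names(n_needed) <- coverage
print(n_needed)
```

Here the values in coverage mirror the default of the coverage argument.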

type="genenetwork"

Plots the "dependency" of genes on each other in a network format. The distance is a measure of how many chromosomes contain both genes, normalized to the total number of interactions. The thickness of each connection reflects its strength relative to the other connections shown.

filter

The BigBang object can save information about solutions that did not reach the goalFitness. filter=="solutions" ensures that only chromosomes that reached the goalFitness are considered. filter=="none" takes all chromosomes. filter=="nosolutions" considers only non-solutions (for comparative purposes).

subset

Second level of filtering. subset can be a vector specifying which of the filtered chromosomes are used. It can be a logical vector or a numeric vector (indexes in the order given by $bestChromosomes in the BigBang object). If it is a numeric vector of length one, a positive value takes that many top chromosomes sorted by fitness, and a negative value takes that many from the bottom.
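The length-one numeric case can be illustrated with a small base R sketch. The fitness values and the helper pick() are hypothetical and are not part of the package; they only mimic the described selection rule.

```r
# Hypothetical fitness values, one per filtered chromosome
fitness <- c(0.91, 0.78, 0.85, 0.95, 0.80)

# Chromosome indexes sorted from best to worst fitness
ord <- order(fitness, decreasing = TRUE)

# Hypothetical helper: positive n takes the n best, negative n the |n| worst
pick <- function(n) {
  if (n > 0) head(ord, n) else tail(ord, -n)
}

pick(2)    # indexes of the two fittest chromosomes
pick(-2)   # indexes of the two least fit chromosomes
```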

mord

The number of "top-ranked genes" to highlight.

mcol

The number of colours (or sections) to highlight ranked genes.

rcol

The specific colours for every "top-ranked gene". If specified, its length should be mord+1.
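The default rcol expression from Usage can be evaluated on its own in base R to see the mord+1 length. The sketch below fixes mcol at its default of 8 and assumes mord takes its upper bound of 50; reading the trailing 0 as the entry for genes outside the top-ranked set is an assumption.

```r
mcol <- 8    # default number of colour sections
mord <- 50   # upper bound of the default number of highlighted genes

# Default expression for rcol, as given in Usage
rcol <- if (mcol < 2) {
  c(rep(1, mord), 0)
} else {
  c(cut(1:mord, breaks = mcol, labels = FALSE), 0)
}

length(rcol)   # mord + 1 entries
table(rcol)    # number of genes assigned to each colour section
```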

new.dev

When type is a vector of length greater than 1, TRUE creates two new plot windows.

sort.chr

For type=="geneoverlap", sort.chr can be used to sort the chromosomes. sort.chr==0 sorts by fitness, which could reveal trends in gene-fitness. sort.chr < 0 does not sort at all; the chromosomes are shown as they were obtained. sort.chr > 0 sorts the chromosomes by the presence of "top-ranked" genes and sets the recursion level (the higher, the slower).

freq.col

For type=="genefrequency", freq.col is the colour for non-"top-ranked" genes.

freq.all.labels

For type=="genefrequency", freq.all.labels plots the names of all "top-ranked" genes.

rank.lwd

For type=="generank" (and others), rank.lwd is the line width (see lwd).

rank.order

For type=="generank" (and others), rank.order controls the order of ranked genes.

gene.names

TRUE plots gene names (from the BigBang object). FALSE uses gene indexes instead. A character vector supplies user-specified names.

rankindex.log

Change the log plot parameter for type=="rankindex".

coverage.log

Change the log plot parameter for type=="genecoverage".

classFunc

Specifies the classification function when type=="confusion" and a confusion matrix is needed.

classes

Specifies the classes (overriding the BigBang default) when type=="confusion" and a confusion matrix is needed.

confusion.all

TRUE draws mean probability values for all combinations in the confusion plot.

contrast

Contrast factor for same colour/section in ranks. 0=All genes in same section are exactly the same colour. 1="Maximum" contrast factor.

coverage

For type="genecoverage", coverage specifies the points for comparison. For instance, 0.5 means the number of top-ranked genes needed to cover 50% of the total genes present in all chromosomes.

samples

Specifies the sample names (overriding the BigBang default).

samples.cex

Specifies the character size for plotting the sample names.

nbf

If type="fitnessboxes", nbf specifies the divisor of the number of boxes in the plot. Defaults to 1.

net.th

If type="genenetwork", it specifies the connections to plot. net.th < 1 plots connections whose distance <= net.th. net.th >= 1 plots the highest net.th connections for each node. Default is 2.

net.method

If type="genenetwork", it specifies the method to compute the coordinates. Methods are c("isoMDS","cmdscale","sammon").

node.size

If type="genenetwork", it specifies the size of the node.

node.name

If type="genenetwork", it specifies the naming scheme, which can be c("index","rownames").

node.namecol

If type="genenetwork", it specifies the color of the node names.

main, xlab, ylab, xlim, ylim, cex, pch

BigBang defaults for common plot parameters. Specifying them overrides the default values.

...

Other plot parameters (not always passed to subsequent routines).

Value

Returns nothing.

References

Goldberg, David E. 1989 Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co. ISBN: 0201157675

Examples

# NOT RUN {
   cr <- Chromosome(genes=newCollection(Gene(shape1=1, shape2=100),5))
   ni <- Niche(chromosomes=newRandomCollection(cr, 10))
   wo <- World(niches=newRandomCollection(ni,2))
   ga <- Galgo(populations=newRandomCollection(wo,1), goalFitness = 0.75,
               callBackFunc=plot,
               fitnessFunc=function(chr, parent) 5/sd(as.numeric(chr)))
 
   #evolve(ga) ## not needed here

   bb <- BigBang(galgo=ga, maxSolutions=10, maxBigBangs=10, saveGeneBreaks=1:100)
   blast(bb)
   plot(bb)
   plot(bb, type=c("fitness","genefrequency"))
   plot(bb, type="generations")
   
# }