isa.unique: Filter out biclusters that are very similar to each other

Description

From a potentially non-unique set of ISA biclusters, create a unique set by removing all biclusters that are similar to others.

Usage

"isa.unique" (normed.data, isaresult, ...)

Arguments

normed.data

The normalized input data, a list of two matrices, usually the output of isa.normalize.

isaresult

The result of an ISA run, a set of biclusters.

...

Additional arguments, see details below.

Value

A named list, the filtered isaresult. See the return value of isa.iterate for the details.

Details

This function can we called as

    isa.unique(normed.data, isaresult, method = c("cor"),
               ignore.div = TRUE, cor.limit = 0.9,
	       neg.cor = TRUE, drop.zero = TRUE)

where the arguments are:

normed.data: The normalized input data, a list of two matrices, usually the output of isa.normalize.
isaresult: The result of an ISA run, a set of biclusters.
method: Character scalar giving the method to be used to determine if two biclusters are similar. Right now only ‘cor’ is implemented, this keeps both biclusters if their Pearson correlation is less than cor.limit, both for their row and column scores. See also the neg.cor argument.
ignore.div: Logical scalar, if TRUE, then the divergent biclusters will be removed.
cor.limit: Numeric scalar, giving the correlation limit for the ‘cor’ method.
neg.cor: Logical scalar, if TRUE, then the ‘cor’ method considers the absolute value of the correlation.
drop.zero: Logical scalar, whether to drop biclusters that have all zero scores.

Because of the nature of the ISA algorithm, the set of biclusters created by isa.iterate is not unique; many input seeds may converge to the same biclusters, even if the input seeds are not random.

isa.unique filters a set of biclusters and removed the ones that are very similar to ones that were already found for another seed.

References

Bergmann S, Ihmels J, Barkai N: Iterative signature algorithm for the analysis of large-scale gene expression data Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Mar;67(3 Pt 1):031902. Epub 2003 Mar 11. Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network Nat Genet. 2002 Aug;31(4):370-7. Epub 2002 Jul 22

Ihmels J, Bergmann S, Barkai N: Defining transcription modules using large-scale gene expression data Bioinformatics 2004 Sep 1;20(13):1993-2003. Epub 2004 Mar 25.

Examples

Run this code

## Create an ISA module set
set.seed(1)
insili <- isa.in.silico(noise=0.01)

## Random seeds
seeds <- generate.seeds(length=nrow(insili[[1]]), count=20)

## Normalize input matrix
nm <- isa.normalize(insili[[1]])

## Do ISA
isares <- isa.iterate(nm, row.seeds=seeds, thr.row=2, thr.col=1)

## Check correlation among modules
cc <- cor(isares$rows)
if (interactive()) { hist(cc[lower.tri(cc)],10) }

## Some of them are quite high, how many?
undiag <- function(x) { diag(x) <- 0; x }
sum(undiag(cc) > 0.99, na.rm=TRUE)

## Eliminate duplicated modules
isares.unique <- isa.unique(nm, isares)

## How many modules left?
ncol(isares.unique$rows)

## Double check
cc2 <- cor(isares.unique$rows)
if (interactive()) { hist(cc2[lower.tri(cc2)],10) }

## High correlation?
sum(undiag(cc2) > 0.99, na.rm=TRUE)

Run the code above in your browser using DataLab