quantileDiscretize: Discretize expression matrix for qualitative biclustering

Description

Performs recursive quantile discretization of gene expression data across samples to discretize a gene expression matrix. The quantile parameter q determines the estimated proportion of differentially expressed genes (2q in total, counting both up- and down-regulation). The rank parameter rank determines how many discrete levels the differentially expressed genes (or outliers) should have. See Details below.

Usage

quantileDiscretize(x, ...)

Arguments

x

An object of the eSet class or of a class inheriting from it. The most commonly used form is an ExpressionSet object. Alternatively, it can be a numeric matrix.

...

Currently, ... accepts two parameters: q and rank, explained below.

rank

Ranks (levels) of outliers, a positive integer; the default is 1L. By default, each condition gets one label per gene from $\{-1, 0, 1\}$, representing decreased expression, no change and increased expression, respectively. If $rank>1$, the outliers are further divided into rank levels by applying recursive quantile discretization with equal intervals.

Value

An object of the same class as the input, with the exprs slot replaced by the discretized matrix, which is an integer matrix. If the input is a numeric matrix, the discretized integer matrix itself is returned.

Details

Parameter q corresponds to the command line option -q in the QUBIC command line tool, and the rank option corresponds to -r.

For each gene, the algorithm first applies quantile discretization to divide conditions into negative (lower), unchanged and positive (higher) expression. Conditions with negative or positive expression are considered outliers. If $rank>1$, the algorithm further discretizes the expression values of the outliers in each direction.
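The first step can be pictured with a minimal base R sketch for a single gene. This is only an illustration, not the rqubic or QUBIC implementation: the helper name `three_level`, the value `q = 0.1` and the use of the `q` and `1 - q` sample quantiles as cutoffs are all assumptions made for the example.

```r
# Hypothetical helper, not part of rqubic: three-level discretization of one gene.
three_level <- function(values, q = 0.1) {
  # Assumption: the cutoffs are the q and 1 - q sample quantiles of this gene.
  cutoffs <- quantile(values, probs = c(q, 1 - q), names = FALSE)
  labels <- integer(length(values))     # 0L everywhere: unchanged
  labels[values < cutoffs[1]] <- -1L    # negative (lower) outliers
  labels[values > cutoffs[2]] <- 1L     # positive (higher) outliers
  labels
}

# Ten conditions of a made-up gene with one low and one high outlier.
gene <- c(5.1, 4.9, 5.0, 9.7, 5.2, 0.3, 5.0, 4.8, 5.3, 5.1)
three_level(gene, q = 0.1)
```

With these made-up values, only the condition at 0.3 is labelled -1 and only the condition at 9.7 is labelled 1; all other conditions stay at 0.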

This second discretization step is performed by dividing the sorted outliers into $rank$ tandem groups, each containing an equal number of conditions. A label is assigned to each of these tandem groups, in the following order: $-1, -2, \ldots, -rank$ for outliers with negative expression, from the most negative group to the least negative group (not the other way around!).

Similarly, for positive outliers, labels in the order $rank, rank-1, \ldots, 1$ are assigned to tandem groups from the least positive group to the most positive group.

That is, the sign of a label indicates the direction of the gene expression change, and its absolute value represents the discretized rank among the outliers.
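The label order described above can be written out as a short base R sketch. It is an illustration only, not the rqubic code: the helper name `label_outliers`, its argument `n_rank` (standing in for rank) and the handling of ties and of group sizes that do not divide evenly are assumptions.

```r
# Hypothetical helper, not part of rqubic: label the outliers of one direction.
label_outliers <- function(outliers, n_rank = 3L,
                           direction = c("negative", "positive")) {
  direction <- match.arg(direction)
  n <- length(outliers)
  # Position of each outlier in ascending order (1 = smallest value).
  position <- rank(outliers, ties.method = "first")
  # Tandem groups of (near-)equal size, numbered 1..n_rank from smallest up.
  group <- ceiling(position * n_rank / n)
  labels <- if (direction == "negative") {
    -group               # most negative group gets -1, least negative -n_rank
  } else {
    n_rank + 1 - group   # least positive group gets n_rank, most positive 1
  }
  as.integer(labels)
}

neg <- c(-9, -2, -5, -7, -3, -1)
pos <- c(2, 8, 4, 6, 1, 9)
label_outliers(neg, 3L, "negative")  # -1 -3 -2 -1 -2 -3
label_outliers(pos, 3L, "positive")  #  3  1  2  2  3  1
```

In both directions the most extreme group receives the label with absolute value 1, which matches the order stated in the text.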

References

Li et al. (2009) QUBIC: a qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Research 37:e101.

Examples

library(Biobase)
data(sample.ExpressionSet, package="Biobase")
sample.disc <- quantileDiscretize(sample.ExpressionSet)
exprs(sample.disc)[1:6, 1:6]

## Equivalent to passing a numeric matrix
sample.mat.disc <- quantileDiscretize(exprs(sample.ExpressionSet))
sample.mat.disc[1:6, 1:6]
identical(exprs(sample.disc),sample.mat.disc)

## with multiple ranks
sample.rank3 <- quantileDiscretize(sample.ExpressionSet, rank=3)
exprs(sample.rank3)[1:6, 1:6]
