preselect: Score and preselect features.

Description

Iterates over all features and scores them via mMs, Student's t-test, or mRMR. Optionally, a list of uninformative features can be returned so that they can be discarded.

Usage

preselect(elist=NULL, columns1=NULL, columns2=NULL, label1="A", label2="B", log=NULL, discard.threshold=0.5, fold.thresh=1.5, discard.features=TRUE, mMs.above=1500, mMs.between=400, mMs.matrix1=NULL, mMs.matrix2=NULL, method=NULL)

Arguments

elist

EListRaw or EList object (mandatory).

columns1

column name vector (string vector) of group 1 (mandatory).

columns2

column name vector (string vector) of group 2 (mandatory).

label1

class label of group 1.

label2

class label of group 2.

log

indicates whether the data is in log scale (mandatory; note: if TRUE log2 scale is expected).

discard.threshold

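for method = "mMs" or "tTest": positive numeric between 0 and 1 indicating the maximum mMs or, respectively, the maximum t-test p-value for features to be included for further analysis. For method = "mrmr": integer indicating the number of best features to be included (see Details). Default is "0.5".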

fold.thresh

numeric indicating the minimum fold change for features to be included for further analysis. Default is "1.5".

discard.features

boolean indicating whether merely feature scores (i.e., mMs or t-test p-values) (="FALSE") or feature scores and a discard list (="TRUE") should be returned. Default is "TRUE".

mMs.above

mMs above parameter (integer). Default is "1500".

mMs.between

mMs between parameter (integer). Default is "400".

mMs.matrix1

precomputed mMs reference matrix (see mMsMatrix()) for group 1 (mandatory if method = "mMs").

mMs.matrix2

precomputed mMs reference matrix (see mMsMatrix()) for group 2 (mandatory if method = "mMs").

method

preselection method ("mMs", "tTest", "mrmr"). Default is "mMs".

Value

results: matrix containing metadata, feature scores and intensity values for the whole data set.
discard: vector containing the row indices of features considered as not differential, i.e., the features to discard (returned if discard.features is "TRUE").

Details

This function takes an EListRaw or EList object and group-specific column vectors. Furthermore, the class labels of group 1 and group 2 are needed. If discard.features is "TRUE" (default), all features that are considered as not differential will be collected and returned for discarding.

If method = "mMs", precomputed mMs reference matrices (see mMsMatrix()) for group 1 and group 2 are additionally needed to compute mMs values (see Love B.) as the scoring method. Both mMs parameters (mMs.above and mMs.between) can be set. The defaults are "1500" for mMs.above and "400" for mMs.between. Features that have an mMs value larger than discard.threshold (here: numeric between 0.0 and 1.0) or that do not satisfy the minimum absolute fold change fold.thresh are considered as not differential.

If method = "tTest", Student's t-test is used as the scoring method. Features that have a p-value larger than discard.threshold (here: numeric between 0.0 and 1.0) or that do not satisfy the minimum absolute fold change fold.thresh are considered as not differential.
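
The t-test rule described above can be illustrated with a minimal base R sketch. This is not the PAA implementation: the helper flag_not_differential(), its arguments and the toy intensity values are hypothetical, and the fold change is assumed here to be the ratio of group means on the raw (non-log) scale.

```r
# Hypothetical helper (not part of PAA): returns the row indices of features
# that would be considered "not differential" under the rule described above.
flag_not_differential <- function(x, cols1, cols2, p.max = 0.5, fold.min = 1.5) {
  # Per-feature p-value from Student's two-sample t-test (equal variances).
  p <- apply(x, 1, function(row) {
    t.test(row[cols1], row[cols2], var.equal = TRUE)$p.value
  })
  # Assumption: fold change = ratio of group means on the raw scale.
  fc <- rowMeans(x[, cols1, drop = FALSE]) / rowMeans(x[, cols2, drop = FALSE])
  # Absolute fold change: a 2-fold decrease counts like a 2-fold increase.
  abs.fc <- pmax(fc, 1 / fc)
  # Flag features that fail either the p-value or the fold-change criterion.
  which(p > p.max | abs.fc < fold.min)
}

# Made-up toy intensities: 3 features x 6 samples (3 per group).
x <- rbind(
  f1 = c(100, 110, 90, 300, 310, 290),  # clear difference between groups
  f2 = c(100, 110, 90, 105,  95, 100),  # identical group means
  f3 = c(100, 110, 90, 120, 130, 110)   # small shift, fold change only 1.2
)
colnames(x) <- c("A1", "A2", "A3", "B1", "B2", "B3")

flagged <- flag_not_differential(x, cols1 = c("A1", "A2", "A3"),
                                 cols2 = c("B1", "B2", "B3"))
print(flagged)
```

With these toy values, f2 is flagged because its p-value exceeds 0.5, and f3 is flagged because its absolute fold change is below 1.5, even though its p-value passes.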

If method = "mrmr", mRMR scores are computed for all features as the scoring method (using the function mRMR.classic() of the CRAN R package mRMRe). Features that are not among the discard.threshold best features (here: an integer indicating a number of features) with respect to their mRMR score are considered as not differential.
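
The "keep the k best features" rule can be sketched in base R as follows. The scores below are made up and the variable names are hypothetical. In PAA the scores come from mRMRe. The sketch assumes that a higher score means a better feature.

```r
# Made-up per-feature scores; assumption: higher score = better feature.
scores <- c(f1 = 0.9, f2 = 0.1, f3 = 0.5, f4 = 0.7)

# k plays the role of discard.threshold when method = "mrmr".
k <- 2

# Indices of the k best-scoring features.
keep <- order(scores, decreasing = TRUE)[seq_len(k)]

# All remaining features are considered as not differential.
discard <- setdiff(seq_along(scores), keep)
print(discard)
```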

References

Love B: The Analysis of Protein Arrays. In: Functional Protein Microarrays in Drug Discovery. CRC Press; 2007: 381-402.

The software "Prospector" for ProtoArray analysis can be downloaded from the Thermo Fisher Scientific web page (https://www.thermofisher.com).

The R package mRMRe can be downloaded from CRAN. See also: De Jay N, Papillon-Cavanagh S, Olsen C, El-Hachem N, Bontempi G, Haibe-Kains B. mRMRe: an R package for parallelized mRMR ensemble feature selection. Bioinformatics 2013.

The package limma by Gordon Smyth et al. can be downloaded from Bioconductor (https://www.bioconductor.org).

Smyth, G. K. (2005). Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds.), Springer, New York, pages 397-420.

Examples

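library(PAA)  # provides preselect()
# Load the Alzheimer example data shipped with PAA (creates 'elist').
cwd <- system.file(package="PAA")
load(paste(cwd, "/extdata/Alzheimer.RData", sep=""))
# Keep only the features from blocks 1 to 9.
elist <- elist[elist$genes$Block < 10,]
# Column names of the two groups: AD1..AD20 and NDC1..NDC20.
c1 <- paste(rep("AD",20), 1:20, sep="")
c2 <- paste(rep("NDC",20), 1:20, sep="")
# Score all features with Student's t-test and collect the features to discard.
preselect(elist, columns1=c1, columns2=c2, label1="AD", label2="NDC", log=FALSE,
 discard.threshold=0.5, fold.thresh=1.5, discard.features=TRUE, method="tTest")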
