PAA (version 1.7.1)

preselect: Score and preselect features.

Description

Iterates over all features and scores them via mMs, Student's t-test, or mRMR. Optionally, a list of non-informative features can be returned so that they can be discarded.

Usage

preselect(elist=NULL, columns1=NULL, columns2=NULL, label1="A", label2="B", log=NULL, discard.threshold=0.5, fold.thresh=1.5, discard.features=TRUE, mMs.above=1500, mMs.between=400, mMs.matrix1=NULL, mMs.matrix2=NULL, method=NULL)

Arguments

elist
EListRaw or EList object (mandatory).
columns1
column name vector (string vector) of group 1 (mandatory).
columns2
column name vector (string vector) of group 2 (mandatory).
label1
class label of group 1.
label2
class label of group 2.
log
indicates whether the data is in log scale (mandatory; note: if TRUE log2 scale is expected).
discard.threshold
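for method "mMs" or "tTest": positive numeric between 0 and 1 indicating the maximum mMs or, respectively, the maximum t-test p-value for features to be included for further analysis. For method "mrmr": integer indicating the number of best-scoring features to be included for further analysis. Default is "0.5".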
fold.thresh
numeric indicating the minimum fold change for features to be included for further analysis. Default is "1.5".
discard.features
boolean indicating whether merely feature scores (i.e., mMs or t-test p-values) (="FALSE") or feature scores and a discard list (="TRUE") should be returned. Default is "TRUE".
mMs.above
mMs above parameter (integer). Default is "1500".
mMs.between
mMs between parameter (integer). Default is "400".
mMs.matrix1
precomputed mMs reference matrix (see mMsMatrix()) for group 1 (mandatory if method = "mMs").
mMs.matrix2
precomputed mMs reference matrix (see mMsMatrix()) for group 2 (mandatory if method = "mMs").
method
preselection method ("mMs", "tTest", "mrmr"). Default is "mMs".

Value

If discard.features is "FALSE": a matrix containing metadata, feature scores and intensity values for the whole data set.
If discard.features is "TRUE": a list containing:
results
matrix containing metadata, feature scores and intensity values for the whole data set.
discard
vector containing row indices (= features) for discarding features considered as not differential.
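
The following is a minimal sketch of how the list described above might be used to drop the flagged rows. It does not call preselect(); the object res is a hand-made stand-in with the same two elements (results and discard), and all values in it are invented for illustration.

```r
# Mock of the returned list shape (values are made up, not PAA output).
res <- list(
  results = matrix(1:12, nrow = 4,
                   dimnames = list(paste0("f", 1:4), c("s1", "s2", "s3"))),
  discard = c(2L, 4L)  # row indices of features flagged as not differential
)

# Drop the flagged rows. Guard against an empty discard vector, because
# negative indexing with integer(0) would select no rows at all.
kept <- if (length(res$discard) > 0) {
  res$results[-res$discard, , drop = FALSE]
} else {
  res$results
}

print(rownames(kept))
```

With a real EListRaw or EList object, the analogous step would presumably be a row subset such as elist[-res$discard, ], using the same row indexing as the example at the end of this page.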

Details

This function takes an EListRaw or EList object and group-specific column vectors. Furthermore, the class labels of group 1 and group 2 are needed. If discard.features is "TRUE" (default), all features that are considered as not differential will be collected and returned for discarding.

If method = "mMs", precomputed mMs reference matrices (see mMsMatrix()) for group 1 and group 2 are additionally needed to compute mMs values (see Love B.) as the scoring method. Both mMs parameters (mMs.above and mMs.between) can be set. The defaults are "1500" for mMs.above and "400" for mMs.between. Features that have an mMs value larger than discard.threshold (here: numeric between 0.0 and 1.0) or that do not satisfy the minimal absolute fold change fold.thresh are considered as not differential.

If method = "tTest", Student's t-test will be used as the scoring method. Features that have a p-value larger than discard.threshold (here: numeric between 0.0 and 1.0) or that do not satisfy the minimal absolute fold change fold.thresh are considered as not differential.
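
The t-test rule above can be sketched in base R on a toy matrix. This is an illustration only, not PAA's implementation: the helper score_rows() is hypothetical, the data are simulated, and the fold change is assumed to be the ratio of group means on the raw (non-log) scale, taken in whichever direction is at least 1.

```r
# Toy data: 6 features (rows) x 8 samples (columns), raw intensities.
set.seed(1)
g1 <- paste0("AD", 1:4)
g2 <- paste0("NDC", 1:4)
x <- matrix(rlnorm(48, meanlog = 7, sdlog = 0.2), nrow = 6,
            dimnames = list(paste0("f", 1:6), c(g1, g2)))
x["f1", g1] <- x["f1", g1] * 4  # make feature f1 clearly differential

# Hypothetical helper (not part of PAA): score each row and flag rows that
# fail either the p-value threshold or the fold-change threshold.
score_rows <- function(x, columns1, columns2,
                       discard.threshold = 0.5, fold.thresh = 1.5) {
  # Student's t-test (equal variances) per feature.
  p <- apply(x, 1, function(r) {
    t.test(r[columns1], r[columns2], var.equal = TRUE)$p.value
  })
  # Assumed fold-change definition: ratio of group means, direction-free.
  fc <- rowMeans(x[, columns1, drop = FALSE]) /
        rowMeans(x[, columns2, drop = FALSE])
  abs.fc <- pmax(fc, 1 / fc)
  discard <- which(p > discard.threshold | abs.fc < fold.thresh)
  list(p.value = p, fold.change = abs.fc, discard = unname(discard))
}

res <- score_rows(x, g1, g2)
print(res$discard)
```

A feature is discarded when either condition fails, so a small p-value alone is not enough if the fold change stays below fold.thresh.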

If method = "mrmr", mRMR scores for all features will be computed as the scoring method (using the function mRMR.classic() of the CRAN R package mRMRe). Here, discard.threshold is an integer indicating a number of features: all features that are not among the discard.threshold best features with respect to their mRMR score are considered as not differential.
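
The "keep the k best, discard the rest" rule can be sketched in base R as follows. The scores are invented and are not computed with mRMRe, and the sketch assumes that a higher score means a better feature.

```r
# Made-up feature scores (not mRMRe output); higher is assumed to be better.
scores <- c(f1 = 0.12, f2 = 0.80, f3 = 0.05, f4 = 0.43, f5 = 0.67)
k <- 2  # plays the role of discard.threshold for method = "mrmr"

# Indices of the k best-scoring features.
keep <- order(scores, decreasing = TRUE)[seq_len(k)]

# Every other feature index goes on the discard list.
discard <- setdiff(seq_along(scores), keep)

print(names(scores)[keep])
print(discard)
```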

References

Love B: The Analysis of Protein Arrays. In: Functional Protein Microarrays in Drug Discovery. CRC Press; 2007: 381-402.

The software "Prospector" for ProtoArray analysis can be downloaded from the Thermo Fisher Scientific web page (https://www.thermofisher.com).

The R package mRMRe can be downloaded from CRAN. See also: De Jay N, Papillon-Cavanagh S, Olsen C, El-Hachem N, Bontempi G, Haibe-Kains B. mRMRe: an R package for parallelized mRMR ensemble feature selection. Bioinformatics 2013.

The package limma by Gordon Smyth et al. can be downloaded from Bioconductor (https://www.bioconductor.org).

Smyth, G. K. (2005). Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds.), Springer, New York, pages 397-420.

Examples

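library(PAA)
# Load the Alzheimer example data shipped with PAA (provides the object 'elist').
cwd <- system.file(package="PAA")
load(file.path(cwd, "extdata", "Alzheimer.RData"))
# Keep only features from blocks 1 to 9.
elist <- elist[elist$genes$Block < 10,]
# Column names of the two groups: AD1..AD20 and NDC1..NDC20.
c1 <- paste0("AD", 1:20)
c2 <- paste0("NDC", 1:20)
# Score all features with Student's t-test and return scores plus discard list.
preselect(elist, columns1=c1, columns2=c2, label1="AD", label2="NDC", log=FALSE,
 discard.threshold=0.5, fold.thresh=1.5, discard.features=TRUE, method="tTest")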
