testRobustToNAimputation: Pairwise Testing Robust To NA-Imputation

Description

This function replaces NA values based on group neighbours (based on the grouping of columns given in argument gr), under the overall assumption of a close to Gaussian distribution. Furthermore, it is assumed that NA values originate from experimental settings where measurements at or below the detection limit are recorded as NA. In such cases (e.g. in proteomics) it is current practice to replace NA values by very low (random) values so that t-tests can be performed. However, the random normal values used as replacements may in rare cases deviate from the average (the 'assumed' value). In particular, if multiple NA replacements lie above the average, they may look like an induced biological effect and be misinterpreted as such.

Usage

testRobustToNAimputation(
  dat,
  gr = NULL,
  useComparison = NULL,
  annot = NULL,
  retnNA = TRUE,
  avSd = c(0.15, 0.5),
  avSdH = NULL,
  plotHist = FALSE,
  xLab = NULL,
  tit = NULL,
  imputMethod = "mode2",
  seedNo = NULL,
  multCorMeth = c("lfdr", "FDR"),
  nLoop = 100,
  lfdrInclude = NULL,
  ROTSn = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Value

This function returns a limma-type S3 object of class 'MArrayLM' (which can be accessed like a list); multiple results of testing or of multiple testing correction types may get included ('p.value', 'FDR', 'BY', 'lfdr' or 'ROTS.BH'). For details on the choice of NA-imputation procedures with arguments 'imputMethod' and 'avSd' please see matrixNAneighbourImpute.

Arguments

dat: (matrix or data.frame) main data (may contain NA); if dat is a list containing $quant and $annot as matrices, the element $quant will be used
gr: (character or factor) replicate association; if dat is a list containing an element $sampleSetup$groups or $sampleSetup$lev, this may be used in case gr=NULL
useComparison: (list, character or matrix) optional argument to specify which pairwise comparisons should be performed; the default useComparison=NULL runs all pairwise comparisons. May be a character vector where each entry combines two group-names (from argument gr) separated by a '-' (e.g. 'A-B'), or a matrix where the rownames designate the elements to be compared pairwise. It is also possible to give a list with $sep (separator to be used when combining) and $useComparison (regular comparisons). Note: the names of the groups should not contain any '-', to avoid confusing them with pairwise separators!
annot: (matrix or data.frame) annotation (lines must match lines of data!); if annot is NULL and argument dat is a list containing both $quant and $annot, the element $annot will be used
retnNA: (logical) retain and report number of NA
avSd: (numerical,length=2) population characteristics (mean and sd) for >1 NA-neighbours (per line)
avSdH: deprecated, please use avSd instead; (numerical,length=2) population characteristics 'high' (mean and sd) for >1 NA-neighbours (per line)
plotHist: (logical) additional histogram of original, imputed and resultant distribution (made using matrixNAneighbourImpute)
xLab: (character) custom x-axis label
tit: (character) custom title
imputMethod: (character) choose the imputation method (may be 'mode2' (default), 'mode1', 'datQuant', 'modeAdopt', 'informed' or 'none'; for details see matrixNAneighbourImpute)
seedNo: (integer) seed-value for normal random values
multCorMeth: (character) define which method(s) for correction of multiple testing should be run (choices: 'BH', 'lfdr', 'BY', 'tValTab'; choosing several is possible)
nLoop: (integer) number of runs of independent NA-imputation
lfdrInclude: (logical) deprecated, please use multCorMeth instead (include lfdr estimations; may cause warning message(s) concerning convergence if too few lines/proteins are in the dataset tested)
ROTSn: (integer) deprecated, please use multCorMeth instead (number of repeats by ROTS; if NULL, ROTS will not be called)
silent: (logical) suppress messages
debug: (logical) additional messages for debugging
callFrom: (character) allows easier tracking of messages produced

Details

The statistical testing uses eBayes from the Bioconductor package limma for robust testing in the context of small numbers of replicates. By repeating the process of replacing NA values and subsequent testing multiple times, the results can afterwards be summarized by the median over all repeated runs to remove the stochastic effect of individual NA imputations. Thus, one may gain stability against the random character of NA imputation by repeating imputation and testing 'nLoop' times and summarizing the p-values by their median (results stabilize at 50-100 rounds). It is necessary to define all groups of replicates in gr to obtain all possible pairwise tests (multiple columns in $BH, $lfdr etc). The modified testing procedure of the Bioconductor package ROTS may optionally be included, if desired. This function returns a limma-like S3 list-object further enriched by additional fields/elements.
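
The idea of repeating imputation and testing, then summarizing by the median, can be sketched in base R. This is only an illustration and not the implementation used by this function: plain Welch t-tests (stats::t.test) stand in for limma's eBayes, and the helper names imputeLow and oneRound as well as the values in lowMeanSd are hypothetical.

```r
## Illustrative sketch only: t.test() replaces limma::eBayes(), and
## imputeLow()/oneRound() are hypothetical helpers, not part of any package.
set.seed(2024)
nLoop <- 25                                   # number of imputation rounds
grp <- gl(2, 3, labels = c("A", "B"))         # 2 groups x 3 replicates
dat <- matrix(rnorm(60, mean = 5), ncol = 6)  # 10 lines x 6 samples
dat[1:3, 1:3] <- dat[1:3, 1:3] + 3            # lines 1:3 are higher in group A
dat[2, 4:5] <- NA                             # mimic values below detection limit
dat[7, c(1, 6)] <- NA

## replace NAs by random draws from a low-abundance normal distribution
imputeLow <- function(x, lowMeanSd = c(2, 0.5)) {
  isNa <- is.na(x)
  x[isNa] <- rnorm(sum(isNa), mean = lowMeanSd[1], sd = lowMeanSd[2])
  x
}

## one round: impute, then test every line (group A versus group B)
oneRound <- function(x, gr) {
  xImp <- imputeLow(x)
  apply(xImp, 1, function(v) t.test(v[gr == "A"], v[gr == "B"])$p.value)
}

## matrix of p-values: one row per line, one column per round
pMat <- replicate(nLoop, oneRound(dat, grp))
pMedian <- apply(pMat, 1, median)             # summary across all rounds
round(pMedian, 4)
```

Lines without any NA give the same p-value in every round, so the median only matters for lines where imputation took place (here lines 2 and 7).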

The argument multCorMeth allows choosing which multiple testing correction algorithms will be used and included in the final results. Possible options are 'lfdr', 'BH', 'BY', 'tValTab', ROTSn='100' (the element must be named) or 'noLimma' (to add initial p.values and BH to limma-results). By default 'lfdr' (local false discovery rate from package 'fdrtool') and 'BH' (Benjamini-Hochberg FDR) are chosen. The option 'BY' refers to Benjamini-Yekutieli FDR; 'tValTab' allows exporting all individual t-values from the repeated NA-substitution and subsequent testing.
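
To see how the 'BH' and 'BY' corrections relate to each other, here is a minimal sketch using p.adjust from base R on a made-up vector of p-values (the values in pVal are invented for illustration). It does not reproduce how this function computes its output, and 'lfdr' is left out because it needs an additional package.

```r
## Made-up p-values, for illustration only
pVal <- c(0.0004, 0.003, 0.012, 0.04, 0.21, 0.48, 0.77)

## Benjamini-Hochberg and Benjamini-Yekutieli corrections from base R (stats)
corr <- cbind(
  p.value = pVal,
  BH = p.adjust(pVal, method = "BH"),
  BY = p.adjust(pVal, method = "BY")
)
signif(corr, 3)
```

The 'BY' values are never smaller than the 'BH' values, since 'BY' is the more conservative correction.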

This function is compatible with automatic extraction of the experimental setup based on sdrf or other quantitation-specific sample annotation. In this case, the results of automated importing and mining of sample annotation should be stored as $sampleSetup$groups or $sampleSetup$lev.
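
As a sketch of the list layout described for argument dat, the following builds such an object by hand in base R. The protein names, sample names and the annotation columns Accession and Description are hypothetical; in practice such a list would typically come from an import function. The call to testRobustToNAimputation is left commented out, since running it on such a tiny example is not meaningful.

```r
## Hypothetical toy data: 4 proteins x 6 samples
set.seed(1)
quant <- matrix(rnorm(24, mean = 10), ncol = 6,
  dimnames = list(paste0("prot", 1:4), paste0("s", 1:6)))

## annotation with one line per line of quant (column names are assumptions)
annot <- cbind(Accession = rownames(quant),
  Description = paste("protein", 1:4))
rownames(annot) <- rownames(quant)

## list with $quant, $annot and the grouping stored in $sampleSetup$groups
datList <- list(
  quant = quant,
  annot = annot,
  sampleSetup = list(groups = rep(c("A", "B"), each = 3))
)
str(datList, max.level = 2)

## Based on the argument description, gr could then be left at NULL:
## res <- testRobustToNAimputation(dat = datList, gr = NULL)
```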

It is possible to limit the pairwise combinations to a custom designed set using the argument useComparison. This may be a matrix where every line designates a new pairwise comparison; the first column refers to the 'sample' while the second column assigns the 'reference'. Otherwise, one may provide a character vector as useComparison where each entry refers to a new pairwise comparison. In this case it is recommended to use '--' as separator for combining the group-names to be used as 'sample' and 'reference'. In case the names of groups (argument gr) do not contain any '-', the single character separator '-' may also be used, for compatibility with its use in the Bioconductor package limma. However, if gr does contain any '-', results will be presented using '--' as separator to prevent any confusion.
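
The two ways of describing a custom set of comparisons, and the reason for preferring '--' as separator, can be sketched in base R. The group names and the column names 'sample' and 'reference' are chosen for illustration only; the snippet builds the objects but does not call testRobustToNAimputation.

```r
## Hypothetical group names
grpNames <- c("A", "B", "C")

## all pairwise combinations, one line per comparison
allPairs <- t(combn(grpNames, 2))
colnames(allPairs) <- c("sample", "reference")

## custom subset as matrix: first column 'sample', second column 'reference'
useCompMat <- rbind(c("B", "A"), c("C", "A"))
colnames(useCompMat) <- c("sample", "reference")

## the same comparisons as character vector, using '--' as separator
useCompChr <- paste(useCompMat[, "sample"], useCompMat[, "reference"], sep = "--")
useCompChr

## with '-' inside group names, only '--' can be split back unambiguously
grpHyph <- c("WT-ctrl", "KO-1")
partsDouble <- strsplit(paste(grpHyph[2], grpHyph[1], sep = "--"), "--", fixed = TRUE)[[1]]
partsSingle <- strsplit(paste(grpHyph[2], grpHyph[1], sep = "-"), "-", fixed = TRUE)[[1]]
partsDouble    # two elements: the original group names
partsSingle    # four fragments: group names can no longer be recovered
```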

Examples

library(wrProteo)                          # provides testRobustToNAimputation()
set.seed(2015)
rand1 <- round(runif(600) + rnorm(600, 1, 2), 3)
dat1 <- matrix(rand1, ncol=6) + matrix(rep((1:100)/20, 6), ncol=6)
dat1[13:16,1:3] <- dat1[13:16,1:3] +2      # augment lines 13:16 
dat1[19:20,1:3] <- dat1[19:20,1:3] +3      # augment lines 19:20
dat1[15:18,4:6] <- dat1[15:18,4:6] +1.4    # augment lines 15:18 
dat1[dat1 <1] <- NA                        # mimic some NAs for low abundance
## normalize data
boxplot(dat1, main="Data Before Normalization", las=1)
dat1 <- wrMisc::normalizeThis(as.matrix(dat1), meth="median")
## designate replicate relationships in samples ...  
grp1 <- gl(2, 3, labels=LETTERS[1:2])                   
## moderated t-test with repeated imputations (may take >10 sec, >60 sec if ROTSn >0 !) 
PLtestR1 <- testRobustToNAimputation(dat=dat1, gr=grp1, retnNA=TRUE, nLoop=20)
names(PLtestR1)
head(PLtestR1$p.value)
head(PLtestR1$BH)
head(PLtestR1$means)
boxplot(PLtestR1$datImp, main="Data After Normalization & Imputation", las=1)

## custom selection of comparisons (incl custom orientation)
useComp <- c("A-B", "B-A")      # You can choose orientation sample/reference
PLtestR2 <- testRobustToNAimputation(dat=dat1, gr=grp1, useComparison=useComp, 
  retnNA=TRUE, nLoop=20)
head(PLtestR2$BH)
head(PLtestR2$means)
