dyebias.estimate.iGSDBs(data.norm, is.balanced=TRUE, reference="ref",
verbose=FALSE)

data.norm: A marrayNorm object containing the data for estimating the
dye bias. This object is supposed to be complete. In particular,
maLabels(maGnames(data.norm)) must be set and must indicate the
identities of the reporter sequence (i.e., oligo or cDNA
sequence) of each spot. This helps identify replicate spots, which
are averaged as part of the estimation. If the data set is unbalanced (i.e., is.balanced is FALSE),
maInfo(maTargets(data.norm)) is also required, and should
contain at least two attributes: Cy5 and Cy3. Each
should give the factor value hybridized in the respective channel.

is.balanced: TRUE will become illegal in
the future. Logical indicating whether the data set represents a balanced design
(which is by far the most common case). A design is balanced if all
factor values are present an equal number of times in both the
forward and reverse dye orientations. A self-self design is by
definition balanced (even if the number of slides is odd). If
is.balanced is TRUE, the iGSDB estimate is
obtained by simply averaging, per reporter, all $M$ values
(and the value of the reference argument is ignored).
If is.balanced==FALSE, the design is inferred from the
reference argument, and subsequently the limma
package is used to model the dye effect. This is typically done for
an unbalanced data set, but there is no harm in setting
is.balanced=FALSE for a design that by itself is already
balanced. If there are no missing values in the data, the results of
using the simple average and the limma procedure are identical
(although limma takes longer to compute the iGSDBs). If the data set
contains many missing data points (NAs), the limma estimates differ
slightly from the simple averaged estimates (although it is not
clear which ones are better).

reference: If the design contains a single common reference, the
reference argument should be this common reference (which must not be
empty). If the design
contains multiple common references, reference
should be a vector listing all the common references, and the name
of the factor value that is not the common reference should have its
own common reference as a prefix. E.g., if two mutant strains
mutA and mutB were assayed, each against a separate
common reference ref1 and ref2, the
reference argument
would be c("ref1", "ref2"), and the Cy3 and Cy5
attributes of maInfo(maTargets(data.norm)) would take
values from "ref1:mutA", "ref2:mutA", "ref1:mutB",
"ref2:mutB". The colon is not important, but the prefix is, as it
allows the association of each sample with its 'own' common reference.

dyebias ($H_0$:
dyebias = 0). All p-values are set to NA if
they were not estimated (i.e., if limma was not run because
is.balanced was TRUE).

dyebias.apply.correction.

The assumption underlying this approach is that with self-self hybridizations, or with pairs of dye swaps, the only effect that can lead to systematic changes between Cy5 and Cy3 is the dye effect.
There are two cases to distinguish: the balanced case and the unbalanced case. In the balanced case, the iGSDB estimate is simply the average $M$ (where $M = \log_2(R/G) = \log_2(\mathrm{Cy5}/\mathrm{Cy3})$) over all slides. A set of slides is balanced if every factor value is present on as many dye-swapped as non-dye-swapped slides. A set of self-self slides is a degenerate form of this, and is therefore also balanced.
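The reason plain averaging works in the balanced case can be illustrated with a small simulation. The sketch below is not the package's code: it assumes a simple additive model in which a dye swap flips the sign of the biological effect but leaves the reporter-specific dye bias unchanged, and all names and numbers (effect, dyebias, orientation) are invented for illustration.

```r
# Toy data: 3 reporters on 4 slides (2 forward, 2 dye-swapped).
# All values are made up for illustration.
effect  <- c(r1 = 1.0, r2 = -0.5, r3 = 0.0)   # true log2 ratio, sample vs. reference
dyebias <- c(r1 = 0.3, r2 = -0.2, r3 = 0.1)   # reporter-specific dye bias
orientation <- c(1, -1, 1, -1)                # +1 = forward, -1 = dye swap

# Assumed model: M = orientation * effect + dyebias.
# The swap flips the effect term but not the bias term.
M <- outer(effect, orientation) + dyebias

# With equal numbers of forward and swapped slides the effect terms
# cancel, so the per-reporter mean of M recovers the dye bias.
iGSDB <- rowMeans(M, na.rm = TRUE)
print(iGSDB)
```

For a set of self-self slides the effect term is zero on every slide, so the same row mean applies regardless of the number of slides.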
In the unbalanced case, one could omit slides until the data set is balanced. However, this is wasteful as we can use linear modelling to obtain estimates. We use the limma package for this (Smyth, 2005). The only unbalanced designs currently supported are a common reference design, and a set of common reference designs.
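To illustrate the idea behind the linear model, the sketch below fits a single reporter from a hypothetical unbalanced common-reference design using base R's lm as a stand-in for limma. The targets table, the design-matrix construction and the simulated values are assumptions made for this example; how the package itself builds the design is not shown here.

```r
# Hypothetical targets: mutA has one forward and one swapped slide,
# mutB has two forward slides and one swapped slide (unbalanced).
targets <- data.frame(
  Cy5 = c("mutA", "ref",  "mutB", "mutB", "ref"),
  Cy3 = c("ref",  "mutA", "ref",  "ref",  "mutB"),
  stringsAsFactors = FALSE
)
reference <- "ref"

# One design column per non-reference factor value:
# +1 if it is in the Cy5 channel, -1 if it is in the Cy3 channel.
samples <- setdiff(unique(c(targets$Cy5, targets$Cy3)), reference)
X <- sapply(samples, function(s) (targets$Cy5 == s) - (targets$Cy3 == s))

# Simulated, noise-free M values for one reporter (invented numbers).
true.effect  <- c(mutA = 1.0, mutB = -0.5)
true.dyebias <- 0.3
M <- as.vector(X %*% true.effect[samples]) + true.dyebias

# The intercept absorbs what is common to all slides: the dye bias.
fit <- lm(M ~ X)
dyebias.hat <- unname(coef(fit)["(Intercept)"])

# The plain average is biased here because the mutB effect does not cancel.
naive <- mean(M)
print(c(model = dyebias.hat, naive = naive))
```

In this toy design the intercept recovers the simulated dye bias while the plain mean does not, which is why no slides need to be discarded to balance the design.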
There is no weights or subset argument to this function; the estimation is done for all reporters found. If there are replicate spots, they are averaged prior to the estimation (the reason being that we are not interested in p-values for the estimate).
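The averaging of replicate spots can be pictured as follows. This is a minimal base R sketch, not the package's implementation: the reporter labels stand in for what maLabels(maGnames(data.norm)) would supply, and the M values are invented.

```r
# Hypothetical reporter labels, one per spot; two reporters are spotted twice.
reporter.id <- c("YAL001C", "YAL002W", "YAL001C", "YAL003W", "YAL002W")

# Invented M values: rows are spots, columns are slides.
M <- matrix(c(0.2, 0.4, 0.6, 0.1, NA,
              0.0, 0.2, 0.4, 0.3, 0.8),
            ncol = 2, dimnames = list(NULL, c("slide1", "slide2")))

# Per slide, average all spots that share a reporter label, ignoring NAs.
M.avg <- apply(M, 2, function(col) tapply(col, reporter.id, mean, na.rm = TRUE))
print(M.avg)
```

The result has one row per distinct reporter, which is the unit on which the estimate is then computed.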
Having obtained the iGSDB estimates, the corrections can be applied
either to the hybridizations given by the data.norm argument,
or to a different set of slides that is thought to have very similar
iGSDBs. Applying the corrections is done with
dyebias.apply.correction.
Dudoit, S. and Yang, Y.H. (2002) Bioconductor R packages for exploratory analysis and normalization of cDNA microarray data. In: Parmigiani, G., Garrett, E.S., Irizarry, R.A. and Zeger, S.L. (eds.) The Analysis of Gene Expression Data: Methods and Software, Springer, New York.

Smyth, G.K. (2005) Limma: linear models for microarray data. In: Gentleman, R., Carey, V., Dudoit, S., Irizarry, R. and Huber, W. (eds.) Bioinformatics and Computational Biology Solutions using R and Bioconductor, Springer, New York.

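options(stringsAsFactors = FALSE)
library(dyebias)
library(dyebiasexamples)

## load the example data sets from dyebiasexamples
data(data.raw)
data(data.norm)

## is.balanced=TRUE: the iGSDB is the per-reporter average of M
iGSDBs.estimated <- dyebias.estimate.iGSDBs(data.norm,
                                            is.balanced=TRUE,
                                            verbose=FALSE)
summary(iGSDBs.estimated)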
hist(iGSDBs.estimated$dyebias, breaks=50)