vsn.old: Variance stabilization and calibration for microarray data.

Description

Robust estimation of variance-stabilizing and calibrating transformations for microarray data. This function has been superseded by vsn2. The function vsn remains in the package for backward compatibility, but for new projects, please use vsn2.

Usage

vsn(intensities,
    lts.quantile = 0.5,
    verbose = interactive(),
    niter = 10,
    cvg.check = NULL,
    describe.preprocessing = TRUE,
    subsample,
    pstart,
    strata)

Arguments

intensities

An object that contains intensity values from a microarray experiment. The intensities are assumed to be the raw scanner data, summarized over the spots by an image analysis program, and possibly "background subtracted". The intensities must not be logarithmically or otherwise transformed, and not thresholded or "floored". NAs are not accepted. See details.

lts.quantile

Numeric. The quantile that is used for the resistant least trimmed sum of squares regression. Allowed values are between 0.5 and 1. A value of 1 corresponds to ordinary least sum of squares regression.

verbose

Logical. If TRUE, some messages are printed.

niter

Integer. The number of iterations to be used in the least trimmed sum of squares regression.

cvg.check

List. If non-NULL, this allows finer control of the iterative least trimmed sum of squares regression. See details.

pstart

Array. If not missing, it specifies the start values for the iterative parameter estimation algorithm. See vsnh for details.

describe.preprocessing

Logical. If TRUE, calibration and transformation parameters, plus some other information are stored in the preprocessing slot of the returned object. See details.

subsample

Integer. If specified, the model parameters are estimated from a subsample of the data only; the transformation is then applied to all data. This can be useful for performance reasons.

strata

Integer vector. Its length must be the same as nrow(intensities). This parameter allows the calibration and error model parameters to be stratified within each array, e.g. to take into account probe sequence properties, print-tip or plate effects. If strata is not specified, one pair of parameters is fitted for every sample (i.e. for every column of intensities). If strata is specified, a pair of parameters is fitted for every stratum within every sample. The strata are coded by the different integer values. The integer vector strata can be obtained from a factor fac through as.integer(fac), or from a character vector str through as.integer(factor(str)).
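As a minimal sketch of the two conversions mentioned for strata, the following base R snippet builds the integer coding from a character vector and from a factor. The print-tip labels and the variable names tip, fac and strata_from_factor are made up for illustration.

```r
# Hypothetical print-tip labels, one per row of the intensity matrix
tip <- c("A1", "A2", "A1", "B1", "B1", "A2")

# From a character vector: convert to a factor first, then to integer codes
strata <- as.integer(factor(tip))

# From an existing factor: as.integer() returns the level codes directly
fac <- factor(tip)
strata_from_factor <- as.integer(fac)

# Both routes give the same coding; its length must match nrow(intensities)
print(strata)
```

Calling as.integer() directly on a character vector of labels such as these would produce NAs, which is why the detour through factor() is needed.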

Value

ExpressionSet. Differences between the columns of the transformed intensities are "generalized log-ratios", which are shrinkage estimators of the natural logarithm of the fold change. For the transformation parameters, please see the Details.

Details

Overview: The function calibrates for sample-to-sample variations through shifting and scaling, and transforms the intensities to a scale where the variance is approximately independent of the mean intensity. The variance stabilizing transformation is equivalent to the natural logarithm in the high-intensity range, and to a linear transformation in the low-intensity range. In an intermediate range, the arsinh function interpolates smoothly between the two. For details on the transformation, please see the help page for vsnh. The parameters are estimated through a robust variant of maximum likelihood. This assumes that for the majority of genes the expression levels are not much different across the samples, i.e., that only a minority of genes (less than a fraction 1-lts.quantile) is differentially expressed.

Even if most genes on an array are differentially expressed, it may still be possible to use the estimator: if a set of non-differentially expressed genes is known, e.g. because they are external controls or reliable 'house-keeping genes', the transformation parameters can be fitted with vsn from the data of these genes, then the transformation can be applied to all data with vsnh.
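The limiting behaviour of the arsinh function described in the overview can be checked numerically with base R's asinh. This is only an illustration of the bare function; the parametrization actually used by the package is documented on the vsnh help page, and the test values below are arbitrary.

```r
# asinh() is base R's inverse hyperbolic sine (arsinh)
x_small <- 1e-3
x_large <- 1e4

# Low-intensity range: asinh(x) is approximately x, i.e. linear
err_small <- abs(asinh(x_small) - x_small)

# High-intensity range: asinh(x) is approximately log(2 * x),
# i.e. the natural logarithm up to an additive constant
err_large <- abs(asinh(x_large) - log(2 * x_large))

print(c(err_small = err_small, err_large = err_large))
```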

Format: The format of the matrix of intensities is as follows: for the two-color printed array technology, each row corresponds to one spot, and the columns to the different arrays and wavelengths (usually red and green, but there could be any number). For example, if there are 10 arrays, the matrix would have 20 columns, with columns 1...10 containing the green intensities and columns 11...20 the red ones. In fact, the ordering of the columns does not matter to vsn, but it is your responsibility to keep track of it for subsequent analyses. For one-color arrays, each row corresponds to a probe, and each column to an array.
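The two-color layout from the example above (10 arrays, 20 columns) might be assembled as follows. The intensities are simulated placeholders, and the names n_spots, green and red are assumptions made for this sketch.

```r
n_spots  <- 5    # hypothetical number of spots (rows)
n_arrays <- 10

set.seed(1)
# Simulated positive raw intensities, standing in for scanner output
green <- matrix(rexp(n_spots * n_arrays, rate = 1 / 500), nrow = n_spots)
red   <- matrix(rexp(n_spots * n_arrays, rate = 1 / 500), nrow = n_spots)

# Columns 1..10 hold the green channel, columns 11..20 the red channel
intensities <- cbind(green, red)
colnames(intensities) <- c(paste0("green", 1:n_arrays),
                           paste0("red", 1:n_arrays))

print(dim(intensities))
```

Naming the columns is one simple way to keep track of the channel and array order for later analyses.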

Performance: This function is slow. This is due to the nested iteration loops of the numerical optimization of the likelihood function and the heuristic that identifies the non-outlying data points in the least trimmed squares regression. For large arrays with many tens of thousands of probes, you may want to consider random subsetting: that is, use only a random subset of, e.g., 10,000-20,000 rows of the data matrix intensities to fit the parameters, then apply the transformation to all the data using vsnh. An example of this can be seen in the function normalize.AffyBatch.vsn, whose code you can inspect by typing normalize.AffyBatch.vsn on the R command line.
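The random subsetting step can be sketched in base R as below. The matrix is simulated and the sizes are arbitrary; the commented lines at the end only indicate, under the assumption that the vsn package is loaded, how the subset might then be used, following the pattern of the Examples section.

```r
set.seed(42)
# Simulated stand-in for a large intensity matrix (50,000 probes x 4 arrays)
intensities <- matrix(rexp(50000 * 4, rate = 1 / 1000), ncol = 4)

# Draw 10,000 row indices at random, without replacement
n_sub <- 10000
sel <- sample(nrow(intensities), n_sub)
sub <- intensities[sel, , drop = FALSE]

print(dim(sub))

# Not run (requires the vsn package): fit on the subset, apply to all rows
#   fit    <- vsn(sub)
#   params <- preproc(description(fit))$vsnParams
#   hall   <- vsnh(intensities, params)
```

The subsample argument described under Arguments serves the same purpose without the manual bookkeeping.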

Iteration control: By default, if cvg.check is NULL, the function will run the fixed number niter of iterations in the least trimmed sum of squares regression. More fine-grained control can be obtained by passing a list with elements eps and n. If the maximum change between transformed data values is smaller than eps for n subsequent iterations, then the iteration terminates.
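The stopping rule described above can be illustrated with a small stand-alone sketch. The helper stop_iteration and the sequence of changes are hypothetical and are not part of the package; only the list elements eps and n come from the description, and their values here are arbitrary.

```r
# Control list in the documented form: stop once the maximum change
# stays below eps for n consecutive iterations (illustrative values)
cvg.check <- list(eps = 1e-3, n = 3L)

# Hypothetical helper: given the maximum change seen at each iteration,
# return the iteration at which the rule would stop, or NA if it never does
stop_iteration <- function(max_change, eps, n) {
  run <- 0L
  for (i in seq_along(max_change)) {
    # Count consecutive iterations below eps; reset on any larger change
    run <- if (max_change[i] < eps) run + 1L else 0L
    if (run >= n) return(i)
  }
  NA_integer_
}

# Made-up trajectory of maximum changes between transformed data values
changes <- c(0.5, 0.1, 5e-4, 2e-3, 8e-4, 4e-4, 1e-4, 5e-5)
print(stop_iteration(changes, cvg.check$eps, cvg.check$n))
```

In this made-up trajectory the counter resets at the fourth iteration, so the rule is first satisfied at iteration 7. A list of this form would be passed to the function as cvg.check = list(eps = ..., n = ...).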

Estimated transformation parameters: If describe.preprocessing is TRUE, the transformation parameters are returned in the preprocessing slot of the experimentData slot of the resulting ExpressionSet object, in the form of a list with three elements:

vsnParams: the parameter array (see vsnh for details)
vsnParamsIter: an array with dimensions c(dim(vsnParams), niter) that contains the parameter trajectory during the iterative fit process (see also vsnPlotPar).
vsnTrimSelection: a logical vector that for each row of the intensities matrix reports whether it was below (TRUE) or above (FALSE) the trimming threshold.

If intensities has class ExpressionSet, and its experimentData slot has class MIAME, then this list is appended to any existing entries in the preprocessing slot. Otherwise, the experimentData object and its preprocessing slot are created.

References

Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.

Parameter estimation for the calibration and variance stabilization of microarray data, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin Vingron; Statistical Applications in Genetics and Molecular Biology (2003) Vol. 2 No. 1, Article 3. http://www.bepress.com/sagmb/vol2/iss1/art3.

Examples


library("vsn")   ## provides vsn, vsnh, meanSdPlot and the kidney data
data(kidney)
log.na = function(x) log(ifelse(x>0, x, NA))

plot(log.na(exprs(kidney)), pch=".", main="log-log")

vsnkid = vsn(kidney)   ## transform and calibrate
plot(exprs(vsnkid), pch=".", main="h-h")
meanSdPlot(vsnkid)

## this should always hold true
params = preproc(description(vsnkid))$vsnParams
stopifnot(all(vsnh(exprs(kidney), params) == exprs(vsnkid)))
