show: quantro

Description

This function tests for global differences between groups of distributions in order to assess whether global normalization methods, such as quantile normalization, should be applied. It also defines the quantro class and constructor.

Usage

quantro(object, groupFactor = NULL, B = 0, qRange = NULL,
  useMedianNormalized = TRUE, verbose = TRUE)

Arguments

object

an object that inherits from eSet, such as an ExpressionSet or MethylSet object. The object can also be a data frame or matrix with observations (e.g. probes or genes) in the rows and samples in the columns.

groupFactor

a group level factor associated with each sample or column in the object. The order of the groupFactor must match the order of the columns in object.

B

number of permutations used to assess statistical significance in a permutation test. Default is B = 0.

qRange

the range of quantiles to consider. Default is seq(0, 1, length.out = nrow(object)).

useMedianNormalized

TRUE/FALSE argument specifying if the median normalized data should be used or not as input to test for global differences between distributions. Default is TRUE.

verbose

TRUE/FALSE argument specifying if verbose messages should be returned or not. Default is TRUE.

Value

A quantro S4 class object containing:

summary

Returns a list of three elements related to a summary of the experiment: (1) the number of groups (nGroups), (2) the total number of samples (nTotSamples), (3) the number of samples in each group (nSamplesinGroups).

B

Number of permutations for permutation testing.

anova

ANOVA to test if the medians of the distributions (averaged across groups) are different across groups.

quantroStat

A test statistic which is a ratio of the mean squared error between groups of distributions to the mean squared error within groups of distributions (pseudo F-statistic).

quantroStatPerm

If B is not equal to 0, then a permutation test was performed to assess the statistical significance of quantroStat. These are the test statistics resulting from the permuted samples.

quantroPvalPerm

If B is not equal to 0, then this is the p-value associated with the proportion of times the test statistics resulting from the permuted samples were larger than quantroStat.

Details

Quantile normalization is one of the most widely used normalization tools for data analysis in genomics. Although it was originally developed for gene expression microarrays, it is now used across many different high-throughput applications including RNAseq and ChIPseq. The methodology relies on the assumption that observed changes in the empirical distribution of samples are due to unwanted variability. Because the data are transformed to remove these differences, quantile normalization has the potential to remove interesting, biologically driven global variation. Therefore, applying quantile normalization, or other global normalization methods that rely on similar assumptions, may not be appropriate, depending on the type and source of variation.

This function can be used to test, prior to the data analysis, whether global normalization methods such as quantile normalization should be applied. The quantro function uses the raw unprocessed high-throughput data to test for global differences in the distributions across a set of groups.

The quantro function will perform two tests:

1. An ANOVA to test if the medians of the distributions are different across groups. Differences across groups could be attributed to unwanted technical variation (such as batch effects) or real global biological variation. This is a helpful step for the user to verify if there is some unaccounted technical variation.

2. A test for global differences between the distributions across groups. The main output is a test statistic called quantroStat. This test statistic is a ratio of two variances and is similar in spirit to ANOVA. The main idea of the test is to compare the variability of distributions within the groups to the variability of distributions between the groups. If the variance between the groups is sufficiently larger than the variance within the groups, quantile normalization may not be an appropriate normalization technique, depending on the source of variation (technical or biological). By default, this test is performed after a median normalization, but this option may be changed with useMedianNormalized.
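
The two tests above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper pseudo_f_stat() and the simulated data are hypothetical names introduced here, and the sketch assumes that every quantile (the sorted values of each sample) is used and that samples are median normalized first.

```r
# Hypothetical helper (not part of quantro): a pseudo F-statistic comparing
# between-group and within-group variability of sorted sample distributions.
pseudo_f_stat <- function(mat, group) {
  group <- as.factor(group)
  levs <- levels(group)
  n_groups <- length(levs)
  n_total <- ncol(mat)

  # Median-normalize each sample (column), then sort each column to obtain
  # its empirical quantiles.
  centered <- sweep(mat, 2, apply(mat, 2, median), FUN = "-")
  q <- apply(centered, 2, sort)

  # Grand mean quantile curve and one mean quantile curve per group.
  grand_mean <- rowMeans(q)
  group_means <- sapply(levs, function(g) {
    rowMeans(q[, group == g, drop = FALSE])
  })
  n_per_group <- as.vector(table(group))

  # Between-group mean square: distance of group curves from the grand curve.
  between <- colMeans((group_means - grand_mean)^2)
  ms_between <- sum(between * n_per_group) / (n_groups - 1)

  # Within-group mean square: distance of each sample from its group curve.
  within <- colMeans((q - group_means[, as.character(group), drop = FALSE])^2)
  ms_within <- sum(within) / (n_total - n_groups)

  ms_between / ms_within
}

set.seed(1)
n_probes <- 500
group <- factor(rep(c("A", "B"), each = 5))

# Case 1: all samples drawn from the same distribution.
same <- matrix(rnorm(n_probes * 10), nrow = n_probes)

# Case 2: group B has a wider distribution (a global difference in shape).
wider <- same
wider[, group == "B"] <- wider[, group == "B"] * 2

# Test 1 (sketch): ANOVA on the per-sample medians.
medians <- apply(wider, 2, median)
median_anova <- anova(lm(medians ~ group))

# Test 2 (sketch): ratio of between-group to within-group variability.
stat_same <- pseudo_f_stat(same, group)
stat_wider <- pseudo_f_stat(wider, group)
```

In this simulation the statistic for the second case should be much larger than for the first, because the two groups differ in the shape of their distributions even though their medians are similar.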

To assess the statistical significance of quantroStat, we use permutation testing. To perform a permutation test, set B to the number of permutations used to create a null distribution. If the number of samples is large, B can be set to a large value such as 1000. This step can be very slow, but parallelization has been implemented through the foreach package. Register the number of cores using the doParallel package.
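
A permutation test with parallelization might be set up as in the sketch below. The simulated matrix, the group labels, the choice of two cores and B = 100 are assumptions made for illustration; the accessor functions quantroStatPerm() and quantroPvalPerm() are assumed to be named after the corresponding elements listed under Value.

```r
library(quantro)
library(doParallel)

# Register two cores for foreach (an arbitrary choice; adjust to the machine).
registerDoParallel(cores = 2)

set.seed(1)
# Simulated data: 1000 features (rows) by 8 samples (columns), two groups.
mat <- matrix(rnorm(1000 * 8), nrow = 1000)
grp <- factor(rep(c("A", "B"), each = 4))

# B = 100 permutations, kept small here so the example runs quickly.
qtestPerm <- quantro(object = mat, groupFactor = grp, B = 100)

# Test statistics from the permuted samples and the permutation p-value.
permStats <- quantroStatPerm(qtestPerm)
permPval <- quantroPvalPerm(qtestPerm)
```

A larger B gives a finer-grained p-value at the cost of run time, which is where registering additional cores helps.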

See the vignette for more details.

Examples

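library(minfi)
library(quantro)  # provides quantro() and the flowSorted example data

# Load the example methylation data, then extract beta values and phenotype data
data(flowSorted)
p <- getBeta(flowSorted, offset = 100)
pd <- pData(flowSorted)

# Test for global differences in distributions across cell types
qtest <- quantro(object = p, groupFactor = pd$CellType)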