calculatePvalues: Calculate p-values and identify regions

Description

First, this function finds the regions of interest according to the specified cutoffs. Then it permutes the samples and re-calculates the F-statistics. The areas of the statistics from the segments found in these permutations are then used to calculate p-values for the original regions.
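
To make the last step concrete, here is a minimal sketch in base R of an empirical p-value computed from region areas. The numbers and the names obs_area, null_area, and emp_p are made up for illustration. The rule used (the share of null areas at least as large as the observed area) is an assumption, and the exact formula used by the package may differ.

```r
# Toy areas for three observed regions (made-up numbers).
obs_area <- c(12.5, 3.0, 40.2)

# Toy areas for the null segments pooled over all permutations (made-up).
null_area <- c(1.2, 2.8, 5.0, 3.3, 15.0, 0.9, 7.1, 2.0)

# Assumed rule: the p-value of a region is the share of null areas that are
# at least as large as the area of that region.
emp_p <- vapply(obs_area, function(a) mean(null_area >= a), numeric(1))
emp_p
```

With only a handful of null segments the p-values are coarse, so more permutations or longer chromosomes give a finer null distribution.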

Usage

calculatePvalues(coveragePrep, models, fstats, nPermute = 1L,
    seeds = as.integer(gsub("-", "", Sys.Date())) + seq_len(nPermute),
    chr, cutoff = quantile(fstats, 0.99), significantCut = c(0.05, 0.1),
    lowMemDir = NULL, ...)

Arguments

coveragePrep

A list with $coverageProcessed, $mclapplyIndex, and $position normally generated using preprocessCoverage.

models

A list with $mod and $mod0 normally generated using makeModels.

fstats

A numeric Rle with the F-statistics normally generated using calculateStats.

nPermute

The number of permutations. Note that for a full chromosome, a small number (10) of permutations is sufficient. If set to 0, no permutations are performed and thus no null regions are used; however, the $regions component is still created.

seeds

An integer vector of length nPermute specifying the seed to be used for each permutation. If NULL, no seeds are used.

chr

A single element character vector specifying the chromosome name. This argument is passed to findRegions.

cutoff

F-statistic cutoff to use to determine segments.

significantCut

A vector of length two specifying the cutoffs used to determine significance. The first element is used to determine significance for the p-values and the second element is used for the q-values.

lowMemDir

The directory where the processed chunks are saved when using preprocessCoverage with a specified lowMemDir.

...

Arguments passed to other methods and/or advanced arguments.
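
The defaults for seeds and cutoff shown in Usage can be explored with base R alone. The sketch below is illustrative: toy_fstats is a simulated plain numeric vector standing in for the numeric Rle of F-statistics, and the degrees of freedom (1 and 28) are hypothetical values chosen for the example, not taken from the package.

```r
nPermute <- 3L

# Default from Usage: today's date as YYYYMMDD plus 1, ..., nPermute.
# The result changes from day to day.
default_seeds <- as.integer(gsub("-", "", Sys.Date())) + seq_len(nPermute)

# Fixed seeds make the permutations reproducible across days.
fixed_seeds <- seq_len(nPermute)

# Toy F-statistics (simulated; a stand-in for the real fstats object).
set.seed(1)
toy_fstats <- rf(1000, df1 = 1, df2 = 28)

# Cutoff based on the observed statistics, as in the default of 'cutoff'.
cutoff_observed <- quantile(toy_fstats, 0.99)

# Cutoff based on the theoretical F-distribution with the assumed
# degrees of freedom.
cutoff_theory <- qf(0.99, df1 = 1, df2 = 28)
```

Passing fixed seeds is the simpler choice when results need to be compared between runs made on different dates.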

Value

A list with four components:

regions: a GRanges with the metadata columns given by findRegions plus the following additional metadata columns: pvalues, the p-value of the region calculated via permutations of the samples; qvalues, the q-values calculated using qvalue; significant, whether the p-value is less than 0.05 (by default); significantQval, whether the q-value is less than 0.10 (by default). It also includes the mean coverage of the region (the mean of the per-base mean coverage calculated in preprocessCoverage). Furthermore, if groupInfo was not NULL in preprocessCoverage, the group mean coverage is calculated as well as the log2 fold change (using group 1 as the reference).
nullStats: a numeric Rle with the mean of the null statistics by segment.
nullWidths: a numeric Rle with the length of each of the segments in the null distribution. The area can be obtained by multiplying the absolute nullStats by the corresponding lengths.
nullPermutation: an Rle with the number of the permutation from which each null region originated.
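
The relationship between nullStats, nullWidths, and the null areas can be sketched as follows. Plain numeric vectors with made-up values stand in for the Rle components here, and the names null_stats, null_widths, null_perm, and null_area are hypothetical.

```r
# Toy stand-ins for the $nullStats, $nullWidths and $nullPermutation
# components (made-up values; the real components are Rle objects).
null_stats <- c(2.5, -1.0, 4.0, 3.2)
null_widths <- c(10, 4, 25, 8)
null_perm <- c(1, 1, 2, 2)

# Area of each null segment: absolute mean statistic times segment length.
null_area <- abs(null_stats) * null_widths

# Number of null segments contributed by each permutation.
segments_per_perm <- table(null_perm)
```

Grouping by the permutation number makes it easy to check whether one permutation dominates the null distribution.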

Examples

## Load the package, which provides the example data used below
library('derfinder')

## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list(genomeData$coverage),
    verbose = TRUE)

## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull, probs=c(0.5), verbose = TRUE)

## Build the models
group <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars = group, adjustvars = adjustvars)

## Preprocess the data
## The chunksize is determined automatically (chunksize = NULL)
prep <- preprocessCoverage(genomeData, groupInfo = group, cutoff = 0,
    scalefac = 32, chunksize = NULL, colsubset = NULL, mc.cores = 4)

## Get the F statistics
fstats <- genomeFstats

## We recommend determining the cutoff to use based on the F-distribution
## although you could also base it on the observed F-statistics.

## In this example we use a low cutoff for illustrative purposes
cutoff <- 1

## Calculate the p-values and define the regions of interest.
regsWithP <- calculatePvalues(prep, models, fstats, nPermute=1, seeds=1,
    chr = 'chr21', cutoff = cutoff, mc.cores = 1, method = 'regular')
regsWithP

## Not run: 
# ## Calculate again, but with 10 permutations instead of just 1
# regsWithP <- calculatePvalues(prep, models, fstats, nPermute=10, seeds=1:10,
#     chr='chr21', cutoff=cutoff, mc.cores=2, method='regular')
# 
# ## Check that they are the same as the previously calculated regions
# library(testthat)
# expect_that(regsWithP, equals(genomeRegions))
# 
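# ## Histogram of the theoretical p-values by region
# ## (degrees of freedom taken from the model matrices)
# df1 <- ncol(models$mod); df0 <- ncol(models$mod0); n <- nrow(models$mod)
# hist(pf(regsWithP$regions$value, df1 - df0, n - df1, lower.tail = FALSE),
#     main = 'Distribution of original p-values by region', freq = FALSE)
# 
# ## Histogram of the permuted p-values by region
# hist(regsWithP$regions$pvalues,
#     main = 'Distribution of permuted p-values by region', freq = FALSE)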
# 
# ## MA style plot
# library('ggplot2')
# ma <- data.frame(mean=regsWithP$regions$meanCoverage,
#     log2FoldChange=regsWithP$regions$log2FoldChangeYRIvsCEU)
# ggplot(ma, aes(x=log2(mean), y=log2FoldChange)) + geom_point() +
#     ylab('Fold Change (log2)') + xlab('Mean coverage (log2)') +
#     labs(title='MA style plot')
# 
# ## Annotate the results
# library('bumphunter')
# annotation <- annotateNearest(regsWithP$regions, 'hg19')
# head(annotation)
# 
# ## End(Not run)
