calculatePvalues: Calculate p-values and identify regions

Description

First, this function finds the regions of interest according to specified cutoffs. Then it permutes the samples and re-calculates the F-statistics. The area of the statistics from these segments are then used to calculate p-values for the original regions.

Usage

calculatePvalues(coveragePrep, models, fstats, nPermute = 1L,
  seeds = as.integer(gsub("-", "", Sys.Date())) + seq_len(nPermute), chr,
  cutoff = quantile(fstats, 0.99, na.rm = TRUE), significantCut = c(0.05,
  0.1), lowMemDir = NULL, smooth = FALSE, weights = NULL,
  smoothFunction = bumphunter::locfitByCluster, ...)

Arguments

coveragePrep

A list with $coverageProcessed, $mclapplyIndex, and $position normally generated using preprocessCoverage.

models

A list with $mod and $mod0 normally generated using makeModels.

fstats

A numerical Rle with the F-statistics normally generated using calculateStats.

nPermute

The number of permutations. Note that for a full chromosome, a small amount (10) of permutations is sufficient. If set to 0, no permutations are performed and thus no null regions are used, however, the $regions component is created.

seeds

An integer vector of length nPermute specifying the seeds to be used for each permutation. If NULL no seeds are used.

chr

A single element character vector specifying the chromosome name. This argument is passed to findRegions.

cutoff

F-statistic cutoff to use to determine segments.

significantCut

A vector of length two specifiying the cutoffs used to determine significance. The first element is used to determine significance for the P-values, while the second element is used for the Q-values (FDR adjusted P-values).

lowMemDir

The directory where the processed chunks are saved when using preprocessCoverage with a specified lowMemDir.

smooth

Whether to smooth the F-statistics (fstats) or not. This is by default FALSE. For RNA-seq data we recommend using FALSE.

weights

Weights used by the smoother as described in smoother.

smoothFunction

A function to be used for smoothing the F-statistics. Two functions are provided by the bumphunter package: loessByCluster and runmedByCluster. If you are using your own custom function, it has to return a named list with an element called $fitted that contains the smoothed F-statistics and an element claled $smoothed that is a logical vector indicating whether the F-statistics were smoothed or not. If they are not smoothed, the original values will be used.

...

Arguments passed to other methods and/or advanced arguments.

Value

A list with four components: [object Object],[object Object],[object Object],[object Object]

Examples

Run this code

## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list(genomeData$coverage), 
    verbose = TRUE)

## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull, probs=c(0.5), verbose = TRUE)

## Build the models
group <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars = group, adjustvars = adjustvars)

## Preprocess the data
## Automatic chunksize used to then compare 1 vs 4 cores in the 'do not run'
## section
prep <- preprocessCoverage(genomeData, groupInfo = group, cutoff = 0, 
    scalefac = 32, chunksize = NULL, colsubset = NULL, mc.cores = 4)

## Get the F statistics
fstats <- genomeFstats

## We recommend determining the cutoff to use based on the F-distribution
## although you could also based it on the observed F-statistics.

## In this example we use a low cutoff used for illustrative purposes
cutoff <- 1

## Calculate the p-values and define the regions of interest.
regsWithP <- calculatePvalues(prep, models, fstats, nPermute=1, seeds=1, 
    chr = 'chr21', cutoff = cutoff, mc.cores = 1, method = 'regular')
regsWithP

## Calculate again, but with 10 permutations instead of just 1
regsWithP <- calculatePvalues(prep, models, fstats, nPermute=10, seeds=1:10, 
    chr='chr21', cutoff=cutoff, mc.cores=2, method='regular')

## Check that they are the same as the previously calculated regions
library(testthat)
expect_that(regsWithP, equals(genomeRegions))

## Histogram of the theoretical p-values by region
hist(pf(regsWithP$regions$value, df1-df0, n-df1), main='Distribution 
    original p-values by region', freq=FALSE)

## Histogram of the permutted p-values by region
hist(regsWithP$regions$pvalues, main='Distribution permutted p-values by 
    region', freq=FALSE)

## MA style plot
library('ggplot2')
ma <- data.frame(mean=regsWithP$regions$meanCoverage, 
    log2FoldChange=regsWithP$regions$log2FoldChangeYRIvsCEU)
ggplot(ma, aes(x=log2(mean), y=log2FoldChange)) + geom_point() + 
    ylab('Fold Change (log2)') + xlab('Mean coverage (log2)') + 
    labs(title='MA style plot')

## Annotate the results
library('bumphunter')
genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene)
annotation <- matchGenes(regsWithP$regions, genes)
head(annotation)

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples