BQ: Call peaks in replicated ChIP-seq data using BinQuasi

Description

Use the BinQuasi algorithm to call peaks using ChIP-seq data with biological replicates.

Usage

BQ(dir, ChIP.files, control.files, alpha = 0.05, bin.size = NULL,
  frag.length = NULL, minimum.count = 20, Model = "NegBin",
  print.progress = TRUE, method = "QLShrink", p.window.adjust = "BY",
  Dispersion = "Deviance", log.offset = NULL, NBdisp = "trend",
  bias.fold.tolerance = 1.1)

Arguments

dir

Directory where the sorted bam files (and their corresponding bam indices) are saved.

ChIP.files

File names (with file extensions) of the ChIP sample files in sorted bam format.

control.files

File names (with file extensions) of the control/input sample files in sorted bam format.

alpha

The desired significance threshold used to call peaks. Must be in (0, 0.5).

bin.size

Window size (constant across all samples) used to generate a partition for counts. If NULL, it is estimated using the method of Shimazaki and Shinomoto (2007).
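
As a rough illustration of the bin-size criterion, the sketch below evaluates the Shimazaki and Shinomoto cost function, C(w) = (2 * mean(k) - var(k)) / w^2, over a few candidate widths on simulated read positions and picks the width with the smallest cost. The helper ss_cost, the simulated data, and the candidate widths are assumptions made for this example; BinQuasi's internal estimator may differ in detail.

```r
# Illustrative only: ss_cost() is a hypothetical helper, not part of BinQuasi.
set.seed(1)

# Simulated read positions on a 100 kb region: uniform background plus one enriched locus.
pos <- c(runif(2000, 0, 1e5), rnorm(1000, mean = 5e4, sd = 500))
pos <- pos[pos >= 0 & pos < 1e5]

ss_cost <- function(pos, width, region.length = 1e5) {
  breaks <- seq(0, region.length, by = width)
  # Count the reads falling in each window of the partition.
  k <- tabulate(findInterval(pos, breaks), nbins = length(breaks) - 1)
  k.bar <- mean(k)
  v <- mean((k - k.bar)^2)  # biased variance, as used by the criterion
  (2 * k.bar - v) / width^2
}

# Candidate widths are chosen to divide the region length evenly.
widths <- c(50, 100, 200, 500, 1000, 2000, 5000)
costs <- vapply(widths, function(w) ss_cost(pos, w), numeric(1))
best.width <- widths[which.min(costs)]
best.width
```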

frag.length

Average length of the ChIP fragments in each sample provided. Reads are extended to this length in the 5'-to-3' direction. If NULL, cross correlation will be used to estimate the fragment length.
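
Extending in the 5'-to-3' direction means that a forward-strand read grows to the right of its start, while a reverse-strand read grows to the left of its end. The following is a minimal sketch of that idea; the helper extend_reads and the coordinates are hypothetical, and this is not BinQuasi's internal code.

```r
# Illustrative only: extend_reads() is a hypothetical helper, not part of BinQuasi.
extend_reads <- function(start, end, strand, frag.length) {
  plus <- strand == "+"
  data.frame(
    # Forward reads keep their start; reverse reads keep their end.
    start  = ifelse(plus, start, pmax(end - frag.length + 1, 1)),
    end    = ifelse(plus, start + frag.length - 1, end),
    strand = strand
  )
}

reads <- data.frame(start  = c(100, 500),
                    end    = c(135, 535),
                    strand = c("+", "-"))
extended <- extend_reads(reads$start, reads$end, reads$strand, frag.length = 200)
extended
```

This sketch does not clip extended reads at the chromosome end; a real implementation would need the chromosome lengths for that.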

minimum.count

The count threshold used for filtering out windows with sparse counts. Any genomic window with a total count, across all samples, less than this value will be removed.
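
The filtering rule can be sketched on a toy count matrix (rows are windows, columns are samples). The matrix values and sample names are made up for illustration.

```r
# Toy window-by-sample count matrix; values are invented for this example.
counts <- matrix(c(12,  9,  3, 2,
                    1,  0,  2, 1,
                   30, 25, 10, 8),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("C1", "C2", "I1", "I2")))

minimum.count <- 20

# Keep only windows whose total count across all samples reaches the threshold.
keep <- rowSums(counts) >= minimum.count
filtered <- counts[keep, , drop = FALSE]
filtered
```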

Model

Must be one of "Poisson" or "NegBin", specifying use of a quasi-Poisson or quasi-negative binomial model, respectively.

print.progress

Logical. If TRUE, updates are provided regarding which window (row number) is being analyzed. Updates are frequent at first, then occur every 5000 windows.

method

Must be one of "QL", "QLShrink", or "QLSpline", specifying which method of Lund, Nettleton, McCarthy and Smyth (2012) should be used to compute p-values.

p.window.adjust

FDR control method applied to the windows. Must be either "BH" or "BY" to specify the procedure of Benjamini-Hochberg or Benjamini-Yekutieli, respectively.
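
To see how the two procedures differ, the sketch below applies base R's p.adjust to an arbitrary vector of p-values. The p-values are invented for illustration; this shows the adjustment methods themselves, not BinQuasi's internal call.

```r
# Arbitrary example p-values, one per window.
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
       0.0278, 0.0298, 0.0344, 0.0459, 0.3240)

q.bh <- p.adjust(p, method = "BH")  # Benjamini-Hochberg
q.by <- p.adjust(p, method = "BY")  # Benjamini-Yekutieli

alpha <- 0.05
# Number of windows declared significant under each procedure.
c(BH = sum(q.bh <= alpha), BY = sum(q.by <= alpha))
```

The Benjamini-Yekutieli adjustment is never smaller than the Benjamini-Hochberg one, so "BY" is the more conservative choice and remains valid under arbitrary dependence between windows.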

Dispersion

Must be one of "Deviance" or "Pearson", specifying which type of estimator should be used for estimating the quasi-likelihood dispersion parameters.
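
The two estimators can be compared on a single simulated window using a base R quasi-Poisson fit. This is a sketch under assumed data (three control and three ChIP replicates, simulated counts); it illustrates the general deviance-based and Pearson-based dispersion estimators rather than BinQuasi's own fitting code.

```r
set.seed(42)

# Simulated counts for one window: 3 control and 3 ChIP replicates (assumed design).
group <- factor(rep(c("control", "ChIP"), each = 3))
y <- rnbinom(6, mu = ifelse(group == "ChIP", 60, 20), size = 5)

fit <- glm(y ~ group, family = quasipoisson())

# Pearson estimator: sum of squared Pearson residuals over residual degrees of freedom.
phi.pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# Deviance estimator: residual deviance over residual degrees of freedom.
phi.deviance <- deviance(fit) / df.residual(fit)

c(Pearson = phi.pearson, Deviance = phi.deviance)
```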

log.offset

A vector of log-scale, additive factors used to adjust estimated log-scale means for differences in library sizes across samples. Commonly used offsets include log.offset=log(colSums(counts)) and log.offset=log(apply(counts[rowSums(counts)!=0,],2,quantile,.75)). If NULL, the latter offset is used.
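
Both offsets mentioned above can be computed on a toy count matrix as follows. The matrix values and sample names are invented for illustration.

```r
# Toy window-by-sample count matrix; values are invented for this example.
counts <- matrix(c(12,  9,  3, 2,
                    0,  0,  0, 0,
                   30, 25, 10, 8,
                    5,  7,  6, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, c("C1", "C2", "I1", "I2")))

# Offset based on total library size.
offset.total <- log(colSums(counts))

# Offset based on the 75th percentile of counts, ignoring all-zero windows.
nonzero <- rowSums(counts) != 0
offset.q75 <- log(apply(counts[nonzero, , drop = FALSE], 2, quantile, 0.75))

rbind(offset.total, offset.q75)
```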

NBdisp

Used only when Model="NegBin". Must be one of "trend", "common", or a vector of non-negative real numbers with length equal to nrow(counts). Specifying NBdisp="trend" or NBdisp="common" will use estimateGLMTrendedDisp or estimateGLMCommonDisp, respectively, from the package edgeR to estimate negative binomial dispersion parameters for each window. Estimates obtained from other sources can be used by entering NBdisp as a vector containing the negative binomial dispersion value to use for each window when fitting the quasi-likelihood model.

bias.fold.tolerance

A numerical value no smaller than 1. If the bias reduction of maximum likelihood estimates of (log) fold change is likely to result in a ratio of fold changes greater than this value, then bias reduction will be performed on such windows. Setting bias.fold.tolerance=Inf will completely disable bias reduction; setting bias.fold.tolerance=1 will always perform bias reduction. See NBDev or PoisDev for details.

Value

A list containing:

peaks

Data frame of the called peaks with columns for the start and end locations, width, chromosome, p-value, and q-value computed using the Benjamini and Hochberg method.

bin.size

The window width used to create the counts dataframe.

fragment.length

Vector of the fragment lengths used to extend the reads in each sample.

filter

The count threshold used to create the counts dataframe. Windows with counts below this value were removed.

Details

This function calls peaks in replicated ChIP-seq data using the BinQuasi algorithm of Goren, Liu, Wang, and Wang.

References

Goren, Liu, Wang and Wang (2018) "BinQuasi: a peak detection method for ChIP-sequencing data with biological replicates" Bioinformatics.

Shimazaki and Shinomoto (2007) "A method for selecting the bin size of a time histogram" Neural computation, 19(6), 1503-27.

Ramachandran, Palidwor, Porter, and Perkins (2013) "MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data" Bioinformatics 29(4), 444-50.

Benjamini and Hochberg (1995) "Controlling the false discovery rate: a practical and powerful approach to multiple testing" Journal of the Royal Statistical Society Series B, 57: 289-300.

Benjamini and Yekutieli (2001) "The control of the false discovery rate in multiple testing under dependency" Annals of Statistics. 29: 1165-1188.

Lund, Nettleton, McCarthy and Smyth (2012) "Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates" Statistical Applications in Genetics and Molecular Biology, 11(5).

Examples

# NOT RUN {
# Fit a quasi-negative binomial model using all default settings.
fpath <- paste0(system.file(package = 'BinQuasi'), '/extdata/')
fpath
results <- BQ(fpath, ChIP.files = c('C1.bam', 'C2.bam'), control.files = c('I1.bam', 'I2.bam'))
head(results$peaks)
# }