Use the BinQuasi algorithm to call peaks using ChIP-seq data with biological replicates.
BQ(dir, ChIP.files, control.files, alpha = 0.05, bin.size = NULL,
frag.length = NULL, minimum.count = 20, Model = "NegBin",
print.progress = TRUE, method = "QLShrink", p.window.adjust = "BY",
Dispersion = "Deviance", log.offset = NULL, NBdisp = "trend",
bias.fold.tolerance = 1.1)
Directory where the sorted bam files (and their corresponding bam indices) are saved.
File names (with file extensions) of the ChIP sample files in sorted bam format.
File names (with file extensions) of the control/input sample files in sorted bam format.
The desired significance threshold used to call peaks. Must be in (0, 0.5).
Window size (constant across all samples) used to generate a partition for counts. If NULL, it will be estimated based on Shimazaki and Shinomoto (2007).
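To make the bin-size selection rule concrete, the following is a minimal sketch of the Shimazaki and Shinomoto (2007) cost function applied to simulated read positions. It is not the package's internal implementation. The helper `ss_cost`, the candidate widths, and the simulated positions are all assumptions introduced for illustration.

```r
# Toy read positions on a 100 kb region (simulated, not real ChIP-seq data).
set.seed(1)
region_end <- 1e5
positions <- c(runif(2000, 0, region_end), rnorm(1000, mean = 5e4, sd = 500))
positions <- positions[positions >= 0 & positions < region_end]

# Hypothetical helper: Shimazaki-Shinomoto cost for one bin width,
# (2 * mean - variance) / width^2, where the mean and the biased variance
# are taken over the per-bin counts.
ss_cost <- function(width, x, range_end) {
  breaks <- seq(0, range_end, by = width)
  k <- tabulate(findInterval(x, breaks), nbins = length(breaks) - 1)
  k_bar <- mean(k)
  v <- mean((k - k_bar)^2)
  (2 * k_bar - v) / width^2
}

# Candidate widths (each divides the region length evenly).
widths <- c(50, 100, 200, 250, 500, 1000, 2000, 5000)
costs <- sapply(widths, ss_cost, x = positions, range_end = region_end)

# The width with the smallest cost is the selected bin size.
best_width <- widths[which.min(costs)]
best_width
```

Supplying `bin.size` explicitly skips this kind of estimation, which may be preferable when a window size is already known from earlier analyses.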
Average length of the ChIP fragments in each sample provided. Reads are extended to this length in the 5'-to-3' direction. If NULL, cross correlation will be used to estimate the fragment length.
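The idea behind a cross-correlation estimate can be sketched in base R. This is a deliberately simplified toy, not the package's estimator: it assumes every simulated fragment has the same length and contributes one 5' end on each strand, and all variable names are hypothetical.

```r
# Toy data: each simulated fragment yields a forward-strand 5' end at its
# first base and a reverse-strand 5' end at its last base (a simplification).
set.seed(1)
region_len  <- 20000
true_length <- 150
starts <- sample.int(region_len - 1000, size = 3000, replace = TRUE)

fwd_counts <- tabulate(starts, nbins = region_len)
rev_counts <- tabulate(starts + true_length - 1, nbins = region_len)

# Correlation between forward counts and reverse counts shifted back by d bases.
shifts <- 50:300
cc <- sapply(shifts, function(d) {
  cor(fwd_counts[1:(region_len - d)], rev_counts[(1 + d):region_len])
})

# The shift with the highest correlation, plus one, estimates the fragment length.
est_length <- shifts[which.max(cc)] + 1
est_length
```

In real data the two strands come from different fragments of varying length, so the correlation peak is broader and noisier than in this toy.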
The count threshold used for filtering out windows with sparse counts. Any genomic window with a total count, across all samples, less than this value will be removed.
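The filtering rule described above can be illustrated on a small count matrix. This is a sketch with made-up counts and hypothetical window and sample names. It mirrors the stated rule (drop windows whose total count is below the threshold) rather than reproducing the package's code.

```r
# Toy window-by-sample count matrix (values are made up).
counts <- matrix(c(12,  9, 3,  2,
                    1,  0, 2,  1,
                   30, 25, 8, 10,
                    5,  6, 4,  4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("window", 1:4),
                                 c("C1", "C2", "I1", "I2")))

minimum.count <- 20

# Keep windows whose total count across all samples reaches the threshold.
keep <- rowSums(counts) >= minimum.count
filtered <- counts[keep, , drop = FALSE]
filtered
```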
Must be one of "Poisson" or "NegBin", specifying use of a quasi-Poisson or quasi-negative binomial model, respectively.
Logical. If TRUE, updates are provided regarding which window (row number) is being analyzed. Updates occur frequently at first, then every 5000 windows.
Must be one of "QL", "QLShrink", or "QLSpline", specifying which method of Lund, Nettleton, McCarthy and Smyth (2012) should be used to compute p-values.
FDR control method applied to the windows. Must be either "BH" or "BY" to specify the procedure of Benjamini-Hochberg or Benjamini-Yekutieli, respectively.
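The difference between the two procedures can be seen with base R's `p.adjust`, which implements both. The p-values below are made up for illustration, and the sketch says nothing about how the package applies the adjustment internally.

```r
# Made-up window-level p-values.
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
       0.0278, 0.0298, 0.0344, 0.0459, 0.3240)

q_bh <- p.adjust(p, method = "BH")  # Benjamini-Hochberg
q_by <- p.adjust(p, method = "BY")  # Benjamini-Yekutieli

# Number of windows passing a 0.05 threshold under each procedure.
alpha <- 0.05
n_bh <- sum(q_bh <= alpha)
n_by <- sum(q_by <= alpha)
c(BH = n_bh, BY = n_by)
```

Benjamini-Yekutieli remains valid under arbitrary dependence between tests, at the cost of being more conservative, so it never passes more windows than Benjamini-Hochberg at the same threshold.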
Must be one of "Deviance" or "Pearson", specifying which type of estimator should be used for estimating the quasi-likelihood dispersion parameters.
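For intuition, the two kinds of dispersion estimator can be computed for a single window with base R's `glm`. This is only a sketch under assumed toy counts and a simple two-group design. The package fits its models through its own routines, so the numbers here are illustrative rather than what `BQ` would report.

```r
# Toy counts for one window: two ChIP and two control replicates (made up).
y     <- c(48, 61, 10, 19)
group <- factor(c("ChIP", "ChIP", "control", "control"))

fit <- glm(y ~ group, family = quasipoisson())

# Deviance-based estimate: residual deviance over residual degrees of freedom.
disp_deviance <- deviance(fit) / df.residual(fit)

# Pearson-based estimate: sum of squared Pearson residuals over the same df.
disp_pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

c(Deviance = disp_deviance, Pearson = disp_pearson)
```

The Pearson-based value matches the dispersion that `summary(fit)` reports for a quasi-Poisson fit.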
A vector of log-scale, additive factors used to adjust estimated log-scale means for differences in library sizes across samples. Commonly used offsets include log.offset=log(colSums(counts)) and log.offset=log(apply(counts[rowSums(counts)!=0,],2,quantile,.75)). If NULL, the latter offset is used.
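Both offsets mentioned above can be computed directly from a window-by-sample count matrix. The sketch below uses a made-up `counts` matrix with hypothetical sample names; the two expressions are the ones given in the argument description.

```r
# Toy window-by-sample count matrix (values are made up); one window is all zeros.
counts <- matrix(c(10, 12, 3, 4,
                    0,  0, 0, 0,
                   25, 30, 5, 6,
                    7,  5, 6, 8,
                   40, 38, 9, 7,
                    2,  3, 1, 2),
                 ncol = 4, byrow = TRUE,
                 dimnames = list(NULL, c("C1", "C2", "I1", "I2")))

# Offset based on total library size.
offset_total <- log(colSums(counts))

# Offset based on the upper quartile of counts in windows with any reads.
offset_q75 <- log(apply(counts[rowSums(counts) != 0, ], 2, quantile, 0.75))

rbind(total = offset_total, upper_quartile = offset_q75)
```

Either vector has one entry per sample and could be passed as `log.offset`.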
Used only when Model="NegBin". Must be one of "trend", "common", or a vector of non-negative real numbers with length equal to nrow(counts). Specifying NBdisp="trend" or NBdisp="common" will use estimateGLMTrendedDisp or estimateGLMCommonDisp, respectively, from the package edgeR to estimate negative binomial dispersion parameters for each window. Estimates obtained from other sources can be used by entering NBdisp as a vector containing the negative binomial dispersion value to use for each window when fitting the quasi-likelihood model.
A numerical value no smaller than 1. If the bias reduction of maximum likelihood estimates of (log) fold change is likely to result in a ratio of fold changes greater than this value, then bias reduction will be performed on such windows. Setting bias.fold.tolerance=Inf will completely disable bias reduction; setting bias.fold.tolerance=1 will always perform bias reduction. See NBDev or PoisDev for details.
A list containing:
Dataframe of the called peaks with columns for the start and end location, width, chromosome, p-value, and q-value computed using the Benjamini and Hochberg method.
The window width used to create the counts dataframe.
Vector of the fragment lengths used to extend the reads in each sample.
The count threshold used to create the counts dataframe. Windows with counts below this value were removed.
This function calls peaks in replicated ChIP-seq data using the BinQuasi algorithm of Goren, Liu, Wang, and Wang.
Goren, Liu, Wang and Wang (2018) "BinQuasi: a peak detection method for ChIP-sequencing data with biological replicates" Bioinformatics.
Shimazaki and Shinomoto (2007) "A method for selecting the bin size of a time histogram" Neural computation, 19(6), 1503-27.
Ramachandran, Palidwor, Porter, and Perkins (2013) "MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data" Bioinformatics 29(4), 444-50.
Benjamini and Hochberg (1995) "Controlling the false discovery rate: a practical and powerful approach to multiple testing" Journal of the Royal Statistical Society Series B, 57: 289-300.
Benjamini and Yekutieli (2001) "The control of the false discovery rate in multiple testing under dependency" Annals of Statistics. 29: 1165-1188.
Lund, Nettleton, McCarthy and Smyth (2012) "Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates" SAGMB, 11(5).
# NOT RUN {
# Fit a quasi-negative binomial model using all default settings.
fpath <- paste0(system.file(package = 'BinQuasi'), '/extdata/')
fpath
results <- BQ(fpath, ChIP.files = c('C1.bam', 'C2.bam'), control.files = c('I1.bam', 'I2.bam'))
head(results$peaks)
# }