ChIPseqR (version 1.26.0)

callBindingSites-methods: Predict protein binding sites from high-throughput sequencing data

Description

Methods for function callBindingSites in package 'ChIPseqR'. These methods are used to identify protein binding sites from ChIP-seq data.

Usage

"callBindingSites"(data, chrLen, plot=TRUE, verbose=TRUE, ..., plotTo)
"callBindingSites"(data, type, minQual=70, ...)
"callBindingSites"(data, chrName="chr", ...)
"callBindingSites"(data, bind, support, background, bgCutoff=0.9, supCutoff=0.9, fdr=0.05, extend=1, tailCut=0.95, piLambda=0.5, adapt=FALSE, corSummary=median, compress=TRUE, digits=16, plot=TRUE, verbose=TRUE, ask=FALSE, plotTo, ...)

Arguments

data
Either an object containing information about mapped reads or a list. See below for details.
bind
Length of binding region to use (see Details).
support
Length of support region to use (see Details).
background
Length of background window. If this is missing it will be set to 10*(bind+2*support).
chrLen
Numeric vector indicating the length of all chromosomes. Only needed when data is an AlignedRead object. readBfaToc may be used to supply this information.
bgCutoff
Numeric value between 0.5 and 1. This determines how much estimates of the background read density are allowed to vary for adjacent windows. Set to 1 to disable cutoff.
supCutoff
Numeric value between 0.5 and 1. This determines how much estimates of the support region read density are allowed to vary for forward and reverse strand. Set to 1 to disable cutoff.
fdr
Target false discovery rate.
extend
Numeric value indicating how far mapped reads should be extended when calculating read counts.
type
Format of alignment file (see readAligned for details).
minQual
Minimum alignment quality to use. All reads with lower alignment quality are discarded.
tailCut
Truncation point used to exclude outliers when estimating null distribution.
chrName
Name to use for the single chromosome.
piLambda
If adapt=TRUE this parameter is used to estimate the proportion of scores not related to binding sites.
adapt
Logical indicating whether an adaptive false discovery rate should be used. If this is FALSE (the default) the usual Benjamini-Hochberg procedure is used to control the FDR.
corSummary
Function used to summarise cross-correlation across chromosomes. See the Details section on binding and support region.
compress
Logical indicating whether the return value should be compressed.
digits
Number of decimal places to retain for binding site score for compression.
plot
Logical. If plot=TRUE (the default) some diagnostic plots are produced during the analysis.
verbose
Logical. If verbose=TRUE (the default) status messages are printed to indicate progress.
ask
Logical. Setting this to TRUE causes the system to wait for user input before displaying a new plot. See devAskNewPage.
plotTo
Character string giving the name of a file that should be used to store plots generated during the analysis. If this is not missing a pdf file with the given name will be created.
...
Additional arguments. Most methods pass them on to the ReadCounts method.
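
Two of the arguments above can be illustrated with base R alone. The sketch below first computes the default background window length for one choice of bind and support, then applies the Benjamini-Hochberg adjustment (the procedure named for adapt=FALSE) using p.adjust from the stats package. The p-values are simulated purely for illustration; how callBindingSites derives its own p-values is not reproduced here.

```r
## default background window length when 'background' is missing
bind <- 147
support <- 20
background <- 10 * (bind + 2 * support)

## hypothetical p-values: 900 windows without binding, 100 with binding
set.seed(1)
pvals <- c(runif(900), rbeta(100, 1, 200))

## Benjamini-Hochberg adjustment, then apply the target FDR
fdr <- 0.05
padj <- p.adjust(pvals, method = "BH")
significant <- which(padj <= fdr)
```

Here significant holds the indices of the simulated windows that pass the target false discovery rate.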

Value

An object of class BindScore if compress = FALSE, otherwise an object of class RLEBindScore.
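
The interaction between compress and digits can be sketched with base R's rle. This assumes, as the class name suggests but this page does not state, that RLEBindScore stores scores in a run-length encoded form; the scores are made up and the package's own classes are not used. Rounding to fewer decimal places merges nearly identical neighbouring scores into longer runs, which is what makes the encoding compact.

```r
set.seed(1)

## hypothetical per-position scores: long stretches of almost identical values
scores <- rep(c(0, 2.5, 0, 1.25), times = c(400, 50, 500, 50)) +
  rnorm(1000, sd = 1e-6)

## run-length encoding without and with rounding
runs_full <- rle(scores)
runs_rounded <- rle(round(scores, digits = 3))

## number of runs that need to be stored in each case
n_full <- length(runs_full$lengths)
n_rounded <- length(runs_rounded$lengths)

## the rounded scores can be recovered from the compressed form
restored <- inverse.rle(runs_rounded)
```

With the default digits = 16 very little precision is discarded, so a smaller value trades score precision for a more compact result.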

Methods

data = "ANY"
Default method that handles all forms of input without a method of their own. In particular this is used for objects of class AlignedRead and data.frame, but it will handle any class for which a strandPileup method is available.
data = "character"
Accepts the name of a file of mapped sequence reads as input.
data = "matrix"
Uses a matrix of read counts (for a single chromosome) as input.
data = "ReadCounts"
This method implements the peak calling algorithm. Other methods will typically reformat their input and pass it on to this method.

Details

The length of binding and support regions can either be given as a single value or as a range of possible values (by providing the minimum and maximum). In the latter case the cross-correlation between read counts on forward and reverse strand will be used to determine a value within that range. Note that this may lead to sub-optimal choices of binding and support region length.
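
The idea of choosing a length from a range by cross-correlation can be sketched in base R. This is a simplified illustration on simulated counts, not the procedure implemented in getBindLen; the names true_dist, len_range and candidates are hypothetical. For each candidate distance d, the forward strand counts are correlated with the reverse strand counts d positions downstream, and the distance with the highest correlation is kept.

```r
set.seed(1)
chr_len <- 20000
n_sites <- 100
true_dist <- 160  # assumed distance between forward and reverse read clusters

## simulated read start positions around each binding site
sites <- sample(500:(chr_len - 500), n_sites)
fwd_pos <- rep(sites - true_dist / 2, each = 10) +
  sample(-5:5, 10 * n_sites, replace = TRUE)
rev_pos <- rep(sites + true_dist / 2, each = 10) +
  sample(-5:5, 10 * n_sites, replace = TRUE)

## per-position read counts for each strand, with uniform background reads
fwd <- tabulate(c(fwd_pos, sample(chr_len, 2000, replace = TRUE)), nbins = chr_len)
rev <- tabulate(c(rev_pos, sample(chr_len, 2000, replace = TRUE)), nbins = chr_len)

## range of admissible distances, given as minimum and maximum
len_range <- c(100, 220)
candidates <- seq(len_range[1], len_range[2])

## correlation between forward counts and reverse counts shifted by d
cross_cor <- sapply(candidates, function(d) {
  cor(fwd[1:(chr_len - d)], rev[(1 + d):chr_len])
})
best <- candidates[which.max(cross_cor)]
```

On real data the correlation peak is usually broader and noisier than in this simulation, which is one reason the value picked from a range may be sub-optimal.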

See Also

simpleNucCall for an interface with nucleosome specific defaults. This function uses strandPileup, startScore, getCutoff and pickPeak. See the help pages of these functions for additional detail on the individual steps involved. See getBindLen for details on the estimation of binding and support region length.

Examples

## load the package that provides strandPileup and callBindingSites
library(ChIPseqR)

set.seed(1)

## determine binding site locations
b <- sample(1:1e6, 5000)

## sample read locations
fwd <- unlist(lapply(b, function(x) sample((x-83):(x-73), 20, replace=TRUE)))
rev <- unlist(lapply(b, function(x) sample((x+73):(x+83), 20, replace=TRUE)))

## add some background noise
fwd <- c(fwd, sample(1:(1e6-25), 50000))
rev <- c(rev, sample(25:1e6, 50000))

## create data.frame with read positions as input to strandPileup
reads <- data.frame(chromosome="chr1", position=c(fwd, rev), 
	length=25, strand=factor(rep(c("+", "-"), times=c(150000, 150000))))

## create object of class ReadCounts
readPile <- strandPileup(reads, chrLen=1e6, extend=1, plot=FALSE)

## predict binding site locations
## the artificial dataset is very small so predictions may not be very reliable
bindScore <- callBindingSites(readPile, bind=147, support=20, background=2000, plot=FALSE)
