parallel
. Results are returned in R.
bayespeak(treatment, control, chr = NULL, start,
end, bin.size = 100L, iterations = 10000L,
repeat.offset = TRUE, into.jobs = TRUE, job.size = 6E6L,
job.overlap = 20L, use.multicore = FALSE,
mc.cores = getOption("mc.cores", 1), snow.cluster,
prior = c(5, 5, 10, 5, 25, 4, 0.5, 5),
report.p.samples = TRUE)
Each of these arguments can be:
read.bed
).
data.frame
, which should have columns "chr", "start", "end", "strand"
.
RangedData
object. This object is expected to be split into spaces by chromosome, and should have a data track labelled "strand".
The control
argument is entirely optional. (Mathematically, leaving this argument out is equivalent to setting gamma = 1 in the model.)
Strand information is expected to be given as "+" or "-".
If left as the default value chr = NULL
, then BayesPeak will find all chromosomes present in the treatment
file.
TRUE
, the algorithm is run a second time, this time with the bins offset by floor(window/2)
.
into.chunks = FALSE
. This may put BayesPeak at increased risk of overflow and underflow issues, and will additionally prevent usage of the parallel processing options.
job.overlap
corresponds to the number of bins by which each job is expanded.
use.multicore = TRUE
, then the individual chunks will be processed in parallel, using the mclapply
function.
mclapply
function.
snow
package. A cluster can be created via the makeCluster
function.
data.frame
corresponding to the bins that BayesPeak has identified as potentially being enriched. chr, start, end
give the genomic co-ordinates of the bin. PP
refers to the posterior probability of the bin being enriched. job
is the number of the job within which the bin was called, which corresponds to a row in the QC data.frame (see below).calls
is the number of potentially enriched bins identified in a job (i.e. bins with PP
> 0.01).
score
is simply the proportion of potentially enriched bins with a PP value above 0.5. Intuitively, a larger score is "better", as it indicates that more of the PP values have tended to 0 or 1.
chr, start, end
are the genomic co-ordinates of the job.
p, theta, lambda0, lambda1, gamma
and the average log likelihood loglhood
.
var
is the variance of the bin counts.
autocorr
is an estimate of the first order autocorrelation of bin counts.
status
indicates whether the job was normal, or offset by half a bin width.
list
of matrix
objects, each containing parameter samples from the MCMC runs. p.samples[[i]]
corresponds to the samples taken in job i
. Samples are taken every 10 iterations, with the first half of the run being discarded, and to avoid using too much memory, not all parameters are given. This output can be used to assess convergence e.g. using the CRAN packages coda or boa. (see the vignette - vignette("BayesPeak")
)
summarize.peaks
function before using it in later analysis.read.bed
, summarize.peaks
.
dir <- system.file("extdata", package="BayesPeak")
treatment <- file.path(dir, "H3K4me3reduced.bed")
input <- file.path(dir, "Inputreduced.bed")
##look at specific region 92-95Mb on chromosome 16
##(we've used half the number of iterations here to reduce the time this example takes)
raw.output <- bayespeak(treatment, input, chr = "chr16", start = 9.2E7, end = 9.5E7, iterations = 5000L, use.multicore = TRUE)
output <- summarize.peaks(raw.output)
output
## Not run:
# ##analyse all data in file
# raw.output.wg <- bayespeak(treatment, input, use.multicore = TRUE)
# output <- summarize.peaks(raw.output.wg)
# ## End(Not run)
Run the code above in your browser using DataLab