predictCoverage: Predict coverage for a single-isoform gene

Description

Predict coverage for a single-isoform gene given fitted bias parameters in a set of models, and compare to the observed fragment coverage.

Usage

predictCoverage(gene, bam.files, fitpar, genome, models, readlength, minsize, maxsize)

Arguments

gene

a GRangesList with the exons of different genes

bam.files

a character string pointing to indexed BAM files

fitpar

the output of running fitBiasModels

genome

a BSgenome object

models

a list describing the models, see fitBiasModels

readlength

the read length

minsize

the minimum fragment length to model

maxsize

the maximum fragment length to model

Value

a list with elements frag.cov, the observed fragment coverage from the bam.files and pred.cov, a list with the predicted fragment coverage for each of the models.

Details

Note that if the range between minsize and maxsize does not cover most of the fragment length distribution, the predicted coverage will underestimate the observed coverage.

Examples

Run this code


# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

data(preprocessedData)
library(BSgenome.Hsapiens.NCBI.GRCh38)
pred.models <- list(
  "fraglen" = list(formula=NULL, offset=c("fraglen")),
  "readstart" = list(formula=NULL, offset=c("fraglen","vlmm")),
  "GC" = list(formula = "count ~
  ns(gc,knots=gc.knots,Boundary.knots=gc.bk) +
  ns(relpos,knots=relpos.knots,Boundary.knots=relpos.bk) +
  0",
  offset=c("fraglen")),
  "all" = list(formula = "count ~
  ns(gc,knots=gc.knots,Boundary.knots=gc.bk) +
  ns(relpos,knots=relpos.knots,Boundary.knots=relpos.bk) +
  0",
  offset=c("fraglen","vlmm"))
)

readlength <- 75
minsize <- 125 # see vignette how to choose
maxsize <- 175 # see vignette how to choose

pred.cov <- predictCoverage(gene=ebt.fit[["ENST00000379660"]],
                            bam.files=bam.file,
                            fitpar=fitpar.small,
                            genome=Hsapiens,
                            models=pred.models,
                            readlength=readlength,
                            minsize=minsize,
                            maxsize=maxsize)
# plot the coverage:
# note that, because [125,175] does not cover the
# fragment width distribution, the predicted curves
# will underestimate the observed. we correct here post-hoc

frag.cov <- pred.cov[["ERR188088"]][["frag.cov"]]
plot(frag.cov, type="l", lwd=3, ylim=c(0,max(frag.cov)*1.5))
for (i in seq_along(pred.models)) {
  m <- names(pred.models)[i]
  pred <- pred.cov[["ERR188088"]][["pred.cov"]][[m]]
  lines(pred/mean(pred)*mean(frag.cov), col=i+1, lwd=3)
}
legend("topright", legend=c("observed",names(pred.models)),
       col=seq_len(length(pred.models)+1), lwd=3)

Run the code above in your browser using DataLab