Learn R Programming

alpine (version 0.99.1)

fitBiasModels: Fit bias models over single-isoform genes

Description

This function estimates parameters for one or more bias models for a single sample over a set of single-isoform genes. ~100 medium to highly expressed genes should be sufficient to estimate the parameters robustly.

Usage

fitBiasModels(genes, bam.file, fragtypes, genome, models, readlength, minsize, maxsize, speedglm = TRUE)

Arguments

genes
a GRangesList with the exons of different single-isoform genes
bam.file
a character string pointing to an indexed BAM file
fragtypes
the output of buildFragtypes. must contain the potential fragment types for the genes named in genes
genome
a BSgenome object
models
a list of lists: the outer list describes multiple models each element of the inner list has two elements: formula and offset. formula should be a character strings of an R formula describing the bias models, e.g. "count ~ ns(gc) + gene". offset should be a character vector listing possible bias offsets to be used ("fraglen" or "vlmm"). Either offset or formula can be NULL for a model. See vignette for recommendations and details.
readlength
the read length
minsize
the minimum fragment length to model
maxsize
the maximum fragment length to model
speedglm
logical, whether to use speedglm to estimate the coefficients. Default is TRUE.

Value

a list with elements: coefs, summary, models, and optional offets: fraglen.density, vlmm.fivep, and vlmm.threep.
  • coefs gives the estimated coefficients for the different models that specified formula.
  • summary gives the tables with coefficients, standard errors and p-values,
  • models stores the incoming models list,
  • fraglen.density is a estimated density object for the fragment length distribution,
  • vlmm.fivep and vlmm.threep store the observed and expected tabulations for the different orders of the VLMM for read start bias.

References

The complete bias model including fragment sequence bias:

Love, M.I., Hogenesch, J.B., and Irizarry, R.A., Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. bioRxiv (2015) doi: 10.1101/025767

The read start VLMM:

Roberts, A., Trapnell, C., Donaghey, J., Rinn, J.L., and Pachter, L., Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology (2011) doi: 10.1186/gb-2011-12-3-r22

Examples

Run this code

# see vignette for a more realistic example

# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

library(GenomicRanges)
library(BSgenome.Hsapiens.NCBI.GRCh38)
data(preprocessedData)

readlength <- 100
minsize <- 125 # see vignette how to choose
maxsize <- 175 # see vignette how to choose

# here a very small subset, should be ~100 genes
gene.names <- names(ebt.fit)[6:8] 
names(gene.names) <- gene.names
fragtypes <- lapply(gene.names, function(gene.name) {
                      buildFragtypes(ebt.fit[[gene.name]],
                                     Hsapiens, readlength,
                                     minsize, maxsize)
})
models <- list(
  "GC" = list(formula = "count ~ ns(gc,knots=gc.knots,
                Boundary.knots=gc.bk) + gene",
              offset=c("fraglen"))
)

fitpar <- fitBiasModels(genes=ebt.fit[gene.names],
                        bam.file=bam.file,
                        fragtypes=fragtypes,
                        genome=Hsapiens,
                        models=models,
                        readlength=readlength,
                        minsize=minsize,
                        maxsize=maxsize)

Run the code above in your browser using DataLab