Learn R Programming

alpine (version 0.99.1)

estimateTheta: Estimate bias-corrected transcript abundances (FPKM)

Description

This function takes the fitted bias parameters from fitBiasModels and uses this information to derive bias corrected estimates of transcript abundance for a gene (with one or more isoforms) across multiple samples.

Usage

estimateTheta(transcripts, bam.files, fitpar, genome, models, readlength, minsize, maxsize, subset = TRUE, niter = 100, lib.sizes = NULL, optim = FALSE, custom.features = NULL)

Arguments

transcripts
a GRangesList of the exons for multiple isoforms of a gene. For a single-isoform gene, just wrap the exons in GRangesList()
bam.files
a named vector pointing to the indexed BAM files
fitpar
the output of fitBiasModels
genome
a BSGenome object
models
a list of character strings or formula describing the bias models, see vignette
readlength
the read length
minsize
the minimum fragment length to model
maxsize
the maximum fragment length to model
subset
logical, whether to downsample the non-observed fragments. Default is TRUE
niter
the number of EM iterations. Default is 100.
lib.sizes
a named vector of library sizes to use in calculating the FPKM. If NULL (the default) a value of 1e6 is used for all samples.
optim
logical, whether to use numerical optimization instead of the EM. Default is FALSE.
custom.features
an optional function to add custom features to the fragment types DataFrame. This function takes in a DataFrame returned by buildFragtypes and returns a DataFrame with additional columns added. Default is NULL, adding no custom features.

Value

a list of lists. For each sample, a list with elements: theta, lambda and count.
  • theta gives the FPKM estimates for the isoforms in transcripts
  • lambda gives the average bias term for the isoforms
  • count gives the number of fragments which are compatible with any of the isoforms in transcripts

Examples

Run this code

# see vignette for a more realistic example

# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

data(preprocessedData)
library(GenomicRanges)
library(BSgenome.Hsapiens.NCBI.GRCh38)
models <- list(
 "GC"=list(formula="count~
 ns(gc,knots=gc.knots,Boundary.knots=gc.bk) +
 ns(relpos,knots=relpos.knots,Boundary.knots=relpos.bk) +
 0",
 offset=c("fraglen"))
)

readlength <- 75
minsize <- 125 # see vignette how to choose
maxsize <- 175 # see vignette how to choose
txs <- txdf.theta$tx_id[txdf.theta$gene_id == "ENSG00000198918"]

res <- estimateTheta(transcripts=ebt.theta[txs],
                     bam.files=bam.file,
                     fitpar=fitpar.small,
                     genome=Hsapiens,
                     models=models,
                     readlength=readlength,
                     minsize=minsize,
                     maxsize=maxsize)

Run the code above in your browser using DataLab