wrapDenovo(bamFile, output_wrapKnown, knownGenomeDB, targetGenomeDB, readLength, rpkm=TRUE, keep.multihits=TRUE, searchMethod="submodels", exactMarginal=TRUE, integrateMethod = "plugin", maxExons=40, islandid, chroms=NULL, keep.pbam=FALSE, keepPbamInMemory=FALSE, niter=10^3, priorq=3, priorqGeneExpr=2, mc.cores.int=1, mc.cores=1, verbose=TRUE, seed=1)
wrapKnown
. If provided, path counts,
read start and insert size distributions are loaded from this output
rather than being re-computed. Better leave this argument missing
unless you know what you're doing.annotatedGenome
object with known isoforms,
e.g. from UCSC or GENCODE annotations. Used to set the prior
probability that any given isoform is expressed. See help(calcDenovo)
for details.annotatedGenome
object with isoforms we wish to
quantify. By default these are the same as in knownGenomeDB
,
but more typically targetGenomeDB
is imported from a .gtf file
produced by some isoform prediction software.readLength=75
.TRUE
to return reads per kilobase per million (RPKM),
FALSE
for relative expression levels.
Important, relative expression adds up to 1 within gene island, NOT
within gene. To get relative expressions within gene run
relexprByGene
afterwards. See help(wrapKnown).FALSE
to discard reads aligned to
multiple positions."allmodels"
enumerates all possible models (warning:
this is not feasible for genes with >5 exons). "rwmcmc"
uses a
random-walk MCMC scheme to focus on models with high posterior
probability.
"submodels"
considers that some isoforms in targetGenomeDB
may not be expressed, but does not search for new variants.
"auto"
uses "allmodels"
for genes with up to 5
exons and "rwmcmc"
for longer genes. See help("calcDenovo").FALSE
to estimate posterior model
probabilities as the proportion of MCMC visits. Set to TRUE
to
use the integrated likelihoods (default). See details.'plugin'
) evaluates likelihood*prior at the posterior mode and
is the faster option. Set 'Laplace'
for Laplace approximations
and 'IS'
for Importance Sampling. The latter increases
computation cost very substantially.maxExons
exons separately,
for genes with more than maxExons
exons a combined
estimate is used. See help("modelPrior")
TRUE
to save processed bam object, as
returned by procBam
. This object can require substantial memory
during execution and disk storage upon saving and is not needed
for a default analysis.TRUE
to keep processed bam objects
in memory to speed up some computations.priorq=3
as this
defines a non-local prior that penalizes falsely predicted isoforms.TRUE
to display progress information.annotatedGenome
that contains the
isoforms in targetGenomeDB
plus any new isoforms predicted by
casperExpressionSet
containing Bayesian
model averaging expression estimates. See the fData for the posterior
probability that each isoform is expressed.readDistrs
procBam
with one element
per chromosomeprocBam
, getDistrs
,
pathCounts
calcDenovo
and denovoExpr
and formats the output nicely.
Running wrapDenovo
is much more efficient in cpu
speed and memory usage than running these functions separately. When rpkm
is false the function returns the estimated
proportion of reads arising from each isoform within a gene island.
See the details in help("wrapKnown")
for more information on this.
calcDenovo
, wrapKnown
, relexprByGene
## not run
## Known isoforms
## library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## hg19DB <- procGenome(TxDb.Hsapiens.UCSC.hg19.knownGene), genome='hg19')
## gtf with known & de novo predictions
## mygtf <- import('hg19_denovo.gtf')
## hg19denovoDB <- procGenome(mygtf, genome='hg19')
## bamFile="/path_to_bam/sorted.bam"
## ans <- wrapDenovo(bamFile=bamFile, targetGenomeDB=hg19denovoDB, knownGenomeDB=hg19DB, readLength=101)
## Estimated expression via BMA
## head(exprs(ans[['exp']]))
## Posterior probability that each isoform is expressed
## head(fData(ans[['exp']]))
Run the code above in your browser using DataLab