wrapDenovo(bamFile, output_wrapKnown, knownGenomeDB, targetGenomeDB, readLength, rpkm=TRUE, keep.multihits=TRUE, searchMethod="submodels", exactMarginal=TRUE, integrateMethod = "plugin", maxExons=40, islandid, chroms=NULL, keep.pbam=FALSE, keepPbamInMemory=FALSE, niter=10^3, priorq=3, priorqGeneExpr=2, mc.cores.int=1, mc.cores=1, verbose=TRUE, seed=1)wrapKnown. If provided, path counts,
read start and insert size distributions are loaded from this output
rather than being re-computed. Better leave this argument missing
unless you know what you're doing.annotatedGenome object with known isoforms,
e.g. from UCSC or GENCODE annotations. Used to set the prior
probability that any given isoform is expressed. See help(calcDenovo)
for details.annotatedGenome object with isoforms we wish to
quantify. By default these are the same as in knownGenomeDB,
but more typically targetGenomeDB is imported from a .gtf file
produced by some isoform prediction software.readLength=75.TRUE to return reads per kilobase per million (RPKM),
FALSE for relative expression levels.
Important, relative expression adds up to 1 within gene island, NOT
within gene. To get relative expressions within gene run
relexprByGene afterwards. See help(wrapKnown).FALSE to discard reads aligned to
multiple positions."allmodels" enumerates all possible models (warning:
this is not feasible for genes with >5 exons). "rwmcmc" uses a
random-walk MCMC scheme to focus on models with high posterior
probability.
"submodels" considers that some isoforms in targetGenomeDB
may not be expressed, but does not search for new variants.
"auto" uses "allmodels" for genes with up to 5
exons and "rwmcmc" for longer genes. See help("calcDenovo").FALSE to estimate posterior model
probabilities as the proportion of MCMC visits. Set to TRUE to
use the integrated likelihoods (default). See details.'plugin') evaluates likelihood*prior at the posterior mode and
is the faster option. Set 'Laplace' for Laplace approximations
and 'IS' for Importance Sampling. The latter increases
computation cost very substantially.maxExons exons separately,
for genes with more than maxExons exons a combined
estimate is used. See help("modelPrior")TRUE to save processed bam object, as
returned by procBam. This object can require substantial memory
during execution and disk storage upon saving and is not needed
for a default analysis.TRUE to keep processed bam objects
in memory to speed up some computations.priorq=3 as this
defines a non-local prior that penalizes falsely predicted isoforms.TRUE to display progress information.annotatedGenome that contains the
isoforms in targetGenomeDB plus any new isoforms predicted by
casperExpressionSet containing Bayesian
model averaging expression estimates. See the fData for the posterior
probability that each isoform is expressed.readDistrsprocBam with one element
per chromosomeprocBam, getDistrs,
pathCounts calcDenovo and denovoExpr
and formats the output nicely.
Running wrapDenovo is much more efficient in cpu
speed and memory usage than running these functions separately. When rpkm is false the function returns the estimated
proportion of reads arising from each isoform within a gene island.
See the details in help("wrapKnown") for more information on this.
calcDenovo, wrapKnown, relexprByGene
## not run
## Known isoforms
## library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## hg19DB <- procGenome(TxDb.Hsapiens.UCSC.hg19.knownGene), genome='hg19')
## gtf with known & de novo predictions
## mygtf <- import('hg19_denovo.gtf')
## hg19denovoDB <- procGenome(mygtf, genome='hg19')
## bamFile="/path_to_bam/sorted.bam"
## ans <- wrapDenovo(bamFile=bamFile, targetGenomeDB=hg19denovoDB, knownGenomeDB=hg19DB, readLength=101)
## Estimated expression via BMA
## head(exprs(ans[['exp']]))
## Posterior probability that each isoform is expressed
## head(fData(ans[['exp']]))
Run the code above in your browser using DataLab