
casper (version 2.6.0)

simMAEcheck: Model checking for One Sample Problems.

Description

Simulates RNA-seq data under the same experimental setting as the observed data and compares the observed number of reads per gene with the numbers obtained in the simulated datasets.

Usage

simMAEcheck(nsim, islandid, burnin=1000, pc, distr, readLength.pilot, eset.pilot, usePilot=FALSE, retTxsError=FALSE, genomeDB, mc.cores=1, mc.cores.int=1, verbose=FALSE)

Arguments

nsim
Number of RNA-seq datasets to generate (often as few as nsim=10 suffice)
islandid
When specified, the simulations are run only for the gene islands whose identifiers are in islandid. When not specified, genome-wide simulations are performed.
burnin
Number of MCMC burn-in samples (passed on to calcExp)
pc
Observed path counts in pilot data. When not specified, these are simulated from eset.pilot
distr
Estimated read start and insert size distributions in pilot data
readLength.pilot
Read length in pilot data
eset.pilot
ExpressionSet containing pilot data expression in log2-RPKM, used to simulate pc when it is not specified by the user. See Details
usePilot
By default, casper assumes that the pilot data come from a related experiment rather than from the current tissue of interest (usePilot=FALSE). Hence, the pilot data are used to simulate new RNA-seq data but not to estimate its expression. In some cases, however, we may be interested in re-sequencing the pilot sample at greater depth, in which case one would want to combine the pilot data with the new data to obtain more precise estimates. This can be achieved by setting usePilot=TRUE
retTxsError
If retTxsError=TRUE, simMAE returns the posterior expected MAE for each individual isoform. This option is not available when eset.pilot is specified instead of pc. Otherwise, the output is a data.frame with the overall MAE across all isoforms
genomeDB
annotatedGenome object, as returned by procGenome
mc.cores
Number of cores to use in the expression estimation step, passed on to calcExp
mc.cores.int
Number of cores to simulate RNA-seq datasets in parallel
verbose
Set verbose=TRUE to print progress information

Value

The output is a list with two entries. The first entry is a data.frame with the overall MAE across all isoforms in the simulations (see simMAE for details). The second entry contains the expected number of genes whose observed number of reads lies within the range of the posterior predictive simulations (under the hypothesis that observed and simulated data have the same distribution), together with the actual number of genes for which this condition holds.
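The range check summarized in the second entry can be illustrated with a toy sketch in base R. Everything below is hypothetical: Poisson counts stand in for the observed and simulated reads per gene, and no casper function is called. The expected count uses the standard exchangeability argument (if the observed value and the nsim simulations are exchangeable and continuous, the observed value falls outside the simulated range with probability 2/(nsim+1)); this is an assumption for illustration, not necessarily the exact formula casper uses.

```r
# Toy posterior predictive range check (hypothetical data, not casper output)
set.seed(1)
ngenes <- 200   # hypothetical number of genes
nsim <- 10      # number of simulated datasets

# Hypothetical per-gene mean read counts
lambda <- rgamma(ngenes, shape = 2, rate = 0.01)

# Simulated read counts: one row per gene, one column per simulated dataset
# (column-major filling recycles lambda so row i always uses lambda[i])
sims <- matrix(rpois(ngenes * nsim, lambda), nrow = ngenes, ncol = nsim)

# "Observed" counts drawn from the same model, so the check should pass
obs <- rpois(ngenes, lambda)

# For each gene, does the observed count fall within the simulated range?
lo <- apply(sims, 1, min)
hi <- apply(sims, 1, max)
inRange <- obs >= lo & obs <= hi

# Assumed expected count under exchangeability with no ties; with discrete
# counts and an inclusive range, the true in-range rate can only be higher
expected <- ngenes * (nsim - 1) / (nsim + 1)
observed <- sum(inRange)

print(c(expected = expected, observed = observed))
```

An observed count well below the expected one would suggest that the simulation model does not reproduce the per-gene read counts of the real data.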

Details

simMAEcheck simulates nsim datasets under the same experimental setting as the observed data. For more details, see the documentation for simMAE, on which this function is based.

References

Stephan-Otto Attolini C., Pena V., Rossell D. Bayesian designs for personalized alternative splicing RNA-seq studies (2014)

Li W., Freudenberg J., Miramontes P. Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome. BMC Bioinformatics, 15, 2 (2014)

See Also

wrapKnown, simReads, calcExp

Examples

# Run casperDesign() to see the full manual with examples
