simMAEcheck: Model checking for One Sample Problems.

Description

Simulates RNA-seq data under the same experimental setting as the observed data, and compares the observed number of reads per gene with the simulated values.

Usage

simMAEcheck(nsim, islandid, burnin=1000, pc, distr, readLength.pilot, eset.pilot, usePilot=FALSE, retTxsError=FALSE, genomeDB, mc.cores=1, mc.cores.int=1, verbose=FALSE)

Arguments

nsim

Number of RNA-seq datasets to generate (often as few as nsim=10 suffice)

islandid

When specified, this argument restricts the simulations to gene islands with identifiers in islandid. When not specified, genome-wide simulations are performed.

burnin

Number of MCMC burn-in samples (passed on to calcExp)

pc

Observed path counts in pilot data. When not specified, these are simulated from eset.pilot

distr

Estimated read start and insert size distributions in pilot data

readLength.pilot

Read length in pilot data

eset.pilot

ExpressionSet with pilot data expression in log2-RPKM, used to simulate pc when not specified by the user. See details

usePilot

By default casper assumes that the pilot data is from a related experiment rather than the current tissue of interest (usePilot=FALSE). Hence, the pilot data is used to simulate new RNA-seq data but not to estimate its expression. However, in some cases we may be interested in re-sequencing the pilot sample at greater depth, in which case one would want to combine the pilot data with the new data to obtain more precise estimates. This can be achieved by setting usePilot=TRUE

retTxsError

If retTxsError=TRUE, simMAE returns the posterior expected MAE for each individual isoform; otherwise the output is a data.frame with the overall MAE across all isoforms. The retTxsError=TRUE option is not available when eset.pilot is specified instead of pc

genomeDB

annotatedGenome object, as returned by procGenome

mc.cores

Number of cores to use in the expression estimation step, passed on to calcExp

mc.cores.int

Number of cores to simulate RNA-seq datasets in parallel

verbose

Set verbose=TRUE to print progress information

Value

The output is a list with 2 entries. The first entry is a data.frame with the overall MAE across all isoforms in the simulations (see simMAE for details). The second entry contains the expected number of genes whose observed read count lies within the range of the posterior predictive simulations (under the hypothesis that observed and simulated counts have the same distribution), together with the actual number of genes for which this condition holds.
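The range check described for the second entry can be illustrated with a toy example in base R. This is only a sketch under stated assumptions, not casper's internal code: the gene means (mu), the Poisson sampling model and all variable names are hypothetical, and the expected count uses the exchangeability argument that, ignoring ties, an observed value is neither the smallest nor the largest of nsim + 1 draws with probability (nsim - 1) / (nsim + 1). With discrete counts ties do occur, so the actual in-range proportion can be somewhat higher than this value.

```r
# Toy illustration of a per-gene range check (NOT casper's implementation).
set.seed(1)
n_genes <- 200   # hypothetical number of genes
nsim    <- 10    # number of simulated datasets

# Hypothetical per-gene mean read counts
mu <- rgamma(n_genes, shape = 2, rate = 0.01)

# Observed and simulated counts drawn from the same distribution,
# i.e. the hypothesis under which the expected count is computed
observed  <- rpois(n_genes, mu)
simulated <- matrix(rpois(n_genes * nsim, rep(mu, nsim)), nrow = n_genes)

# For each gene, does the observed count fall within the simulated range?
sim_min  <- apply(simulated, 1, min)
sim_max  <- apply(simulated, 1, max)
in_range <- observed >= sim_min & observed <= sim_max

# Expected (ignoring ties) versus actual number of genes within range
expected_in_range <- n_genes * (nsim - 1) / (nsim + 1)
actual_in_range   <- sum(in_range)
c(expected = expected_in_range, actual = actual_in_range)
```

An actual count far below the expected one would suggest that the simulation model does not reproduce the observed read counts well.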

Details

simMAEcheck simulates nsim datasets under the same experimental setting as in the observed data. For more details, please check the documentation for simMAE, which is the basis of this function.

References

Stephan-Otto Attolini C., Pena V., Rossell D. Bayesian designs for personalized alternative splicing RNA-seq studies (2014)

Li, W. and Freudenberg, J. and Miramontes, P. Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome. BMC Bioinformatics, 15, 2 (2014)

Examples

#Run casperDesign() to see full manual with examples
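As a rough sketch of how a call might look, the hypothetical wrapper below passes only arguments listed under Usage. The wrapper name run_model_check, its defaults and the read length of 75 are assumptions made for illustration; pc, distr and genomeDB are assumed to have been obtained beforehand as described under Arguments, and the casper package must be installed for the call itself to run.

```r
# Hypothetical convenience wrapper (not part of casper); defining it does not
# require casper, but calling it does.
run_model_check <- function(pc, distr, genomeDB,
                            nsim = 10, readLength.pilot = 75) {
  res <- casper::simMAEcheck(
    nsim             = nsim,              # number of simulated datasets
    pc               = pc,                # observed path counts (pilot data)
    distr            = distr,             # read start / insert size distributions
    readLength.pilot = readLength.pilot,  # assumed read length; adjust to your data
    genomeDB         = genomeDB,          # annotatedGenome from procGenome
    burnin           = 1000,
    mc.cores         = 1,
    mc.cores.int     = 1,
    verbose          = FALSE
  )
  # Per the Value section: entry 1 is the overall MAE data.frame,
  # entry 2 holds the expected and actual number of genes within range
  list(mae = res[[1]], check = res[[2]])
}
```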