simulate_experiment_empirical: Simulate RNA-seq experiment based on abundances from a data set

Description

Create fasta files representing reads from a two-group experiment, where abundances and differential expression are estimated from a real data set

Usage

simulate_experiment_empirical(bg = NULL, fpkmMat = NULL, mean_rps = 5e+06, grouplabels = NULL, decut = 1.5, outdir = ".", ...)

Arguments

Ballgown object containing estimated transcript abundances in FPKM. Reads will be simulated for the same number of replicates that are in bg. Must provide exactly one of bg and fpkmMat.

fpkmMat

transcript-by-sample matrix containing abundances (in FPKM) estimated from a real data set. MUST have row names identifying transcripts. The number of columns is the number of samples that will be simulated.

mean_rps

Number of reads per sample to use in converting FPKM measurements to counts. Should be somewhat close to the number of reads per sample in the experiment that generated the estimated FPKMs. Defaults to 5 million (5e6).

grouplabels

vector indicating the group labels for each replicate in the experiment. Must be convertible to a factor with exactly 2 levels.

decut

A transcript will be recorded as truly differentially expressed if its fold change between the two groups is more extreme than decut, in either direction.

outdir

character, path to folder where simulated reads should be written, without a slash at the end of the folder name. By default, reads written to the working directory.

...

Additional arguments to pass to simulate_experiment_countmat

Value

No return, but reads are written to outdir.

Examples

Run this code

## Not run: 
# 
#   library(ballgown)
#   data(bg)
#   bg = subset(bg, "chr=='22'")
# 
#   # load gtf file:
#   gtfpath = system.file('extdata', 'bg.gtf.gz', package='polyester')
#   gtf = subset(gffRead(gtfpath), seqname=='22')
# 
#   # load/download chromosome sequence (just for this example)
#   system('wget https://www.dropbox.com/s/04i6msi9vu2snif/chr22seq.rda')
#   load('chr22seq.rda')
#   names(chr22seq) = '22'
# 
#   # simulate reads based on this experiment's FPKMs
#   simulate_experiment_empirical(bg, grouplabels=pData(bg)$group, gtf=gtf,
#      seqpath=chr22seq, mean_rps=5000, outdir='simulated_reads_3', seed=1247)
# ## End(Not run)

Run the code above in your browser using DataLab