polyester (version 1.6.0)

simulate_experiment_countmat: Simulate RNA-seq experiment

Description

Creates FASTA files containing RNA-seq reads simulated from the provided transcripts, with optional differential expression between two groups, specified through a read count matrix.

Usage

simulate_experiment_countmat(fasta = NULL, gtf = NULL, seqpath = NULL, readmat, outdir = ".", paired = TRUE, seed = NULL, ...)

Arguments

fasta
path to FASTA file containing transcripts from which to simulate reads. See details.
gtf
path to GTF file or data frame containing transcript structures from which reads should be simulated. See details and seq_gtf.
seqpath
path to a folder containing one FASTA file (.fa extension) for each chromosome in gtf, or a DNAStringSet containing one entry for each chromosome in gtf. See details and seq_gtf.
readmat
matrix with rows representing transcripts and columns representing samples. Entry i,j specifies how many reads to simulate from transcript i for sample j.
outdir
character, path to the folder where simulated reads should be written, without a slash at the end of the folder name. By default, reads are written to the working directory.
paired
If TRUE, paired-end reads are simulated; else single-end reads are simulated.
seed
Optional seed to set before simulating reads, for reproducibility.
...
Additional arguments to add nuance to the simulation, as described extensively in the details of simulate_experiment, or to pass to seq_gtf, if gtf is not NULL.
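
The readmat argument can be prepared with base R alone. The following is a minimal sketch, assuming a hypothetical experiment with 100 transcripts and 10 samples split into two groups of five; the sizes and read counts are illustrative and not taken from the package. In practice the number of rows should match the number of transcripts in the FASTA or GTF input.

```r
# Hypothetical sizes, chosen only for illustration.
n_transcripts <- 100   # in practice: the number of transcripts in the input
n_samples <- 10        # samples 1-5 form group 1, samples 6-10 form group 2

# Baseline: 20 reads per transcript in every sample.
readmat <- matrix(20, nrow = n_transcripts, ncol = n_samples)

# Differential expression: double the count for the first 30 transcripts
# in group 1 only.
readmat[1:30, 1:5] <- 40

# Basic sanity checks: counts should be non-negative whole numbers.
stopifnot(is.matrix(readmat),
          all(readmat >= 0),
          all(readmat == round(readmat)))

# Total number of reads requested for each sample.
reads_per_sample <- colSums(readmat)
```

Entry i,j of the resulting matrix is the number of reads requested from transcript i for sample j, so reads_per_sample gives the simulated library size of each sample.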

Value

No return value; simulated reads are written to outdir.

Details

Reads can either be simulated from a FASTA file of transcripts (provided with the fasta argument) or from a GTF file plus DNA sequences (provided with the gtf and seqpath arguments). Simulating from a GTF file and DNA sequences may be a bit slower: it took about 6 minutes to parse the GTF/sequence files for chromosomes 1-22, X, and Y in hg19.
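
The GTF-based input mode described above can be sketched as a small wrapper. The helper name simulate_from_gtf and its argument names are hypothetical and not part of polyester; the sketch also assumes that gtf and seqpath are given as paths on disk, although the function also accepts a data frame and a DNAStringSet respectively. Only the fasta, gtf, seqpath, readmat and outdir arguments shown in Usage are relied on.

```r
# Hypothetical wrapper (not part of polyester): simulate reads from a GTF
# file plus a folder of per-chromosome FASTA files.
simulate_from_gtf <- function(gtf_path, seq_dir, readmat, outdir = ".") {
  # Fail early on missing inputs, before the slower GTF parsing starts.
  stopifnot(file.exists(gtf_path),
            dir.exists(seq_dir),
            is.matrix(readmat))

  # fasta is left at its default (NULL) so that gtf and seqpath are used.
  polyester::simulate_experiment_countmat(
    gtf = gtf_path,
    seqpath = seq_dir,
    readmat = readmat,
    outdir = outdir
  )
}
```

A call would look like simulate_from_gtf("annotation.gtf", "chromosomes", readmat, "simulated_reads"), where both paths are placeholders and readmat has one row per transcript in the GTF file.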

References

Li W and Jiang T (2012): Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads. Bioinformatics 28(22): 2914-2921.

Examples

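  library(polyester)

  # FASTA file of chromosome 22 transcripts shipped with the package
  fastapath <- system.file("extdata", "chr22.fa", package="polyester")
  numtx <- count_transcripts(fastapath)

  # 20 reads per transcript in each of 10 samples
  readmat <- matrix(20, ncol=10, nrow=numtx)
  # first 30 transcripts get 40 reads in samples 1-5 (differential expression)
  readmat[1:30, 1:5] <- 40

  simulate_experiment_countmat(fasta=fastapath,
    readmat=readmat, outdir='simulated_reads_2', seed=5)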
