polyester (version 1.8.3)

simulate_experiment_countmat: Simulate RNA-seq experiment

Description

Create FASTA files containing RNA-seq reads simulated from the provided transcripts, with optional differential expression between two groups (specified via the read count matrix).

Usage

simulate_experiment_countmat(fasta = NULL, gtf = NULL, seqpath = NULL, readmat, outdir = ".", fraglen = 250, fragsd = 25, readlen = 100, error_rate = 0.005, paired = TRUE, seed = NULL, ...)

Arguments

fasta
path to FASTA file containing transcripts from which to simulate reads. See details.
gtf
path to GTF file containing transcript structures from which reads should be simulated. See details.
seqpath
path to folder containing one FASTA file (.fa extension) for each chromosome in gtf. See details.
readmat
matrix with rows representing transcripts and columns representing samples. Entry i,j specifies how many reads to simulate from transcript i for sample j.
outdir
Character; path to the folder where simulated reads should be written, without a trailing slash. By default, reads are written to the working directory.
fraglen
Mean RNA fragment length. Sequences will be read off the end(s) of these fragments.
fragsd
Standard deviation of fragment lengths.
readlen
Read length
error_rate
Sequencing error rate. Must be between 0 and 1. A uniform error model is assumed.
paired
If TRUE, paired-end reads are simulated; else single-end reads are simulated.
seed
Optional seed to set before simulating reads, for reproducibility.
...
Further arguments to pass to seq_gtf, if gtf is not NULL.
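
As an illustration of the readmat layout described above, the following base-R sketch builds a small count matrix in which the first two transcripts get twice as many reads in the first group of samples. The transcript count, sample count, baseline read count, and group split are all made-up values chosen for the example; in practice the number of rows must match the number of transcripts being simulated.

```r
# Hypothetical sizes, for illustration only
numtx    <- 6    # number of transcripts (rows)
nsamples <- 4    # number of samples (columns)
baseline <- 20   # reads per transcript per sample

# Start with the same read count for every transcript in every sample
readmat <- matrix(baseline, nrow = numtx, ncol = nsamples)

# Samples 1-2 form group 1; samples 3-4 form group 2
group1 <- 1:2

# Transcripts 1-2 get twice as many reads in group 1
de_tx <- 1:2
readmat[de_tx, group1] <- baseline * 2

# Entry [i, j] is the number of reads simulated from transcript i for sample j
print(readmat)
```

A matrix built this way can be passed as the readmat argument, provided its row count agrees with the transcripts supplied via fasta or gtf.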

Value

No return value; simulated reads are written to outdir.

Details

Reads can either be simulated from a FASTA file of transcripts (provided with the fasta argument) or from a GTF file plus DNA sequences (provided with the gtf and seqpath arguments). Simulating from a GTF file and DNA sequences may be a bit slower: it took about 6 minutes to parse the GTF/sequence files for chromosomes 1-22, X, and Y in hg19.
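
The GTF-based route might look like the sketch below. The two paths and the output folder name are hypothetical placeholders, and the block only runs if polyester is installed and those paths exist. It assumes that seq_gtf() returns one sequence per transcript, so that its length can be used to size readmat; note that this parses the annotation an extra time, which adds to the run time mentioned above.

```r
# Hypothetical paths: replace with a real annotation file and a folder
# containing one .fa file per chromosome named in that annotation.
gtfpath <- "annotation/genes.gtf"
seqdir  <- "annotation/chromosomes"

if (requireNamespace("polyester", quietly = TRUE) &&
    file.exists(gtfpath) && dir.exists(seqdir)) {

  # Assumption: seq_gtf() yields one sequence per transcript in the GTF,
  # so its length gives the number of rows readmat needs.
  numtx <- length(polyester::seq_gtf(gtfpath, seqdir))

  # 20 reads per transcript in each of 10 samples (illustrative values)
  readmat <- matrix(20, nrow = numtx, ncol = 10)

  polyester::simulate_experiment_countmat(
    gtf = gtfpath, seqpath = seqdir,
    readmat = readmat,
    outdir = "simulated_reads_gtf",  # hypothetical output folder
    seed = 5
  )
}
```

When a transcript FASTA file is already available, passing it via fasta avoids the GTF parsing step entirely.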

Examples


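  # Transcript FASTA file shipped with polyester
  fastapath = system.file("extdata", "chr22.fa", package="polyester")
  numtx = count_transcripts(fastapath)

  # 20 reads per transcript in each of 10 samples
  readmat = matrix(20, ncol=10, nrow=numtx)
  # First 30 transcripts get 40 reads in samples 1-5 (differential expression)
  readmat[1:30, 1:5] = 40

  # Write simulated reads to the folder 'simulated_reads_2'
  simulate_experiment_countmat(fasta=fastapath,
    readmat=readmat, outdir='simulated_reads_2', seed=5)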
