50% off: Unlimited data and AI learning.
State of Data and AI Literacy Report 2025

seedy (version 1.3)

simfixoutbreak: Simulate evolutionary dynamics on a given transmission tree

Description

Simulate within-host evolutionary dynamics on top of an existing transmission tree and generate genomic samples.

Usage

simfixoutbreak(ID,inf.times, rec.times, inf.source, mut.rate, equi.pop=10000, shape=flat,
           inoc.size=1, imp.var=25, samples.per.time=1, samp.schedule="random", 
           samp.freq=500, full=FALSE, feedback=500, glen=100000, 
           ref.strain=NULL, ...)

Arguments

ID
Vector of unique IDs.
inf.times
Vector of (integer) infection times.
rec.times
Vector of (integer) removal times.
inf.source
Vector of infection sources. The ith entry corresponds to the ID of the source of infection. For importations, the source should be 0.
mut.rate
Mutation rate (per genome per generation).
equi.pop
Equilibrium effective population size of pathogens within-host.
shape
Function describing the population growth dynamics. See Details.
inoc.size
Size of pathogen inoculum at transmission.
imp.var
The expected number of mutations separating unconnected importations.
samples.per.time
Number of samples taken at sampling times.
samp.schedule
How should sampling be conducted? Accepted values are: "calendar" - samples are taken from all current infectives every samp.freq generations; "individual" - samples are taken from each infective at intervals of samp.freq after i
full
Should `full' genomic sampling be returned? That is, should a vector of genotypes and their respective frequencies be stored from each individual's sampling times?
samp.freq
Number of generations between each sampling time (see samp.schedule).
feedback
Number of generations between simulation updates returned to R interface.
glen
Length of genome.
ref.strain
Initial sequence. By default, a random sequence of length glen.
...
Additional arguments to be passed to the shape function.

Value

  • Returns a list of outbreak data:
  • epidataA matrix of epidemiological data with columns: person ID, infection time, removal time, source of infection.
  • sampledataA matrix of genome samples with columns: person ID, sampling time, genome ID.
  • librA list with an entry for each unique genotype observed. Each entry is a vector of mutation positions relative to the reference genome.
  • nucA list with an entry for each unique genotype observed. Each entry is a vector of nucleotide types (integer between 1 and 4).
  • librstrainsA vector of unique genotype IDs corresponding to the libr object.
  • endtimeEnd time of the outbreak.

Details

Population growth dynamics are defined by the function called by 'shape'. This function returns the expected population size at each time step, given the total simulation time. By default, the population is expected to grow exponentially until reaching an equilibrium level, specified by equi.pop (flat). Alternatively, the population can follow a sinusoidal growth curve, peaking at runtime/2 (hump). User-defined functions should be of the form function(time,span,equi.pop,...), where span is equal to the duration of infection in this setting.

Examples

Run this code
# Simulate a transmission chain
inf.times <- (0:20)*100
rec.times <- inf.times + 100 + rpois(21,50)
inf.source <- 0:20
inf.source[c(3,11)] <- 0 # Two importations
mut.rate <- 0.001

# Now simulate evolutionary dynamics and samples on top of this tree
W <- simfixoutbreak(ID=1:21, inf.times, rec.times, inf.source, mut.rate, equi.pop=1000, shape=flat,
                    inoc.size=10, imp.var=25, samples.per.time=5, samp.schedule="random", 
                    samp.freq=500, full=FALSE, feedback=100, glen=100000, 
                    ref.strain=NULL)

sampledata <- W$sampledata
epidata <- W$epidata

# Calculate distance matrix for observed samples
distmat <- gd(sampledata[,3], W$libr, W$nuc, W$librstrains)

# Now pick colors for sampled isolates
colvec <- rainbow(1200)[1:1000] # Color palette
refnode <- 1 # Compare distance to which isolate?
colv <- NULL # Vector of colors for samples
maxD <- max(distmat[,refnode])

for (i in 1:nrow(sampledata)) {
  colv <- c(colv, 
            colvec[floor((length(colvec)-1)*(distmat[refnode,i])/maxD)+1])
}


plotoutbreak(epidata, sampledata, col=colv, stack=TRUE, arr.len=0.1, 
             blockheight=0.5, hspace=60, label.pos="left", block.col="grey", 
             jitter=0.004, xlab="Time", pch=1)

Run the code above in your browser using DataLab