Generate a random timetree according to a homogenous birth-death-sampling model with arbitrary time-varying speciation/extinction/sampling rates. Lineages split (speciate) or die (go extinct) at Poissonian rates and independently of each other. Lineages are sampled continuously (i.e., at Poissonian rates) in time and/or during concentrated sampling attempts (i.e., at specific time points). Sampled lineages are assumed to continue in the pool of extant lineages at some given "retention probability". The final tree can be restricted to sampled lineages only, but may optionally include extant (non-sampled) as well as extinct lineages. Speciation, extinction and sampling rates as well as retention probabilities may depend on time. This function may be used to simulate trees commonly encountered in viral epidemiology, where sampled patients are assumed to exit the pool of infectious individuals.
generate_tree_hbds( max_sampled_tips = NULL,
max_sampled_nodes = NULL,
max_extant_tips = NULL,
max_extinct_tips = NULL,
max_tips = NULL,
max_time = NULL,
include_extant = FALSE,
include_extinct = FALSE,
as_generations = FALSE,
time_grid = NULL,
lambda = NULL,
mu = NULL,
psi = NULL,
kappa = NULL,
splines_degree = 1,
CSA_times = NULL,
CSA_probs = NULL,
CSA_kappas = NULL,
no_full_extinction = FALSE,
tip_basename = "",
node_basename = NULL,
edge_basename = NULL,
include_birth_times = FALSE,
include_death_times = FALSE)
Integer, maximum number of sampled tips. The simulation is halted once this number is reached. If NULL
or <=0, this halting criterion is ignored.
Integer, maximum number of sampled nodes, i.e., of lineages that were sampled but kept in the pool of extant lineages. The simulation is halted once this number is reached. If NULL
or <=0, this halting criterion is ignored.
Integer, maximum number of extant tips. The simulation is halted once the number of concurrently extant tips reaches this threshold. If NULL
or <=0, this halting criterion is ignored.
Integer, maximum number of extant tips. The simulation is halted once this number is reached. If NULL
or <=0, this halting criterion is ignored.
Integer, maximum number of tips (extant+extinct+sampled). The simulation is halted once this number is reached. If NULL
or <=0, this halting criterion is ignored.
Numeric, maximum duration of the simulation. If NULL
or <=0, this halting criterion is ignored.
Logical, specifying whether to include extant tips (i.e., neither extinct nor sampled) in the final tree.
Logical, specifying whether to include extant tips (i.e., neither extant nor sampled) in the final tree.
Logical, specifying whether edge lengths should correspond to generations. If FALSE
, then edge lengths correspond to time. If TRUE
, then the time between two subsequent events (speciation, extinction, sampling) is counted as "one generation".
Numeric vector, specifying time points (in ascending order) on which the rates lambda
, mu
and psi
are provided. Rates are interpolated polynomially between time grid points as needed (according to splines_degree).
The time grid should generally cover the maximum possible simulation time, otherwise it will be polynomially extrapolated as needed.
Numeric vector, of the same size as time_grid
(or size 1 if time_grid==NULL
), listing per-lineage speciation (birth) rates (\(\lambda\), in units 1/time) at the times listed in time_grid
. Speciation rates must be non-negative, and are assumed to vary as a spline between grid points (see argument splines_degree
). Can also be a single numeric, in which case \(\lambda\) is assumed to be constant over time.
Numeric vector, of the same size as time_grid
(or size 1 if time_grid==NULL
), listing per-lineage extinction (death) rates (\(\mu\), in units 1/time) at the times listed in time_grid
. Extinction rates must be non-negative, and are assumed to vary as a spline between grid points (see argument splines_degree
). Can also be a single numeric, in which case \(\mu\) is assumed to be constant over time. If omitted, the extinction rate is assumed to be zero.
Numeric vector, of the same size as time_grid
(or size 1 if time_grid==NULL
), listing per-lineage sampling rates (\(\psi\), in units 1/time) at the times listed in time_grid
. Sampling rates must be non-negative, and are assumed to vary as a spline between grid points (see argument splines_degree
). Can also be a single numeric, in which case \(\psi\) is assumed to be constant over time. If omitted, the continuous sampling rate is assumed to be zero.
Numeric vector, of the same size as time_grid
(or size 1 if time_grid==NULL
), listing retention probabilities (\(\kappa\), unitless) of continuously (Poissonian) sampled lineages at the times listed in time_grid
. Retention probabilities must be true probabilities (i.e., between 0 and 1), and are assumed to vary as a spline between grid points (see argument splines_degree
). Can also be a single numeric, in which case \(\kappa\) is assumed to be constant over time. If omitted, the retention probability is assumed to be zero (a common assumption in epidemiology).
Integer, either 0,1,2 or 3, specifying the polynomial degree of the provided lambda
, mu
and psi
between grid points in age_grid
. For example, if splines_degree==1
, then the provided lambda
, mu
and psi
are interpreted as piecewise-linear curves; if splines_degree==2
the lambda
, mu
and psi
are interpreted as quadratic splines; if splines_degree==3
the lambda
, mu
and psi
is interpreted as cubic splines. If your age_grid
is fine enough, then splines_degree=1
is usually sufficient.
Optional numeric vector, listing times of concentrated sampling attempts, in ascending order. Concentrated sampling is performed in addition to any continuous (Poissonian) sampling specified by psi
.
Optional numeric vector of the same size as CSA_times
, listing sampling probabilities at each concentrated sampling time. Note that in contrast to the sampling rates psi
, the CSA_probs
are interpreted as probabilities and must thus be between 0 and 1. CSA_probs
must be provided if and only if CSA_times
is provided.
Optional numeric vector of the same size as CSA_times
, listing sampling retention probabilities at each concentrated sampling time, i.e. the probability at which a sampled lineage is kept in the pool of extant lineages. Note that the CSA_kappas
are probabilities and must thus be between 0 and 1. CSA_kappas
must be provided if and only if CSA_times
is provided.
Logical, specifying whether to prevent complete extinction of the tree. Full extinction is prevented by temporarily disabling extinctions whenever the number of extant tips is 1. Note that, strictly speaking, the trees generated do not exactly follow the proper probability distribution when no_full_extinction
is TRUE
.
Character. Prefix to be used for tip labels (e.g. "tip."). If empty (""), then tip labels will be integers "1", "2" and so on.
Character. Prefix to be used for node labels (e.g. "node."). If NULL
, no node labels will be included in the tree.
Character. Prefix to be used for edge labels (e.g. "edge."). Edge labels (if included) are stored in the character vector edge.label
. If NULL
, no edge labels will be included in the tree.
Logical. If TRUE
, then the times of speciation events (in order of occurrence) will also be returned.
Logical. If TRUE
, then the times of extinction events (in order of occurrence) will also be returned.
A named list with the following elements:
Logical, indicating whether the simulation was successful. If FALSE
, then the returned list includes an additional `error
' element (character) providing a description of the error; all other return variables may be undefined.
The generated timetree, of class "phylo". Note that this tree need not be ultrametric, for example if sampling occurs at multiple time points.
Numeric, giving the time at which the tree's root was first split during the simulation. Note that this may be greater than 0, i.e., if the tips of the final tree do not coalesce all the way back to the simulation's start.
Numeric, giving the final time at the end of the simulation.
Integer, the total number of speciation (birth) events that occured during the simulation.
Integer, the total number of extinction (death) events that occured during the simulation.
Integer, the total number of sampling events that occured during the simulation.
Integer, the total number of sampling events that occured during the simulation and for which lineages were kept in the pool of extant lineages.
Integer vector, specifying indices (from 1 to Ntips+Nnodes) of sampled tips and nodes in the final tree (regardless of whether their lineages were subsequently retained or removed from the pool).
Integer vector, specifying indices (from 1 to Ntips+Nnodes) of sampled tips and nodes in the final tree that were retained, i.e., not removed from the pool following sampling.
Integer vector, specifying indices (from 1 to Ntips) of extant (non-sampled and non-extinct) tips in the final tree. Will be empty if include_extant==FALSE
.
Integer vector, specifying indices (from 1 to Ntips) of extinct (non-sampled and non-extant) tips in the final tree. Will be empty if include_extinct==FALSE
.
The simulation proceeds in forward time, starting with a single root. Speciation/extinction and continuous (Poissonian) sampling events are drawn at exponentially distributed time steps, according to the rates specified by lambda
, mu
and psi
. Sampling also occurs at the optional CSA_times
. Only extant lineages are sampled at any time point, and sampled lineages are removed from the pool of extant lineages at probability 1-kappa
.
The simulation halts as soon as one of the halting criteria are met, as specified by the options max_sampled_tips
, max_sampled_nodes
, max_extant_tips
, max_extinct_tips
, max_tips
and max_time
, or if no extant tips remain, whichever occurs first. Note that in some scenarios (e.g., if extinction rates are very high) the simulation may halt too early and the generated tree may only contain a single tip (i.e., the root lineage); in that case, the simulation will return an error (see return value success
).
The function returns a single generated tree, as well as supporting information such as which tips are extant, extinct or sampled.
T. Stadler (2010). Sampling-through-time in birth--death trees. Journal of Theoretical Biology. 267:396-404.
T. Stadler et al. (2013). Birth--death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). PNAS. 110:228-233.
generate_tree_hbd_reverse
,
generate_gene_tree_msc
,
generate_random_tree
,
fit_hbds_model_parametric
,
simulate_deterministic_hbds
# NOT RUN {
# define time grid on which lambda, mu and psi will be specified
time_grid = seq(0,100,length.out=1000)
# specify the time-dependent extinction rate mu on the time-grid
mu_grid = 0.5*time_grid/(10+time_grid)
# define additional concentrated sampling attempts
CSA_times = c(5,7,9)
CSA_probs = c(0.5, 0.5, 0.5)
CSA_kappas = c(0.2, 0.1, 0.1)
# generate tree with a constant speciation & sampling rate,
# time-variable extinction rate and additional discrete sampling points
# assuming that all continuously sampled lineages are removed from the pool
simul = generate_tree_hbds( max_time = 10,
include_extant = FALSE,
include_extinct = FALSE,
time_grid = time_grid,
lambda = 1,
mu = mu_grid,
psi = 0.1,
kappa = 0,
CSA_times = CSA_times,
CSA_probs = CSA_probs,
CSA_kappas = CSA_kappas);
if(!simul$success){
cat(sprintf("ERROR: Could not simulate tree: %s\n",simul$error))
}else{
# simulation succeeded. print some basic info about the generated tree
tree = simul$tree
cat(sprintf("Generated tree has %d tips\n",length(tree$tip.label)))
}
# }
Run the code above in your browser using DataLab