Given a homogenous birth-death-sampling (HBDS) model, i.e., with speciation rate \(\lambda\), extinction rate \(\mu\), continuous (Poissonian) sampling rate \(\psi\) and retention probability \(\kappa\), calculate various deterministic features of the model backwards in time, such as the total population size and the LTT over time. Continuously sampled lineages are kept in the pool of extant lineages at probability \(\kappa\). The variables \(\lambda\), \(\mu\), \(\psi\) and \(\kappa\) may depend on time. In addition, the model can include concentrated sampling attempts at a finite set of discrete time points \(t_1,..,t_m\). ``Homogenous'' refers to the assumption that, at any given moment in time, all lineages exhibit the same speciation/extinction/sampling rates and retention proabbility. Every HBDS model is thus defined based on the values that \(\lambda\), \(\mu\), \(\psi\) and \(\kappa\) take over time, as well as the sampling probabilities \(\psi_1,..,\psi_m\) and retention probabilities \(\kappa_1,..,\kappa_m\) during the concentrated sampling attempts. Special cases of this model are sometimes known as ``birth-death-skyline plots'' in the literature (Stadler 2013). In epidemiology, these models are often used to describe the phylogenies of viral strains sampled over the course of the epidemic. A ``concentrated sampling attempt'' is a brief but intensified sampling period that lasted much less than the typical timescales of speciation/extinction. ``Deterministic'' refers to the fact that all calculated properties are completely determined by the model's parameters (i.e. non-random), as if an infinitely large tree was generated (aka. ``continuum limit''). The time-profiles of \(\lambda\), \(\mu\), \(\psi\) and \(\kappa\) are specified as piecewise polynomial curves (splines), defined on a discrete grid of ages.
simulate_deterministic_hbds(age_grid = NULL,
lambda = NULL,
mu = NULL,
psi = NULL,
kappa = NULL,
splines_degree = 1,
CSA_ages = NULL,
CSA_probs = NULL,
CSA_kappas = NULL,
requested_ages = NULL,
age0 = 0,
N0 = NULL,
LTT0 = NULL,
ODE_relative_dt = 0.001,
ODE_relative_dy = 1e-4)
Numeric vector, listing discrete ages (time before present) on which either \(\lambda\) and \(\mu\), or the PDR and \(\mu\), are specified. Listed ages must be strictly increasing, and must cover at least the full considered age interval (from age0
to oldest_age
). Can also be NULL
or a vector of size 1, in which case the speciation rate, extinction rate and PDR are assumed to be time-independent.
Numeric vector, of the same size as age_grid
(or size 1 if age_grid==NULL
), listing speciation rates (\(\lambda\), in units 1/time) at the ages listed in age_grid
. Speciation rates should be non-negative, and are assumed to vary polynomially between grid points (see argument splines_degree
).
Numeric vector, of the same size as age_grid
(or size 1 if age_grid==NULL
), listing extinction rates (\(\mu\), in units 1/time) at the ages listed in age_grid
. Extinction rates should be non-negative, and are assumed to vary polynomially between grid points (see argument splines_degree
).
Numeric vector, of the same size as age_grid
(or size 1 if age_grid==NULL
), listing the continuous (Poissonian) sampling rate at the ages listed in age_grid
. Sampling rates should be non-negative, and are assumed to vary polynomially between grid points (see argument splines_degree
).
Numeric vector, of the same size as age_grid
(or size 1 if age_grid==NULL
), listing the retention probabilities following Poissonian sampling events, at the ages listed in age_grid
. The listed values must be true probabilities, i.e. between 0 and 1, and are assumed to vary polynomially between grid points (see argument splines_degree
). The retention probability is the probability that a continuously sampled lineage remains in the pool of extant lineages. Note that many epidemiological models assume kappa to be zero.
Integer, either 0,1,2 or 3, specifying the polynomial degree of the provided lambda
, mu
, psi
and kappa
between grid points in age_grid
. For example, if splines_degree==1
, then the provided lambda
, mu
, psi
and kappa
are interpreted as piecewise-linear curves; if splines_degree==2
they are interpreted as quadratic splines; if splines_degree==3
they are interpreted as cubic splines. The splines_degree
influences the analytical properties of the curve, e.g. splines_degree==1
guarantees a continuous curve, splines_degree==2
guarantees a continuous curve and continuous derivative, and so on.
Optional numeric vector, listing the ages of concentrated sampling attempts, in ascending order. Concentrated sampling is performed in addition to any continuous (Poissonian) sampling specified by psi
.
Optional numeric vector of the same size as CSA_ages
, listing sampling probabilities at each concentrated sampling attempt. Note that in contrast to the sampling rates psi
, the CSA_probs
are interpreted as probabilities and must thus be between 0 and 1. CSA_probs
must be provided if and only if CSA_ages
is provided.
Optional numeric vector of the same size as CSA_ages
, listing retention probabilities at each concentrated sampling event, i.e. the probability at which a sampled lineage is kept in the pool of extant lineages. Note that the CSA_kappas
are probabilities and must thus be between 0 and 1. CSA_kappas
must be provided if and only if CSA_ages
is provided.
Optional numeric vector, listing ages (in ascending order) at which the various model variables are requested. If NULL
, it will be set to age_grid
.
Non-negative numeric, specifying the age at which LTT0
and pop_size0
are specified. Typically this will be 0, i.e., corresponding to the present.
Positive numeric, specifying the number of extant species (sampled or not) at age0
. Used to determine the "scaling factor" for the returned population sizes and LTT. Either pop_size0
or LTT0
must be provided, but not both.
Positive numeric, specifying the number of lineages present in the tree at age0
. Used to determine the "scaling factor" for the returned population sizes and LTT. Either pop_size0
or LTT0
must be provided, but not both.
Positive unitless number, specifying the default relative time step for internally used ordinary differential equation solvers. Typical values are 0.01-0.001.
Positive unitless number, specifying the relative difference between subsequent simulated and interpolated values, in internally used ODE solvers. Typical values are 1e-2 to 1e-5. A smaller ODE_relative_dy
increases interpolation accuracy, but also increases memory requirements and adds runtime (scaling with the tree's age span, not with Ntips).
A named list with the following elements:
Logical, indicating whether the calculation was successful. If FALSE
, then the returned list includes an additional `error
' element (character) providing a description of the error; all other return variables may be undefined.
Numerical vector of size NG, listing discrete ages (time before present) on which all returned time-curves are specified. Will be equal to requested_ages
, if the latter was provided.
Numerical vector of size NG, listing the predicted (deterministic) total diversity (number of extant species, denoted \(N\)) at the ages given in ages[]
.
Numeric vector of size NG, listing the number of lineages represented in the tree at any given age, also known as ``lineages-through-time'' (LTT) curve. Note that LTT
at age0
will be equal to LTT0
(if the latter was provided), and that values in LTT
will be non-increasing with age.
Numeric vector of size NG, listing the probability that a lineage, extant at a given age, will not be represented in the tree.
Numeric vector of size NG, listing the speciation rates at the ages given in ages[]
.
Numeric vector of size NG, listing the extinctions rates at the ages given in ages[]
.
Numeric vector of size NG, listing the Poissonian sampling rates at the ages given in ages[]
.
Numeric vector of size NG, listing the retention probabilities (for continuously sampled lineages) at the ages given in ages[]
.
Numeric vector of size NG, listing the pulled diversification rate (PDR, in units 1/time) at the ages given in ages[]
.
Numeric vector of size NG, listing the ``pulled speciation rate'' (PSR, in units 1/time) at the ages given in ages[]
. The PSR is defined as \(PSR=\lambda\cdot(1-Pmissing)\).
Numeric vector of size NG, listing the net diversification rate (in units 1/time) at the ages given in ages[]
.
Numeric vector of size NG, listing the product of the speciation rate and Poissonian sampling rate (in units 1/time^2) at the ages given in ages[]
.
Numeric vector of size NG, listing the product of the continuous sampling rate and the continuous retention probability (in units 1/time) at the ages given in ages[]
.
Numeric vector of size NG, listing the basic reproduction rate (\(R_0=\lambda/(\mu+\psi(1-\kappa))\)) at the ages given in ages[]
.
Numeric vector of size NG, listing the pulled concentrated sampling probabilities, \(\tilde{\rho}_k=\rho_k/(1-E)\).
Numeric vector of size NG, listing the continuous (Poissonian) sampling rates during the concentrated sampling attempts, \(\psi(t_1),..,\psi(t_m)\).
Numeric vector of size NG, listing the pulled speciation rates during the concentrated sampling attempts.
The simulated LTT refers to a hypothetical tree sampled at age 0, i.e. LTT(t) will be the number of lineages extant at age t that survived and were sampled until by the present day. Note that if a concentrated sampling attempt occurs at age \(\tau\), then LTT(\(\tau\)) is the number of lineages in the tree right before the occurrence of the sampling attempt, i.e., in the limit where \(\tau\) is approached from above.
Note that in this function age always refers to time before present, i.e., present day age is 0, and age increases towards the root.
T. Stadler, D. Kuehnert, S. Bonhoeffer, A. J. Drummond (2013). Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). PNAS. 110:228-233.
generate_tree_hbds
,
fit_hbds_model_parametric
,
simulate_deterministic_hbd
# NOT RUN {
# define an HBDS model with exponentially decreasing speciation/extinction rates
# and constant Poissonian sampling rate psi
oldest_age= 10
age_grid = seq(from=0,to=oldest_age,by=0.1) # choose a sufficiently fine age grid
lambda = 1*exp(0.01*age_grid) # define lambda on the age grid
mu = 0.2*lambda # assume similarly shaped but smaller mu
# simulate deterministic HBD model
# scale LTT such that it is 100 at age 1
simulation = simulate_deterministic_hbds(age_grid = age_grid,
lambda = lambda,
mu = mu,
psi = 0.1,
age0 = 1,
LTT0 = 100)
# plot deterministic LTT
plot( x = simulation$ages, y = simulation$LTT, type='l',
main='dLTT', xlab='age', ylab='lineages', xlim=c(oldest_age,0))
# }
Run the code above in your browser using DataLab