Last chance! 50% off unlimited learning
Sale ends in
Given an ultrametric timetree, fit a homogenous birth-death (HBD) model in which speciation and extinction rates (
fit_hbd_model_on_grid(tree,
oldest_age = NULL,
age0 = 0,
age_grid = NULL,
min_lambda = 0,
max_lambda = +Inf,
min_mu = 0,
max_mu = +Inf,
min_rho0 = 1e-10,
max_rho0 = 1,
guess_lambda = NULL,
guess_mu = NULL,
guess_rho0 = 1,
fixed_lambda = NULL,
fixed_mu = NULL,
fixed_rho0 = NULL,
const_lambda = FALSE,
const_mu = FALSE,
splines_degree = 1,
condition = "auto",
relative_dt = 1e-3,
Ntrials = 1,
Nthreads = 1,
max_model_runtime = NULL,
fit_control = list())
A list with the following elements:
Logical, indicating whether model fitting succeeded. If FALSE
, the returned list will include an additional ``error'' element (character) providing a description of the error; in that case all other return variables may be undefined.
The maximized fitting objective. Currently, only maximum-likelihood estimation is implemented, and hence this will always be the maximized log-likelihood.
The name of the objective that was maximized during fitting. Currently, only maximum-likelihood estimation is implemented, and hence this will always be ``loglikelihood''.
The log-likelihood of the fitted model for the given timetree.
Numeric vector of size Ngrid, listing fitted or fixed speciation rates splines_degree
; to evaluate this function at arbitrary ages use the castor
routine evaluate_spline
.
Numeric vector of size Ngrid, listing fitted or fixed extinction rates splines_degree
; to evaluate this function at arbitrary ages use the castor
routine evaluate_spline
.
Numeric, specifying the fitted or fixed sampling fraction
Numeric vector of size Ngrid, specifying the initial guess for
Numeric vector of size Ngrid, specifying the initial guess for
Numeric, specifying the initial guess for
The age-grid on which age_grid
, unless the latter was NULL
or of length <=1.
Integer, number of free (i.e., independently) fitted parameters. If none of the const_lambda=FALSE
and const_mu=FALSE
, then NFP
will be equal to 2*Ngrid+1.
The Akaike Information Criterion for the fitted model, defined as
The Bayesian information criterion for the fitted model, defined as
Character, specifying what conditioning was root for the likelihood (e.g. "crown" or "stem").
Logical, specifying whether the maximum likelihood was reached after convergence of the optimization algorithm. Note that in some cases the maximum likelihood may have been achieved by an optimization path that did not yet converge (in which case it's advisable to increase iter.max
and/or eval.max
).
Integer, specifying the number of iterations performed during the optimization path that yielded the maximum likelihood.
Integer, specifying the number of likelihood evaluations performed during the optimization path that yielded the maximum likelihood.
A rooted ultrametric timetree of class "phylo", representing the time-calibrated reconstructed phylogeny of a set of extant sampled species.
Strictly positive numeric, specifying the oldest time before present (``age'') to consider when calculating the likelihood. If this is equal to or greater than the root age, then oldest_age
is taken as the stem age, and the classical formula by Morlon et al. (2011) is used. If oldest_age
is less than the root age, the tree is split into multiple subtrees at that age by treating every edge crossing that age as the stem of a subtree, and each subtree is considered an independent realization of the HBD model stemming at that age. This can be useful for avoiding points in the tree close to the root, where estimation uncertainty is generally higher. If oldest_age==NULL
, it is automatically set to the root age.
Non-negative numeric, specifying the youngest age (time before present) to consider for fitting, and with respect to which rho
is defined. If age0>0
, then rho0
refers to the sampling fraction at age age0
, i.e. the fraction of lineages extant at age0
that are included in the tree. See below for more details.
Numeric vector, listing ages in ascending order, on which age0
. If splines_degree>0
(see option below) then the age grid must also cover oldest_age
. If NULL
or of length <=1 (regardless of value), then
Numeric vector of length Ngrid (=max(1,length(age_grid))
), or a single numeric, specifying lower bounds for the fitted
Numeric vector of length Ngrid, or a single numeric, specifying upper bounds for the fitted +Inf
to omit upper bounds.
Numeric vector of length Ngrid, or a single numeric, specifying lower bounds for the fitted
Numeric vector of length Ngrid, or a single numeric, specifying upper bounds for the fitted +Inf
to omit upper bounds.
Numeric, specifying a lower bound for the fitted sampling fraction
Numeric, specifying an upper bound for the fitted sampling fraction
Initial guess for NULL
(an initial guess will be computed automatically), or a single numeric (guessing the same NA
. Guess values are ignored for non-fitted (i.e., fixed) parameters.
Initial guess for NULL
(an initial guess will be computed automatically), or a single numeric (guessing the same NA
. Guess values are ignored for non-fitted (i.e., fixed) parameters.
Numeric, specifying an initial guess for the sampling fraction age0
. Setting this to NULL
or NA
is the same as setting it to 1.
Optional fixed (i.e. non-fitted) NULL
(NA
for non-fixed values).
Optional fixed (i.e. non-fitted) NULL
(NA
for non-fixed values).
Numeric between 0 and 1, optionallly specifying a fixed value for the sampling fraction NULL
or NA
, the sampling fraction
Logical, specifying whether const_lambda=TRUE
reduces the number of free (i.e., independently fitted) parameters.
If fixed_lambda
), then only the non-fixed lambdas are assumed to be identical to one another.
Logical, specifying whether const_mu=TRUE
reduces the number of free (i.e., independently fitted) parameters. If fixed_mu
), then only the non-fixed mus are assumed to be identical to one another.
Integer between 0 and 3 (inclusive), specifying the polynomial degree of splines_degree
influences the analytical properties of the curve, e.g. splines_degree==1
guarantees a continuous curve, splines_degree==2
guarantees a continuous curve and continuous derivative, and so on. A degree of 0 is generally not recommended, despite the fact that it has been historically popular. The case splines_degree=0
is also known as ``skyline'' model.
Character, either "crown", "stem", "auto", "stemN" or "crownN" (where N is an integer >=2), specifying on what to condition the likelihood. If "crown", the likelihood is conditioned on the survival of the two daughter lineages branching off at the root at that time. If "stem", the likelihood is conditioned on the survival of the stem lineage, with the process having started at oldest_age
. Note that "crown" and "crownN"" really only make sense when oldest_age
is equal to the root age, while "stem" is recommended if oldest_age
differs from the root age. If "stem2", the condition is that the process yielded at least two sampled tips, and similarly for "stem3" etc. If "crown3", the condition is that a splitting occurred at the root age, both child clades survived, and in total yielded at least 3 sampled tips (and similarly for "crown4" etc). If "auto", the condition is chosen according to the recommendations mentioned earlier. "none" is generally not recommended.
Strictly positive numeric (unitless), specifying the maximum relative time step allowed for integration over time, when calculating the likelihood. Smaller values increase integration accuracy but increase computation time. Typical values are 0.0001-0.001. The default is usually sufficient.
Integer, specifying the number of independent fitting trials to perform, each starting from a random choice of model parameters. Increasing Ntrials
reduces the risk of reaching a non-global local maximum in the fitting objective.
Integer, specifying the number of parallel threads to use for performing multiple fitting trials simultaneously. This should generally not exceed the number of available CPUs on your machine. Parallel computing is not available on the Windows platform.
Optional numeric, specifying the maximum number of seconds to allow for each evaluation of the likelihood function. Use this to abort fitting trials leading to parameter regions where the likelihood takes a long time to evaluate (these are often unlikely parameter regions).
Named list containing options for the nlminb
optimization routine, such as iter.max
, eval.max
or rel.tol
. For a complete list of options and default values see the documentation of nlminb
in the stats
package.
Stilianos Louca
Warning: Unless well-justified constraints are imposed on either fit_hbd_pdr_on_grid
and fit_hbd_psr_on_grid
instead.
If age0>0
, the input tree is essentially trimmed at age0
(omitting anything younger than age0
), and the various variables are fitted to this new (shorter) tree, with time shifted appropriately. For example, the fitted rho0
is thus the sampling fraction at age0
, i.e. the fraction of lineages extant at age0
that are represented in the timetree.
It is generally advised to provide as much information to the function fit_hbd_model_on_grid
as possible, including reasonable lower and upper bounds (min_lambda
, max_lambda
, min_mu
, max_mu
, min_rho0
and max_rho0
) and a reasonable parameter guess (guess_lambda
, guess_mu
and guess_rho0
). It is also important that the age_grid
is sufficiently fine to capture the expected major variations of age_grid
is too fine and/or the tree is too small.
T. Stadler (2009). On incomplete sampling under birth-death models and connections to the sampling-based coalescent. Journal of Theoretical Biology. 261:58-66.
T. Stadler (2013). How can we improve accuracy of macroevolutionary rate estimates? Systematic Biology. 62:321-329.
H. Morlon, T. L. Parsons, J. B. Plotkin (2011). Reconciling molecular phylogenies with the fossil record. Proceedings of the National Academy of Sciences. 108:16327-16332.
S. Louca et al. (2018). Bacterial diversification through geological time. Nature Ecology & Evolution. 2:1458-1467.
S. Louca and M. W. Pennell (2020). Extant timetrees are consistent with a myriad of diversification histories. Nature. 580:502-505.
simulate_deterministic_hbd
loglikelihood_hbd
fit_hbd_model_parametric
fit_hbd_pdr_on_grid
fit_hbd_pdr_parametric
fit_hbd_psr_on_grid
if (FALSE) {
# Generate a random tree with exponentially varying lambda & mu
Ntips = 10000
rho = 0.5 # sampling fraction
time_grid = seq(from=0, to=100, by=0.01)
lambdas = 2*exp(0.1*time_grid)
mus = 1.5*exp(0.09*time_grid)
sim = generate_random_tree( parameters = list(rarefaction=rho),
max_tips = Ntips/rho,
coalescent = TRUE,
added_rates_times = time_grid,
added_birth_rates_pc = lambdas,
added_death_rates_pc = mus)
tree = sim$tree
root_age = castor::get_tree_span(tree)$max_distance
cat(sprintf("Tree has %d tips, spans %g Myr\n",length(tree$tip.label),root_age))
# Fit mu on grid
# Assume that lambda & rho are known
Ngrid = 5
age_grid = seq(from=0,to=root_age,length.out=Ngrid)
fit = fit_hbd_model_on_grid(tree,
age_grid = age_grid,
max_mu = 100,
fixed_lambda= approx(x=time_grid,y=lambdas,xout=sim$final_time-age_grid)$y,
fixed_rho0 = rho,
condition = "crown",
Ntrials = 10, # perform 10 fitting trials
Nthreads = 2, # use two CPUs
max_model_runtime = 1) # limit model evaluation to 1 second
if(!fit$success){
cat(sprintf("ERROR: Fitting failed: %s\n",fit$error))
}else{
cat(sprintf("Fitting succeeded:\nLoglikelihood=%g\n",fit$loglikelihood))
# plot fitted & true mu
plot( x = fit$age_grid,
y = fit$fitted_mu,
main = 'Fitted & true mu',
xlab = 'age',
ylab = 'mu',
type = 'b',
col = 'red',
xlim = c(root_age,0))
lines(x = sim$final_time-time_grid,
y = mus,
type = 'l',
col = 'blue');
# get fitted mu as a function of age
mu_fun = approxfun(x=fit$age_grid, y=fit$fitted_mu)
}
}
Run the code above in your browser using DataLab