chronopl: Molecular Dating With Penalized Likelihood

Description

This function estimates the node ages of a tree using a semi-parametric method based on penalized likelihood (Sanderson 2002). The branch lengths of the input tree are interpreted as mean numbers of substitutions (i.e., per site).

Usage

chronopl(phy, lambda, age.min = 1, age.max = NULL,
         node = "root", S = 1, tol = 1e-8,
         CV = FALSE, eval.max = 500, iter.max = 500, ...)

Arguments

Value

an object of class "phylo" with branch lengths as estimated by the function. There are three or four further attributes:ploglikthe maximum penalized log-likelihood.ratesthe estimated rates for each branch.messagethe message returned by nlminb indicating whether the optimisation converged.D2the influence of each observation on overall date estimates (if CV = TRUE).

Details

The idea of this method is to use a trade-off between a parametric formulation where each branch has its own rate, and a nonparametric term where changes in rates are minimized between contiguous branches. A smoothing parameter (lambda) controls this trade-off. If lambda = 0, then the parametric component dominates and rates vary as much as possible among branches, whereas for increasing values of lambda, the variation are smoother to tend to a clock-like model (same rate for all branches).

lambda must be given. The known ages are given in age.min, and the correponding node numbers in node. These two arguments must obviously be of the same length. By default, an age of 1 is assumed for the root, and the ages of the other nodes are estimated.

If age.max = NULL (the default), it is assumed that age.min gives exactly known ages. Otherwise, age.max and age.min must be of the same length and give the intervals for each node. Some node may be known exactly while the others are known within some bounds: the values will be identical in both arguments for the former (e.g., age.min = c(10, 5), age.max = c(10, 6), node = c(15, 18) means that the age of node 15 is 10 units of time, and the age of node 18 is between 5 and 6).

If two nodes are linked (i.e., one is the ancestor of the other) and have the same values of age.min and age.max (say, 10 and 15) this will result in an error because the medians of these values are used as initial times (here 12.5) giving initial branch length(s) equal to zero. The easiest way to solve this is to change slightly the given values, for instance use age.max = 14.9 for the youngest node, or age.max = 15.1 for the oldest one (or similarly for age.min).

The input tree may have multichotomies. If some internal branches are of zero-length, they are collapsed (with a warning), and the returned tree will have less nodes than the input one. The presence of zero-lengthed terminal branches of results in an error since it makes little sense to have zero-rate branches.

The cross-validation used here is different from the one proposed by Sanderson (2002). Here, each tip is dropped successively and the analysis is repeated with the reduced tree: the estimated dates for the remaining nodes are compared with the estimates from the full data. For the $i$th tip the following is calculated:

$$\sum_{j=1}^{n-2}{\frac{(t_j - t_j^{-i})^2}{t_j}}$$,

where $t_j$ is the estimated date for the $j$th node with the full phylogeny, $t_j^{-i}$ is the estimated date for the $j$th node after removing tip $i$ from the tree, and $n$ is the number of tips.

The present version uses the nlminb to optimise the penalized likelihood function: see its help page for details on parameters controlling the optimisation procedure.

References