fit_and_compare_bm_models: Fit and compare Brownian Motion models for multivariate trait evolution between two data sets.

Description

Given two rooted phylogenetic trees and states of one or more continuous (numeric) traits on the trees' tips, fit a multivariate Brownian motion model of correlated evolution to each data set and compare the fitted models. This function estimates the diffusivity matrix for each data set (i.e., each tree/tip-states set) via maximum-likelihood and assesses whether the log-difference between the two fitted diffusivity matrixes is statistically significant, under the null hypothesis that the two data sets exhibit the same diffusivity. Optionally, multiple trees can be used as input for each data set, under the assumption that the trait evolved on each tree according to the same BM model. For more details on how BM is fitted to each data set see the function fit_bm_model.

Usage

fit_and_compare_bm_models(  trees1, 
                            tip_states1,
                            trees2,
                            tip_states2,
                            Nbootstraps     = 0,
                            Nsignificance   = 0,
                            check_input     = TRUE,
                            verbose         = FALSE,
                            verbose_prefix  = "")

Arguments

trees1

Either a single rooted tree or a list of rooted trees, of class "phylo", corresponding to the first data set on which a BM model is to be fitted. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance.

tip_states1

Numeric state of each trait at each tip in each tree in the first data set. If trees1 is a single tree, then tip_states1 must either be a numeric vector of size Ntips or a 2D numeric matrix of size Ntips x Ntraits, listing the trait states for each tip in the tree. If trees1 is a list of Ntrees trees, then tip_states1 must be a list of length Ntrees, each element of which lists the trait states for the corresponding tree (as a vector or 2D matrix, similarly to the single-tree case).

trees2

Either a single rooted tree or a list of rooted trees, of class "phylo", corresponding to the second data set on which a BM model is to be fitted. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance.

tip_states2

Numeric state of each trait at each tip in each tree in the second data set, similarly to tip_states1.

Nbootstraps

Integer, specifying the number of parametric bootstraps to perform for calculating the confidence intervals of BM diffusivities fitted to each data set. If <=0, no bootstrapping is performed.

Nsignificance

Integer, specifying the number of simulations to perform for assessing the statistical significance of the log-transformed difference between the diffusivities fitted to the two data sets, i.e. of $| \log (D_{1}) - \log (D_{2}) |$ . Set to 0 to not calculate the significance of the slope. See below for additional details.

check_input

Logical, specifying whether to perform some basic checks on the validity of the input data. If you are certain that your input data are valid, you can set this to FALSE to reduce computation.

verbose

Logical, specifying whether to print progress report messages to the screen.

verbose_prefix

Character, specifying a prefix to include in front of progress report messages on each line. Only relevant if verbose==TRUE.

Value

A list with the following elements:

success

Logical, indicating whether the fitting was successful for both data sets. If FALSE, then an additional return variable, error, will contain a description of the error; in that case all other return variables may be undefined.

fit1

A named list containing the fitting results for the first data set, in the same format as returned by fit_bm_model. In particular, the diffusivity fitted to the first data set will be stored in fit1$diffusivity.

fit2

A named list containing the fitting results for the second data set, in the same format as returned by fit_bm_model. In particular, the diffusivity fitted to the second data set will be stored in fit1$diffusivity.

log_difference

The absolute difference between the log-transformed diffusivities, i.e. $| \log (D_{1}) - \log (D_{2}) |$ . In the multivariate case, this will be a matrix of size Ntraits x Ntraits.

significance

Numeric, statistical significance of the observed log-difference under the null hypothesis that the two data sets were generated by a common BM model. Only returned if Nsignificance>0.

fit_common

A named list containing the fitting results for the two data sets combined, in the same format as returned by fit_bm_model. The common diffusivity, fit_common$diffusivity is used for the random simulations when assessing the statistical significance of the log-difference of the separately fitted diffusivities. Only returned if Nsignificance>0.

Details

For details on the Brownian Motion model see fit_bm_model and simulate_bm_model. This function separately fits a single-variate or multi-variate BM model with constant diffusivity (diffusivity matrix, in the multivariate case) to each data set; internally, this function applies fit_bm_model to each data set.

If Nsignificance>0, the statistical significance of the log-transformed difference of the two fitted diffusivity matrixes, $| \log (D_{1}) - \log (D_{2}) |$ , is assessed, under the null hypothesis that both data sets were generated by the same common BM model. The diffusivity of this common BM model is estimated by fitting to both datasets at once, i.e. after merging the two datasets into a single dataset of trees and tip states (see return variable fit_common below). For each of the Nsignificance random simulations of the common BM model on the two tree sets, the diffusivities are again separately fitted on the two simulated sets and the resulting log-difference is compared to the one of the original data sets. The returned significance is the probability that the diffusivities would have a log-difference larger than the observed one, if the two data sets had been generated under the common BM model.

If tree$edge.length is missing, each edge in the tree is assumed to have length 1. The tree may include multifurcations (i.e. nodes with more than 2 children) as well as monofurcations (i.e. nodes with only one child). Note that multifurcations are internally expanded to bifurcations, prior to model fitting.

References

J. Felsenstein (1985). Phylogenies and the Comparative Method. The American Naturalist. 125:1-15.

Examples

Run this code

# NOT RUN {
# simulate distinct BM models on two random trees
D1 = 1
D2 = 2
tree1 = generate_random_tree(list(birth_rate_factor=1),max_tips=100)$tree
tree2 = generate_random_tree(list(birth_rate_factor=1),max_tips=100)$tree
tip_states1 = simulate_bm_model(tree1, diffusivity = D1)$tip_states
tip_states2 = simulate_bm_model(tree2, diffusivity = D2)$tip_states

# fit and compare BM models between the two data sets
fit = fit_and_compare_bm_models(trees1          = tree1, 
                                tip_states1     = tip_states1, 
                                trees2          = tree2,
                                tip_states2     = tip_states2,
                                Nbootstraps     = 100,
                                Nsignificance   = 100)

# print summary of results
cat(sprintf("Fitted D1 = %g, D2 = %g, significance of log-diff. = %g\n",
            fit$fit1$diffusivity, fit$fit2$diffusivity, fit$significance))
# }

Run the code above in your browser using DataLab

Get 50% off unlimited learning