simulate_dsse: Simulate a Discrete-State Speciation and Extinction (dSSE) model.

Description

Simulate a random phylogenetic tree in forward time based on a Poissonian speciation/extinction (birth/death) process, whereby birth and death rates are determined by a co-evolving discrete trait. New species are added (born) by splitting of a randomly chosen extant tip. The discrete trait, whose values determine birth/death rates, can evolve in two modes: (A) Anagenetically, i.e. according to a discrete-space continuous-time Markov process along each edge, with fixed transition rates between states, and/or (B) cladogenetically, i.e. according to fixed transition probabilities between states at each speciation event. This model class includes the Multiple State Speciation and Extinction (MuSSE) model described by FitzJohn et al. (2009), as well as the Cladogenetic SSE (ClaSSE) model described by Goldberg and Igis (2012). Optionally, the model can be turned into a Hidden State Speciation and Extinction model (Beaulieu and O'meara, 2016), by replacing the simulated tip/node states with "proxy" states, thus hiding the original states actually influencing speciation/extinction rates.

Usage

simulate_dsse( Nstates,
               NPstates             = NULL,
               proxy_map            = NULL,
               parameters           = list(),
               start_state          = NULL,
               max_tips             = NULL, 
               max_time             = NULL,
               max_time_eq          = NULL,
               max_events           = NULL,
               sampling_fractions   = NULL,
               reveal_fractions     = NULL,
               coalescent           = TRUE,
               as_generations       = FALSE,
               no_full_extinction   = TRUE,
               Nsplits              = 2, 
               tip_basename         = "", 
               node_basename        = NULL,
               include_birth_times  = FALSE,
               include_death_times  = FALSE,
               include_rates        = FALSE,
               include_labels       = TRUE)
               
simulate_musse(Nstates,  NPstates = NULL, proxy_map = NULL, 
            parameters = list(), start_state = NULL, 
            max_tips = NULL, max_time = NULL, max_time_eq = NULL, max_events = NULL, 
            sampling_fractions = NULL, reveal_fractions = NULL, coalescent = TRUE, 
            as_generations = FALSE, no_full_extinction = TRUE, Nsplits = 2, 
            tip_basename = "", node_basename = NULL, include_birth_times = FALSE, 
            include_death_times = FALSE, include_rates = FALSE, include_labels = TRUE)

Arguments

Nstates

Integer, specifying the number of possible discrete states a tip can have, influencing speciation/extinction rates. For example, if Nstates==2 then this corresponds to the common Binary State Speciation and Extinction (BiSSE) model (Maddison et al., 2007). In the case of a HiSSE model, Nstates refers to the total number of diversification rate categories, as described by Beaulieu and O'meara (2016).

NPstates

Integer, optionally specifying a number of "proxy-states" that are observed instead of the underlying speciation/extinction-modulating states. To simulate a HiSSE model, this should be smaller than Nstates. Each state corresponds to a different proxy-state, as defined using the variable proxy_map (see below). For BiSSE/MuSSE with no hidden states, NPstates can be set to either NULL or equal to Nstates, and proxy-states are equivalent to states.

proxy_map

Integer vector of size Nstates and with values in 1,..NPstates, specifying the correspondence between states (i.e. diversification-rate categories) and (observed) proxy-states, in a HiSSE model. Specifically, proxy_map[s] indicates which proxy-state the state s is represented by. Each proxy-state can represent multiple states (i.e. proxies are ambiguous), but each state must be represented by exactly one proxy-state. For non-HiSSE models, set this to NULL. See below for more details.

parameters

A named list specifying the dSSE model parameters, such as the anagenetic and/or cladogenetic transition rates between states and the state-dependent birth/death rates (see details below).

start_state

Integer within 1,..,Nstates, specifying the initial state, i.e. of the first lineage created. If left unspecified, this is chosen randomly and uniformly among all possible states.

max_tips

Maximum number of tips in the generated tree, prior to any subsampling. If coalescent=TRUE, this refers to the number of extant tips, prior to subsampling. Otherwise, it refers to the number of extinct + extant tips, prior to subsampling. If NULL or <=0, the number of tips is not limited, so you should use max_time and/or max_time_eq and/or max_events to stop the simulation.

max_time

Numeric, maximum duration of the simulation. If NULL or <=0, this constraint is ignored.

max_time_eq

Numeric, maximum duration of the simulation, counting from the first point at which speciation/extinction equilibrium is reached, i.e. when (birth rate - death rate) changed sign for the first time. If NULL or <0, this constraint is ignored.

max_events

Integer, maximum number of speciation/extinction/transition events before halting the simulation. If NULL, this constraint is ignored.

sampling_fractions

A single number, or a numeric vector of size NPstates, listing tip sub-sampling fractions, depending on proxy-state. sampling_fractions[p] is the probability of including a tip in the final tree, if its proxy-state is p. If NULL, all tips (or all extant tips, if coalescent==TRUE) are included in the tree. If a single number, all tips are included with the same probability, i.e. regardless of their proxy-state.

reveal_fractions

Numeric vector of size NPstates, listing reveal fractions of tip proxy-states, depending on proxy state. reveal_fractions[p] is the probability of knowing a tip's proxy-state, if its proxy state is p. Can also be NULL, in which case all tip proxy states will be known.

coalescent

Logical, specifying whether only the coalescent tree (i.e. the tree spanning the extant tips) should be returned. If coalescent==FALSE and the death rate is non-zero, then the tree may include non-extant tips (i.e. tips whose distance from the root is less than the total time of evolution). In that case, the tree will not be ultrametric.

as_generations

Logical, specifying whether edge lengths should correspond to generations. If FALSE, then edge lengths correspond to time.

no_full_extinction

Logical, specifying whether to prevent complete extinction of the tree. Full extinction is prevented by temporarily disabling extinctions whenever the number of extant tips is 1. if no_full_extinction==FALSE and death rates are non-zero, the tree may go extinct during the simulation; if coalescent==TRUE, then the returned tree would be empty, hence the function will return unsuccessfully (i.e. success will be FALSE). By default no_full_extinction is TRUE, however in some special cases it may be desirable to allow full extinctions to ensure that the generated trees are statistically distributed exactly according to the underlying cladogenetic model.

Nsplits

Integer greater than 1. Number of child-tips to generate at each diversification event. If set to 2, the generated tree will be bifurcating. If >2, the tree will be multifurcating.

tip_basename

Character. Prefix to be used for tip labels (e.g. "tip."). If empty (""), then tip labels will be integers "1", "2" and so on.

node_basename

Character. Prefix to be used for node labels (e.g. "node."). If NULL, no node labels will be included in the tree.

include_birth_times

Logical. If TRUE, then the times of speciation events (in order of occurrence) will also be returned.

include_death_times

Logical. If TRUE, then the times of extinction events (in order of occurrence) will also be returned.

include_rates

Logical. If TRUE, then the per-capita birth & death rates of all tips and nodes will also be returned.

include_labels

Logical, specifying whether to include tip-labels and node-labels (if available) as names in the returned state vectors (e.g. tip_states and node_states). In any case, returned states are always listed in the same order as tips and nodes in the tree. Setting this to FALSE may increase computational efficiency for situations where labels are not required.

Value

A named list with the following elements:

success

Logical, indicating whether the simulation was successful. If FALSE, an additional element error (of type character) is included containing an explanation of the error; in that case the value of any of the other elements is undetermined.

tree

A rooted bifurcating (if Nsplits==2) or multifurcating (if Nsplits>2) tree of class "phylo", generated according to the specified birth/death model.

If coalescent==TRUE or if all death rates are zero, and only if as_generations==FALSE, then the tree will be ultrametric. If as_generations==TRUE and coalescent==FALSE, all edges will have unit length.

root_time

Numeric, giving the time at which the tree's root was first split during the simulation. Note that if coalescent==TRUE, this may be later than the first speciation event during the simulation.

final_time

Numeric, giving the final time at the end of the simulation. If coalescent==TRUE, then this may be greater than the total time span of the tree (since the root of the coalescent tree need not correspond to the first speciation event).

equilibrium_time

Numeric, giving the first time where the sign of (death rate - birth rate) changed from the beginning of the simulation, i.e. when speciation/extinction equilibrium was reached. May be infinite if the simulation stopped before reaching this point.

Nbirths

Numeric vector of size Nstates, listing the total number of birth events (speciations) that occurred at each state. The sum of all entries in Nbirths may be lower than the total number of tips in the tree if death rates were non-zero and coalescent==TRUE, or if Nsplits>2.

Ndeaths

Numeric vector of size Nstates, listing the total number of death events (extinctions) that occurred at each state.

Ntransitions_A

2D numeric matrix of size Nstates x Nstates, listing the total number of anagenetic transition events that occurred between each pair of states. For example, Ntransitions_A[1,2] is the number of anagenetic transitions (i.e., within a species) that occured from state 1 to state 2.

Ntransitions_C

2D numeric matrix of size Nstates x Nstates, listing the total number of cladogenetic transition events that occurred between each pair of states. For example, Ntransitions_C[1,2] is the number of cladogenetic transitions (i.e., from a parent to a child) that occured from state 1 to state 2 during some speciation event. Note that each speciation event will have caused Nsplits transitions, and that the emergence of a child with the same state as the parent is counted as a transition between the same state (diagonal entries in Ntransitions_C).

tip_states

Integer vector of size Ntips and with values in 1,..,Nstates, listing the state of each tip in the tree.

node_states

Integer vector of size Nnodes and with values in 1,..,Nstates, listing the state of each node in the tree.

tip_proxy_states

Integer vector of size Ntips and with values in 1,..,NPstates, listing the proxy state of each tip in the tree. Only included in the case of HiSSE models.

node_proxy_states

Integer vector of size Nnodes and with values in 1,..,NPstates, listing the proxy state of each node in the tree. Only included in the case of HiSSE models.

start_state

Integer, specifying the state of the first lineage (either provided during the function call, or generated randomly).

birth_times

Numeric vector, listing the times of speciation events during tree growth, in order of occurrence. Note that if coalescent==TRUE, then speciation_times may be greater than the phylogenetic distance to the coalescent root.

death_times

Numeric vector, listing the times of extinction events during tree growth, in order of occurrence. Note that if coalescent==TRUE, then speciation_times may be greater than the phylogenetic distance to the coalescent root.

clade_birth_rates

Numeric vector of size Ntips+Nnodes, listing the per-capita birth rate of each tip and node in the tree. Only included if include_rates==TRUE.

clade_death_rates

Numeric vector of size Ntips+Nnodes, listing the per-capita death rate of each tip and node in the tree. Only included if include_rates==TRUE.

Details

The function simulate_dsse can be used to simulate a diversification + discrete-trait evolutionary process, in which birth/death (speciation/extinction) rates at each tip are determined by a tip's current "state". Lineages can transition between states anagenetically along each edge (according to fixed Markov transition rates) and/or cladogenetically at each speciation event (according to fixed transition probabilities).

The function simulate_musse is a simplified variant meant to simulate MuSSE/HiSSE models in the absence of cladogenetic state transitions, and is included mainly for backward-compatibility reasons. The input arguments for simulate_musse are identical to simulate_dsse, with the exception that the parameters argument must include slightly different elements (explained below).

For simulate_dsse, the argument parameters should be a named list including one or more of the following elements:

birth_rates: Numeric vector of size Nstates, listing the per-capita birth rate (speciation rate) at each state. Can also be a single number (all states have the same birth rate).
death_rates: Numeric vector of size Nstates, listing the per-capita death rate (extinction rate) at each state. Can also be a single number (all states have the same death rate).
transition_matrix_A: 2D numeric matrix of size Nstates x Nstates, listing anagenetic transition rates between states along an edge. Hence, transition_matrix_A[r,c] is the probability rate for transitioning from state r to state c. Non-diagonal entries must be non-negative, diagonal entries must be non-positive, and the sum of each row must be zero.
transition_matrix_C: 2D numeric matrix of size Nstates x Nstates, listing cladogenetic transition probabilities between states during a speciation event, seperately for each child. Hence, transition_matrix_C[r,c] is the probability that a child will have state c, conditional upon the occurrence of a speciation event, given that the parent had state r, and independently of all other children. Entries must be non-negative, and the sum of each row must be one.

For simulate_musse, the argument parameters should be a named list including one or more of the following elements:

birth_rates: Same as for simulate_dsse.
death_rates: Same as for simulate_dsse.
transition_matrix: 2D numeric matrix of size Nstates x Nstates, listing anagenetic transition rates between states. This is equivalent to transition_matrix_A in simulate_dsse.

If max_time==NULL and max_time_eq==NULL and max_events==NULL, then the returned tree will always contain max_tips tips. If at any moment during the simulation the tree only includes a single extant tip, and if no_full_extinction=TRUE, the death rate is temporarily set to zero to prevent the complete extinction of the tree. If max_tips==NULL, then the simulation is ran as long as specified by max_time and/or max_time_eq and/or max_events. If neither max_time, max_time_eq, max_tips nor max_events is NULL, then the simulation halts as soon as the time reaches max_time, or the time since equilibration reaches max_time_eq, or the number of tips (extant tips if coalescent is TRUE) reaches max_tips, or the number of speciation/extinction/transition events reaches max_events whichever occurs first. If max_tips!=NULL and Nsplits>2, then the last diversification even may generate fewer than Nsplits children, in order to keep the total number of tips within the specified limit. Note that this code generates trees in forward time, and halts as soon as one of the halting conditions is met; the halting condition chosen affects the precise distribution from which the generated trees are drawn (Stadler 2011).

HiSSE models (Beaulieu and O'meara, 2016) are closely related to BiSSE/MuSSE models, the main difference being the fact that the actual diversification-modulating states are not directly observed. Hence, this function is also able to simulate HiSSE models, with appropriate choice of the input variables Nstates, NPstates and proxy_map. For example, Nstates=4, NPstates=2 and proxy_map=c(1,2,1,2) specifies that states 1 and 3 are represented by proxy-state 1, and states 2 and 4 are represented by proxy-state 2. This is the original case described by Beaulieu and O'meara (2016); in their terminology, there would be 2 "hidden"" states ("0" and "1") and 2 "observed" (proxy) states ("A" and "B"), and the 4 diversification rate categories (Nstates=4) would be called "0A", "1A", "0B" and "1B", respectively. The somewhat different terminology used here allows for easier generalization to an arbitrary number of diversification-modulating states and an arbitrary number of proxy states. For example, if there are 6 diversification modulating states, represented by 3 proxy-states as 1->A, 2->A, 3->B, 4->C, 5->C, 6->C, then one would set Nstates=6, NPstates=3 and proxy_map=c(1,1,2,3,3,3).

The parameter transition_matrix_C can be used to define ClaSSE models (Goldberg and Igic, 2012) or BiSSE-ness models (Magnuson-Ford and Otto, 2012), although care must be taken to properly define the transition probabilities. Here, cladogenetic transitions occur at probabilities that are defined conditionally upon a speciation event, whereas in other software they may be defined as probability rates.

References

W. P. Maddison, P. E. Midford, S. P. Otto (2007). Estimating a binary character's effect on speciation and extinction. Systematic Biology. 56:701-710.

R. G. FitzJohn, W. P. Maddison, S. P. Otto (2009). Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Systematic Biology. 58:595-611

R. G. FitzJohn (2012). Diversitree: comparative phylogenetic analyses of diversification in R. Methods in Ecology and Evolution. 3:1084-1092

E. E. Goldberg, B. Igic (2012). Tempo and mode in plant breeding system evolution. Evolution. 66:3701-3709.

K. Magnuson-Ford, S. P. Otto (2012). Linking the investigations of character evolution and species diversification. The American Naturalist. 180:225-245.

J. M. Beaulieu and B. C. O'Meara (2016). Detecting hidden diversification shifts in models of trait-dependent speciation and extinction. Systematic Biology. 65:583-601.

T. Stadler (2011). Simulating trees with a fixed number of extant species. Systematic Biology. 60:676-684.

Examples

Run this code

# NOT RUN {
# Simulate a tree under a BiSSE model (i.e., anagenetic transitions between two states)
A = get_random_mk_transition_matrix(Nstates=2, rate_model="ER", max_rate=0.1)
parameters = list(birth_rates         = c(1,1.5),
                  death_rates         = 0.5,
                  transition_matrix_A = A)
simulation = simulate_dsse( Nstates       = 2, 
                            parameters    = parameters, 
                            max_tips      = 1000, 
                            include_rates = TRUE)
tree       = simulation$tree
Ntips      = length(tree$tip.label)

# plot distribution of per-capita birth rates of tips
rates = simulation$clade_birth_rates[1:Ntips]
barplot(table(rates)/length(rates), 
        xlab="rate", 
        main="Distribution of pc birth rates across tips (BiSSE model)")
# }

Run the code above in your browser using DataLab