Learn R Programming

nodeSub

Branch[]
master
develop
richel

NodeSub is an R package that can be used to generate alignments under the node substitution model.

Installation

NodeSub also provides functions to perform BEAST2 analyses, and if you want access to those, you will need to rely on the functionality of the babette suite. Unfortunately, the babette suite is currently not available on CRAN, but can be installed from rOpenSci:

remotes::install_github("ropensci/beautier")
remotes::install_github("ropensci/tracerer")
remotes::install_github("ropensci/beastier")
remotes::install_github("ropensci/mauricer")
remotes::install_github("ropensci/babette")
remotes::install_github("ropensci/mcbette")

Next, we need to install BEAST2 and the BEAST2 NS package (for marginal likelihood calculation) - and make sure R can find it. This we can do as follows:

remotes::install_github("richelbilderbeek/beastierinstall") 
beastierinstall::install_beast2() 
beastier::is_beast2_installed()
remotes::install_github("richelbilderbeek/mauricerinstall") 
mauricerinstall::install_beast2_pkg("NS")
mauricer::is_beast2_ns_pkg_installed()

Installation of NodeSub

To make use of the BEAST2 functions, we now need some functions that are not availabe on CRAN, because CRAN does not support having github-dependent code in an R package. You can install this functionality through:

remotes::install_github("thijsjanzen/nodeSub", ref = "babette")

Usage

Given a tree, we can generate a NodeSubstitution alignment as follows:

input_tree <- TreeSim::sim.bd.taxa(n = 100,
                                   numbsim = 1,
                                   lambda = 1,
                                   mu = 0.1,
                                   complete = TRUE)[[1]]
                                     
target_alignment <- sim_unlinked(phy = input_tree,
                                 rate1 = 1e-3,
                                 rate2 = 1e-3,
                                 l = 10000,
                                 node_time = 0.3)   

We can then proceed to create a Twin alignment (e.g. an alignment with the exact same number of accumulated substitutions, given the same tree, but using a 'normal' substitution model)

comp_alignment <- create_equal_alignment(input_tree = geiger::drop.extinct(input_tree),  # can only work on trees without extinct branches
                                         sub_rate = 1e-3,
                                         alignment_result = target_alignment)

Now we have two alignments, one generated using the Node Substitution model, and one using standard strict clock model. For both we can perform phylogenetic inference to get the resulting tree. The aim is to compare the posterior distribution of trees with the true tree that we started with 'input_tree', and estimate the error invoked by the node substitution model.

node_posterior <- infer_phylogeny(target_alignment$alignment,
                                  "node_posterior",
                                  clock_prior = beautier::create_strict_clock_model(clock_rate_param = beautier::create_clock_rate_param(value = 1e-3)),
                                  burnin = 0.1,
                                  working_dir = get_wd(),
                                  sub_rate = 1e-3)
                                  
reference_posterior <- infer_phylogeny(comp_alignment$alignment,
                                       "reference_posterior",
                                       burnin = 0.1,
                                       clock_prior = beautier::create_strict_clock_model(clock_rate_param = beautier::create_clock_rate_param(value = comp_alignment$adjusted_rate)),
                                       working_dir = getwd(),
                                       sub_rate = 1e-3)                               

Having two posterior distributions of trees, we can compare them based on summary statistics.

node_stats <- calc_sum_stats(node_posterior$all_trees,
                             input_tree)
ref_stats  <- calc_sum_stats(reference_posterior$all_trees,
                            input_tree)

Copy Link

Version

Install

install.packages('nodeSub')

Monthly Downloads

577

Version

1.2.8

License

GPL-3

Issues

Pull Requests

Stars

Forks

Maintainer

Thijs Janzen

Last Published

November 14th, 2023

Functions in nodeSub (1.2.8)

sim_unlinked

Simulate a sequence assuming node substitutions are not shared amongst offspring, given two substitution matrices: one for substitutions occuring on the nodes, and one for substitutions occuring along the branches.
sim_unlinked_explicit

Simulate a sequence assuming node substitutions are not shared amongst offspring, using the explicit simulation method (e.g. reverse substitutions are modeled explicitly)
slow_matrix

this function calculates the p matrix within R this is slower than the C++ implementation in get_p_matrix but provides a way to debug and verify
calc_expected_hidden_nodes

Calculate the number of expected hidden nodes in a phylogenetic tree
calc_fraction

Calculate the expected fraction of substitutions at the nodes, relative to the fraction at the branches
create_unbalanced_tree

create an unbalanced tree out of branching times
create_equal_alignment_explicit

function create an alignment with identical information content, using the explicit method to simulate substitutions
calc_required_node_time

Calculate the required node time to obtain a desired fraction of substitutions at the node
create_equal_alignment

function create an alignment with identical information content
count_hidden

Function to calculate the number of hidden speciation events, e.g. speciation events that have lead to an extinct species. Thus, these hidden speciation events can only be detected in complete trees (as opposed to reconstructed trees).
calc_sum_stats

calculate summary statistics of a phylogenetic tree, compared with a reference tree. The following statistics are calculated: the beta statistic, gamma statistic, crown age, mean branch length, number of tips, the nLTT statistic and the laplacian difference, given by RPANDA's JSDtree. Because JSDtree can sometimes cause issues, some additional checks are performed to ensure that is possible to run this function.
estimate_marginal_models

estimate the marginal likelihood of the relaxed and strict clock model for a provided alignment
create_balanced_tree

create a balanced tree out of branching times
sim_normal

Simulate sequences for a given evolutionary tree, using a standard model of sequence evolution along the branches. Code for this function was heavily inspired by the function simSeq from the phangorn package.
sim_linked

simulate a sequence assuming conditional substitutions on the node.
get_p_matrix

calculate p matrix
infer_phylogeny

infer the time calibrated phylogeny associated with the provided alignment. This function uses the R package babette to infer the phylogeny using BEAST2.
nodeSub-package

Package providing functions to simulate sequences under different DNA evolution models
sim_normal_explicit

simulate a sequence assuming substitutions are only accumulated along the branches, using the explicit simulation method (e.g. reverse substitutions are modeled explicitly)
reduce_tree

Function to remove speciation events occuring after an extinction event. Extinct species are pruned randomly, such that only a single extinct species per branching event (if any extinct species) remains.