paleotree (version 3.1.3)

obtainDatedPosteriorTreesMrB: Get the Sample of Posterior Trees from a Dated Phylogenetic Analysis with MrBayes (Or a Summary Tree, such as the MCCT)

Description

MrBayes is not great for getting samples of dated posterior phylogenies, or for obtaining certain summary trees from the posterior (specifically the MCCT and MAP, which are specific trees in the posterior). This is because the tree samples as returned are scaled relative to rate parameters stored in a separate file. This function attempts to automate the handling of multiple files (both .t tree files and .p parameter files), as well as multiple files associated with separate runs, to obtain samples of posterior trees, or summary trees such as the MCCT or MAP. The resulting trees are scaled to units of time, but may not be placed correctly on an absolute time-scale if all tips are extinct. See details of output below.

Usage

obtainDatedPosteriorTreesMrB(runFile, nRuns = 2, burnin = 0.5, outputTrees,
  labelPostProb = FALSE, getFixedTimes = FALSE, getRootAges = FALSE,
  originalNexusFile = NULL, file = NULL)

Arguments

runFile

A filename in the current directory, or a path to a file that is either a .p or .t file from a MrBayes analysis. This filename and path will be used for finding additional .t and .p files, via the nRuns settings and assuming that files are in the same directory and these files are named under typical MrBayes file naming conventions. (In other words, if you have renamed your .p or .t files, this function probably won't be able to find them.)

nRuns

The number of runs in your analysis. This variable is used for figuring out what filenames will be searched for: if you specify fewer runs than you actually ran, then some runs won't be examined by this function. Conversely, if you specify too many, this function will throw an error when it cannot find files that it expects but that do not exist. The default for this argument (two runs) is based on the default number of runs in MrBayes.
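As a rough illustration of how runFile and nRuns together imply a set of filenames, the sketch below derives the .t and .p files that would be expected for each run. It assumes the ".run<N>.t" / ".run<N>.p" naming seen in the Examples section; expectedRunFiles is a hypothetical helper written for this illustration, not a paleotree function, and the real function's internal logic may differ.

```r
# Hypothetical helper (not part of paleotree): list the files expected for nRuns runs.
expectedRunFiles <- function(runFile, nRuns = 2) {
  # Strip a trailing ".run<N>.t" or ".run<N>.p" to recover the shared file stem.
  stem <- sub("\\.run[0-9]+\\.[tp]$", "", runFile)
  runs <- seq_len(nRuns)
  list(
    treeFiles  = paste0(stem, ".run", runs, ".t"),
    paramFiles = paste0(stem, ".run", runs, ".p")
  )
}

files <- expectedRunFiles("MrB_run_fossil_05-10-17.nex.run1.t", nRuns = 2)

# Check which expected files are missing before trying to parse anything.
missingFiles <- Filter(function(f) !file.exists(f), unlist(files))
```

Checking for missing files up front makes it easier to tell a wrong nRuns value apart from renamed output files.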

burnin

The fraction of trees sampled in the posterior discarded and not returned by this function directly, nor included in calculation of summary trees. Must be a numeric value greater than 0 and less than 1.
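The effect of burnin can be sketched with a plain vector standing in for the sampled trees of one run. The object names are hypothetical, and the rounding rule (floor) is an assumption; the function itself may round differently.

```r
# Hypothetical stand-in for one run's sampled trees, in sampling order.
sampledTrees <- paste0("tree_", 1:10)
burnin <- 0.5

# Number of leading samples treated as burn-in (floor() is an assumed rounding rule).
nDiscard <- floor(length(sampledTrees) * burnin)

# Keep only the samples that come after the burn-in portion.
postBurnin <- sampledTrees[seq_along(sampledTrees) > nDiscard]
```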

outputTrees

Determines the output trees produced; for the format of the output, see the section on the returned Value below. Must be of length one, and either "all", which means all trees from the post-burnin posterior will be returned; a number greater than zero, which will be the number of trees randomly sampled from across the post-burnin posterior and returned; or "MCCT" or "MAP", which stand for 'maximum clade credibility tree' and 'maximum a posteriori tree' respectively. The MAP is the single tree from the post-burnin posterior with the highest marginal likelihood. The MCCT is the single tree from the post-burnin posterior which contains clades with the highest product of posterior probabilities for its component clades. Thus, the MAP is the best overall tree, while the MCCT may be the best tree for summarizing topological support.

labelPostProb

Logical. If TRUE, then nodes of the output tree will be labeled with their respective posterior probabilities, as calculated based on the frequency of a clade occurring across the post-burnin posterior tree sample. If FALSE, this is skipped.
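The two ideas above, clade posterior probabilities as frequencies across the post-burnin sample and the MCCT as the sampled tree whose clades have the highest product of those probabilities, can be sketched in base R. This is a toy illustration under stated assumptions, not the package's code: each hypothetical "tree" is reduced to the set of clades it contains, written as strings, with each clade listed once per tree.

```r
# Each hypothetical tree is represented only by the clades it contains.
postTrees <- list(
  c("A+B", "A+B+C"),
  c("A+B", "A+B+C"),
  c("B+C", "A+B+C"),
  c("A+B", "A+B+C")
)

# Posterior probability of a clade: fraction of sampled trees that contain it.
cladeFreq <- table(unlist(postTrees)) / length(postTrees)

# Score each tree by the summed log probability of its clades
# (same ranking as the product, but numerically safer).
treeScore <- sapply(postTrees, function(clades) sum(log(cladeFreq[clades])))

# The MCCT is the sampled tree with the highest score.
mcctIndex <- which.max(treeScore)
```

In this toy sample the clade "A+B" appears in three of four trees, so any tree containing it outscores the one containing "B+C".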

getFixedTimes

If TRUE, this function will also look for, scan, and parse an associated NEXUS file. Ignoring any commented lines (i.e. anything between "[ ]"), commands for fixing taxa will be identified, parsed and returned to the user, either as a message printed to the R console if the output is written to a file, or as an attribute named 'fixed ages' if output as an R object (formatted as a two-column table of OTU names and their respective fixed ages). If the output is an R object, these objects with

Please note: the code for getFixedTimes = TRUE contains a while() loop for removing nested series of square brackets (which are treated as comments in NEXUS files). Thus files with ridiculously nested series of brackets may cause this code to take a while to complete, or may even cause it to hang.
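To make the note about the while() loop concrete, here is a minimal sketch of stripping nested square-bracket comments by repeatedly removing the innermost bracket pair. stripNexusComments is a hypothetical helper written for this illustration; the code inside paleotree may work differently.

```r
# Hypothetical helper: remove bracketed comments, innermost pair first.
stripNexusComments <- function(x) {
  # Matches a bracket pair that contains no other brackets.
  innermost <- "\\[[^\\[\\]]*\\]"
  # Each pass peels off one level of nesting; stop when no complete pair remains.
  while (any(grepl(innermost, x, perl = TRUE))) {
    x <- gsub(innermost, "", x, perl = TRUE)
  }
  x
}

cleaned <- stripNexusComments("begin mrbayes; [outer [inner] comment] end;")
```

Each pass removes one level of nesting, so the number of passes grows with the nesting depth, which is why deeply nested comments slow things down.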

getRootAges

FALSE by default. If TRUE, and getFixedTimes = TRUE as well as file = NULL (such that trees will be assigned within the R memory rather than saved to an external file), the functions setRootAge and its wrapper function setRootAges will be applied to the output so that all output trees have root.time elements for use with other functions in paleotree as well as other packages.

originalNexusFile

Filename (and possibly path too) to the original NEXUS file for this analysis. Only tried if getFixedTimes = TRUE. If NULL (the default), then this function will instead try to find a NEXUS file with the same name as implied by the filename used in other inputs. If this file cannot be found, the function will fail.

file

Filename (possibly with path) as a character string leading to a file which will be overwritten with the output trees (or summary tree), as a NEXUS file. If NULL (the default), the output will instead be directly returned by this function.

Value

Depending on the argument file, the output tree or trees are either returned directly, or instead written out in NEXUS format via ape's write.nexus function to an external file. The output will consist either of multiple trees sampled from the post-burnin posterior, or of a single phylogeny (a summary tree, either the MCCT or the MAP).

If the argument getRootAges = TRUE is not used, users are warned that the resulting dated trees will not have the $root.time elements necessary for comparison against an absolute time-scale. While the trees may now be scaled to units of absolute time, rather than with branch lengths expressed in the rate of character change, the dates estimated by some phylogenetics functions in R may give inaccurate estimates of when events occur on the absolute time-scale if all tips are extinct. This is because most functions for phylogenetics in R (and elsewhere) will instead presume that the latest tip will be at time 0 (the modern), which may be wrong if you are using paleotree for analyzing paleontological datasets consisting of entirely extinct taxa. This can be solved by using the argument getFixedTimes = TRUE to obtain fixed tip ages, and then scaling the resulting output to absolute time using the argument getRootAges = TRUE, which obtains a $root.time element for each tree using the functions setRootAge and setRootAges (for single and multiple phylogenies).
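The arithmetic behind this warning can be sketched with plain numbers. The values and object names below are hypothetical, and this is only an illustration of the reasoning, not the implementation of setRootAge: given root-to-tip distances in time units and one tip with a fixed age, the root time follows by addition, whereas presuming the latest tip is at time 0 shifts every date.

```r
# Hypothetical root-to-tip distances (time units) for a dated tree of extinct taxa.
rootToTip <- c(taxonA = 12.0, taxonB = 9.5, taxonC = 4.0)

# Hypothetical fixed age (time before present) for one tip.
fixedAges <- c(taxonA = 250.0)
fixedTip <- names(fixedAges)[1]

# The root is older than the fixed tip by that tip's distance from the root.
rootTime <- unname(fixedAges[fixedTip] + rootToTip[fixedTip])   # 262

# Absolute tip ages once the root is anchored: 250, 252.5, 258.
tipAges <- rootTime - rootToTip

# Without a root time, the latest tip is presumed to sit at time 0: 0, 2.5, 8.
naiveTipAges <- max(rootToTip) - rootToTip
```

The two sets of ages differ by a constant offset (here 250 time units), which is exactly the information a $root.time element supplies.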

Details

This function is most useful for dealing with dating analyses in MrBayes, particularly when tip-dating a tree with fossil taxa, as the half-compatibility and all-compatibility summary trees offered by the 'sumt' command in MrBayes can have issues properly portraying summary trees from such datasets.

See Also

When the arguments getFixedTimes = TRUE and getRootAges = TRUE are used, the resulting output will be scaled to absolute time with the available fixed ages using the functions setRootAge and setRootAges (for single and multiple phylogenies). This is only done if fixed ages are available and if the tree is not being saved to an external file.

Maximum Clade Credibility trees are estimated using the function maxCladeCred in package phangorn.

Examples

# NOT RUN {
MCCT <- obtainDatedPosteriorTreesMrB(
  runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
  nRuns = 2, burnin = 0.5,
  outputTrees = "MCCT", file = NULL)

MAP <- obtainDatedPosteriorTreesMrB(
  runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
  nRuns = 2, burnin = 0.5, getFixedTimes = TRUE,
  outputTrees = "MAP", file = NULL)

# get a root age from the fixed ages for tips
setRootAge(tree = MAP)

# pull a hundred trees randomly from the posterior
hundredRandomlySelectedTrees <- obtainDatedPosteriorTreesMrB(
  runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
  nRuns = 2, burnin = 0.5, getFixedTimes = TRUE,
  getRootAges = TRUE,
  outputTrees = 100, file = NULL)
# }