
paleotree (version 3.3.0)

makePBDBtaxonTree: Creating a Taxon-Tree from Taxonomic Data Downloaded from the Paleobiology Database

Description

The function makePBDBtaxonTree creates a phylogeny-like object of class phylo from the taxonomic information recorded in a taxonomy download from the PBDB for a given group. Two different algorithms are provided: the default is based on parent-child taxon relationships, the other on the nested Linnean hierarchy. The function plotTaxaTreePBDB is also provided as a minor helper function for optimally plotting the labeled topologies output by makePBDBtaxonTree.
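To make the nested-Linnean idea concrete, here is a minimal base R sketch that turns a toy taxon table into a Newick string with higher-taxon names as node labels. It is not paleotree's taxonTable2taxonTree; the helper linneanToNewick, the column names and the example genera are hypothetical, and it assumes the rank columns are ordered from highest to lowest.

```r
# Toy taxon table (hypothetical data): one row per genus,
# rank columns ordered from highest to lowest.
taxonTable <- data.frame(
  order  = c("OrdA", "OrdA", "OrdA", "OrdB"),
  family = c("FamX", "FamX", "FamY", "FamZ"),
  genus  = c("Gen1", "Gen2", "Gen3", "Gen4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recursively group rows by each successive rank,
# labelling every internal node with its higher-taxon name.
linneanToNewick <- function(tab, ranks) {
  nestLevel <- function(rows, depth) {
    if (depth == length(ranks)) {
      # lowest rank: these names become the tips
      return(unique(tab[rows, ranks[depth]]))
    }
    groups <- split(rows, tab[rows, ranks[depth]])
    vapply(names(groups), function(g) {
      children <- nestLevel(groups[[g]], depth + 1)
      paste0("(", paste(children, collapse = ","), ")", g)
    }, character(1), USE.NAMES = FALSE)
  }
  top <- nestLevel(seq_len(nrow(tab)), 1)
  paste0("(", paste(top, collapse = ","), ")root;")
}

newick <- linneanToNewick(taxonTable, c("order", "family", "genus"))
newick
# "(((Gen1,Gen2)FamX,(Gen3)FamY)OrdA,((Gen4)FamZ)OrdB)root;"
```

Note the singleton nodes such as (Gen3)FamY; nodes with a single descendant are what the cleanTree post-processing collapses.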

Usage

makePBDBtaxonTree(taxaDataPBDB, rankTaxon, method = "parentChild",
  tipSet = NULL, cleanTree = TRUE, annotatedDuplicateNames = TRUE,
  APIversion = "1.2")

plotTaxaTreePBDB(taxaTree, edgeLength = 1)

Arguments

taxaDataPBDB

A table of taxonomic data collected from the Paleobiology Database, using the taxa list option with show = class. This should work with versions 1.1 and 1.2 of the API, with either the pbdb or com vocab. However, as accepted_name is not available in API v1.1, the resulting tree will have each taxon's *original* name and not any formally updated name.
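For orientation, the kind of request behind taxaDataPBDB can be written out as a URL. This is a sketch assuming the PBDB data service layout https://paleobiodb.org/data1.2/taxa/list.csv with base_name, show and vocab parameters; the helper buildTaxaListURL is hypothetical and not part of paleotree (getCladeTaxaPBDB, used in the examples, returns such a table directly).

```r
# Hypothetical helper: build a PBDB taxa-list URL requesting show=class.
# Assumes the layout https://paleobiodb.org/data<version>/taxa/list.csv
buildTaxaListURL <- function(baseName, apiVersion = "1.2", vocab = "pbdb") {
  paste0(
    "https://paleobiodb.org/data", apiVersion, "/taxa/list.csv",
    "?base_name=", utils::URLencode(baseName, reserved = TRUE),
    "&show=class",          # request the Linnean classification columns
    "&vocab=", vocab        # "pbdb" or "com" field names
  )
}

url <- buildTaxaListURL("Graptolithina")
url
# Downloading needs an internet connection, e.g.:
# taxaData <- read.csv(url, stringsAsFactors = FALSE)
```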

rankTaxon

The selected taxon rank; must be one of 'species', 'genus', 'family', 'order', 'class' or 'phylum'.

method

Controls which algorithm is used for calculating the taxon-tree. The default option is method = "parentChild", which converts the binary parent-child taxon relationships listed in the Paleobiology Database; any of these parent-child relationships missing from the input dataset are autofilled using API calls to the Paleobiology Database. Alternatively, users may use method = "Linnean", which converts the table of Linnean taxonomic assignments (family, order, etc., as provided by show = class in PBDB API calls) into a taxon-tree.

Two methods formerly both implemented under method = "parentChild" are also available, as method = "parentChildOldMergeRoot" and method = "parentChildOldQueryPBDB" respectively. Both use algorithms similar to the current method = "parentChild" but differ in how they treat taxa whose parents are missing from the input taxonomic dataset.

method = "parentChildOldQueryPBDB" behaves most similarly to method = "parentChild" in that it queries the Paleobiology Database via the API, but it does so repeatedly for information on the parent taxa of the 'floating' parents, continuing within a while loop until only one such unassigned parent taxon remains. This option may take a long time or never finish, depending on the linearity and taxonomic structures encountered in the PBDB taxonomic data; for example, if a taxon were mistakenly its own indirect child in some grand loop, makePBDBtaxonTree might never finish under this option. In cases where the taxonomy is poor due to odd and erroneous taxonomic assignments reported by the PBDB, this routine may search all the way back to a very ancient and deep taxon, such as Eukaryota.

method = "parentChildOldMergeRoot" will instead combine these disparate potential roots and link them to an artificially constructed pseudo-root, which at least allows for visualization of the taxonomic structure in a limited dataset. This option is fully offline, as, unlike the other options, it does not make any additional API calls to the Paleobiology Database.
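The difference between the old parent-child variants comes down to how 'floating' parents are handled. The base R sketch below, using a hypothetical toy table and hypothetical helpers findRoots, mergeRoots and ancestorsOf, finds the potential roots, joins them under an artificial pseudo-root in the spirit of method = "parentChildOldMergeRoot", and walks up the ancestry with a visited set so that a taxon that is its own indirect child raises an error instead of looping forever. It illustrates the ideas only and is not paleotree's code.

```r
# Toy parent-child table (hypothetical data): column 1 = parent, column 2 = child.
parentChild <- cbind(
  parent = c("FamX", "FamX", "OrdA", "FamY"),
  child  = c("Gen1", "Gen2", "FamX", "Gen3")
)

# Potential roots: parents that never appear as children.
# More than one root means some parents are 'floating'.
findRoots <- function(pc) setdiff(unique(pc[, 1]), pc[, 2])

# Offline merge: attach every floating root to one artificial pseudo-root.
mergeRoots <- function(pc, pseudoRoot = "pseudoRoot") {
  roots <- findRoots(pc)
  if (length(roots) > 1) {
    pc <- rbind(pc, cbind(parent = pseudoRoot, child = roots))
  }
  pc
}

# Walk up from a taxon, tracking visited ancestors so a loop
# (a taxon being its own indirect child) stops with an error.
# If a child has several parents, only the first is followed.
ancestorsOf <- function(pc, taxon) {
  visited <- character(0)
  current <- taxon
  repeat {
    parent <- pc[pc[, 2] == current, 1]
    if (length(parent) == 0) return(visited)
    if (parent[1] %in% visited || parent[1] == taxon) {
      stop("taxonomic loop detected at ", parent[1])
    }
    visited <- c(visited, parent[1])
    current <- parent[1]
  }
}

findRoots(parentChild)              # "OrdA" "FamY"
merged <- mergeRoots(parentChild)
findRoots(merged)                   # "pseudoRoot"
ancestorsOf(parentChild, "Gen1")    # "FamX" "OrdA"

# Adding OrdA as a child of Gen1 creates a loop: Gen1 -> FamX -> OrdA -> Gen1
cyclic <- rbind(parentChild, c("Gen1", "OrdA"))
```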

tipSet

This argument only affects analyses where method = "parentChild" is used. It controls which taxa are selected as tip taxa for the output tree. tipSet = "nonParents" selects all child taxa that are not listed as parents in parentChild. Alternatively, tipSet = "all" will add a tip to every internal node, with the parent-taxon name encapsulated in parentheses. The default is NULL; if tipSet is NULL and method = "parentChild", then tipSet is set to "nonParents".
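The two tipSet rules can be sketched in a few lines of base R on a toy parent-child matrix (parent in the first column, child in the second). The helpers nonParentTips and allTips are hypothetical, and the exact tip labels paleotree produces for tipSet = "all" may differ.

```r
# Toy parent-child table (hypothetical data).
parentChild <- cbind(
  parent = c("OrdA", "OrdA", "FamX", "FamX"),
  child  = c("FamX", "FamY", "Gen1", "Gen2")
)

# "nonParents" rule: children that never appear as parents.
nonParentTips <- function(pc) setdiff(pc[, 2], pc[, 1])

# "all" rule: additionally give every parent its own tip,
# labelled with the parent name wrapped in parentheses.
allTips <- function(pc) {
  parents <- unique(pc[, 1])
  c(nonParentTips(pc), paste0("(", parents, ")"))
}

nonParentTips(parentChild)  # "FamY" "Gen1" "Gen2"
allTips(parentChild)        # "FamY" "Gen1" "Gen2" "(OrdA)" "(FamX)"
```

In this toy table FamY has no listed children, so the "nonParents" rule treats it as a tip.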

cleanTree

When TRUE (the default), the tree is run through a series of post-processing steps, including collapsing singleton nodes, reordering nodes, and writing the tree out as a Newick string and reading it back in, to ensure functionality with ape functions and ape-derived functions. If FALSE, none of this post-processing is done, and users should beware, as such trees can lead to hard crashes of R.
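The post-processing steps named here map onto standard ape functions. This sketch assumes the ape package is installed and approximates the described steps with collapse.singles, reorder and a write.tree/read.tree round trip; the helper cleanTaxonTree and the toy tree are hypothetical, and paleotree's actual sequence of steps may differ.

```r
library(ape)

# Toy tree with a singleton node (FamY has one child), as Linnean
# nesting can produce. Hypothetical example data.
rawTree <- read.tree(text = "(((Gen1,Gen2)FamX,(Gen3)FamY)OrdA,Gen4)root;")

# Sketch of cleanTree-style post-processing (not paleotree's exact code).
cleanTaxonTree <- function(tree) {
  tree <- collapse.singles(tree)               # drop single-descendant nodes
  tree <- reorder(tree, order = "cladewise")   # reorder the edge matrix
  read.tree(text = write.tree(tree))           # Newick round trip
}

tidyTree <- cleanTaxonTree(rawTree)
Nnode(rawTree)   # 4 internal nodes, including the singleton FamY
Nnode(tidyTree)  # 3 internal nodes after collapsing
```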

annotatedDuplicateNames

A logical determining whether duplicate taxon names found in the Paleobiology Database (presumably reflecting taxa that are obsolete but have incomplete seniority data) should be annotated with sequential numbers to make them unique, via the base R function make.unique. This only applies to method = "parentChild", with the default being annotatedDuplicateNames = TRUE. If more than 26 duplicates are found, an error is issued. If this argument is FALSE, an error is issued if any duplicate taxon names are found.
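Base R's make.unique shows the kind of annotation involved. The toy name vector is hypothetical, and the separator paleotree uses for its annotations is an assumption here, so both the default and an alternative are shown.

```r
# Duplicate names as they might appear in a PBDB download (toy data).
taxonNames <- c("Hylaeosaurus", "Iguanodon", "Hylaeosaurus", "Hylaeosaurus")

# A simple duplicate check, comparable to the FALSE case that errors.
hasDuplicates <- anyDuplicated(taxonNames) > 0

# make.unique() appends sequential numbers to repeated names.
make.unique(taxonNames)
# "Hylaeosaurus" "Iguanodon" "Hylaeosaurus.1" "Hylaeosaurus.2"
make.unique(taxonNames, sep = "_")
# "Hylaeosaurus" "Iguanodon" "Hylaeosaurus_1" "Hylaeosaurus_2"
```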

APIversion

Version of the Paleobiology Database API used by makePBDBtaxonTree when method = "parentChild" or method = "parentChildOldQueryPBDB" is used. The current default is APIversion = "1.2", the most recent API version as of 12/11/2018.

taxaTree

A phylogeny of class phylo, presumably a taxon tree as output from makePBDBtaxonTree with higher-taxon names as node labels.

edgeLength

The edge length with which the tree should be plotted (plotTaxaTreePBDB plots phylogenies as non-ultrametric, not as a cladogram with aligned tips).
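A rough sketch of what plotting with a fixed edgeLength involves, assuming the ape package is installed: every edge is given the same length and the tree is plotted with node labels shown. The helper plotWithEdgeLength and the toy tree are hypothetical, not plotTaxaTreePBDB's implementation.

```r
library(ape)

# Hypothetical helper: uniform edge lengths, then plot with node labels.
plotWithEdgeLength <- function(taxaTree, edgeLength = 1) {
  taxaTree$edge.length <- rep(edgeLength, nrow(taxaTree$edge))
  plot(taxaTree, show.node.label = TRUE, cex = 0.8)
  invisible(taxaTree)
}

toyTree <- read.tree(text = "(((Gen1,Gen2)FamX,Gen3)OrdA,Gen4)root;")
plotted <- plotWithEdgeLength(toyTree, edgeLength = 2)
```

Because every edge has the same length, tips at different nesting depths (Gen4 versus Gen1) end at different distances from the root, giving the non-ultrametric layout described above.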

Value

A phylogeny of class phylo, where each tip is a taxon of the given rankTaxon. Additional details regarding branch lengths can be found in the manual pages for the sub-algorithms used by this function to create the taxon-tree: parentChild2taxonTree and taxonTable2taxonTree.

Depending on the method used, either the element $parentChild or $taxonTable is added to the list structure of the output phylogeny object; this is the table that was used as input for one of the two algorithms mentioned above.

Please note that when applied to output from the taxa option of API version 1.1, the taxon names returned are the original taxon names, as 'accepted_name' is not available in API v1.1, while under API v1.2 the returned taxon names should be the most up-to-date formal names for those taxa. Similar issues also affect the identification of parent taxa, as the accepted name of the parent ID number is only provided in version 1.2 of the API.
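The naming difference between API versions amounts to which column supplies tip names. This sketch assumes pbdb-vocabulary column names (taxon_name, accepted_name); the helper tipNamesFrom and the toy rows are hypothetical.

```r
# Toy rows in the style of a v1.2 taxa download (hypothetical data).
taxaV12 <- data.frame(
  taxon_name    = c("OldGenusName", "Gen2"),
  accepted_name = c("NewGenusName", "Gen2"),
  stringsAsFactors = FALSE
)
# v1.1-style output lacks the accepted_name column.
taxaV11 <- taxaV12["taxon_name"]

# Hypothetical helper: prefer the accepted (updated) name when present.
tipNamesFrom <- function(taxaData) {
  if ("accepted_name" %in% names(taxaData)) taxaData$accepted_name
  else taxaData$taxon_name
}

tipNamesFrom(taxaV12)  # "NewGenusName" "Gen2"
tipNamesFrom(taxaV11)  # "OldGenusName" "Gen2"
```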

Details

This function should not be taken too seriously. Many groups in the Paleobiology Database have out-of-date or very incomplete taxonomic information. This function is meant to help visualize what information is present and, by use of time-scaling functions, allow us to visualize the intersection of temporal and phylogenetic information, mainly to look for incongruence due to incorrect taxonomic placements, erroneous occurrence data, or both.

Note, however, that contrary to common opinion among some paleontologists, taxon-trees may be just as useful for macroevolutionary studies as reconstructed phylogenies (Soul and Friedman, 2015).

References

Peters, S. E., and M. McClennen. 2015. The Paleobiology Database application programming interface. Paleobiology 42(1):1-7.

Soul, L. C., and M. Friedman. 2015. Taxonomy and Phylogeny Can Yield Comparable Results in Comparative Palaeontological Analyses. Systematic Biology.

See Also

Two other functions in paleotree are used as sub-algorithms by makePBDBtaxonTree to create the taxon-tree within this function, and users should consult their manual pages for additional details:

parentChild2taxonTree and taxonTable2taxonTree


Other functions for manipulating PBDB data can be found at taxonSortPBDBocc, occData2timeList, and the example data at graptPBDB.

Examples

# NOT RUN {
#get some example occurrence and taxonomic data
data(graptPBDB)

#get the taxon tree: Linnean method
graptTreeLinnean <- makePBDBtaxonTree(
    taxaDataPBDB = graptTaxaPBDB,
    rankTaxon = "genus",
    method = "Linnean")

#get the taxon tree: parentChild method
graptTreeParentChild <- makePBDBtaxonTree(
    taxaDataPBDB = graptTaxaPBDB,
    rankTaxon = "genus",
    method = "parentChild")

# let's plot these and compare them! 
plotTaxaTreePBDB(graptTreeParentChild)

plotTaxaTreePBDB(graptTreeLinnean)


####################################################
# let's try some other groups

#conodonts
conoData <- getCladeTaxaPBDB("Conodonta")
conoTree <- makePBDBtaxonTree(
    taxaDataPBDB = conoData,
    rankTaxon = "genus",
    method = "parentChild")
# plot it!
plotTaxaTreePBDB(conoTree)

#asaphid trilobites
asaData <- getCladeTaxaPBDB("Asaphida")
asaTree <- makePBDBtaxonTree(
    taxaDataPBDB = asaData,
    rankTaxon = "genus",
    method = "parentChild")
# plot it!
plotTaxaTreePBDB(asaTree)

#Ornithischia
ornithData <- getCladeTaxaPBDB("Ornithischia")
ornithTree <- makePBDBtaxonTree(
    taxaDataPBDB = ornithData,
    rankTaxon = "genus",
    method = "parentChild")
plotTaxaTreePBDB(ornithTree)

#try Linnean!

#but first, we need to drop a repeated taxon: Hylaeosaurus
findHylaeo <- ornithData$taxon_name == "Hylaeosaurus"
# there's actually only one accepted ID number
HylaeoIDnum <- unique(ornithData[findHylaeo,"taxon_no"])
HylaeoIDnum 

# so, keep the entry that has occurrences listed
# (i.e. drop Hylaeosaurus entries with no occurrences)
dropThis <- which((ornithData$n_occs < 1) & findHylaeo)
# guard: ornithData[-integer(0), ] would return zero rows
if (length(dropThis) > 0) {
    ornithDataCleaned <- ornithData[-dropThis,]
} else {
    ornithDataCleaned <- ornithData
}

ornithTree <- makePBDBtaxonTree(
    ornithDataCleaned,
    rankTaxon = "genus",
    method = "Linnean")
plotTaxaTreePBDB(ornithTree)


########################
#Rhynchonellida
rynchData <- getCladeTaxaPBDB("Rhynchonellida")
rynchTree <- makePBDBtaxonTree(
    taxaDataPBDB = rynchData,
    rankTaxon = "genus",
    method = "parentChild")
plotTaxaTreePBDB(rynchTree)

#some of these look pretty messy!

# }
