DAISIEprep

Package description

DAISIEprep is an R package that enables the extraction and formatting of phylogenetic data on island species for the inference model DAISIE (Dynamic Assembly of Island biota through Speciation, Immigration and Extinction). The central function, DAISIEprep::extract_island_species(), uses data from phylogenetic trees and species island endemicity statuses (i.e. endemic to the island, non-endemic, or not present on the island). The phylogenetic and endemicity data are handled together using the phylo4d S4 class from the phylobase R package.

DAISIEprep fills the niche of standardised, reproducible data processing for the suite of DAISIE inference models. Unlike other phylogenetic methods implemented in R, DAISIE has so far lacked a defined methodological framework for extracting and formatting data prior to analysis. While other phylogenetic models in R commonly use the phylo S3 data structure, defined by the R package ape, DAISIE has an idiosyncratic data structure that will be unfamiliar to new users. This package provides a set of tools that help those users apply DAISIE's models in their research. The package also makes it possible to extract island data from ‘big data’ macrophylogenies (>5,000 species), a task that was previously impractical because researchers had to extract the data manually.

There are two algorithms to extract the data: the min algorithm and the asr (ancestral state reconstruction) algorithm. The former is based on the assumptions of the DAISIE inference model: colonisation of species from a mainland source pool, speciation on the island through cladogenesis or anagenesis, and island extinction. This algorithm therefore assumes no back-colonisation from the island to the mainland and no mainland evolutionary processes. If the data appear to conform to these assumptions (by visual inspection), this is a good method to choose (DAISIEprep::extract_island_species(..., extraction_method = "min")). Alternatively, the data may violate these assumptions, for example when species within an island radiation migrate back to the mainland. In these and other cases, the asr algorithm extracts data based on the most probable reconstruction of the species ranges (i.e. island presence/absence), and can therefore extract clades that contain non-island species (DAISIEprep::extract_island_species(..., extraction_method = "asr")). The asr algorithm uses ancestral state reconstruction methods from other packages (e.g. castor), and users can extend it to incorporate other models that may better suit their data set.
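The two extraction methods can be sketched on a toy data set as follows. This is a minimal, illustrative example rather than the package's reference workflow: the tree, the tip labels and the endemicity statuses are made up, and the argument names phylod and asr_method (and the "parsimony" option) are assumptions based on the package tutorial, so check them against ?extract_island_species and ?add_asr_node_states before relying on them.

```r
# Sketch only: assumes DAISIEprep, ape and phylobase are installed.
library(DAISIEprep)

# A small ultrametric toy tree (hypothetical data); tip labels use the
# genus_species form.
phylo <- ape::read.tree(
  text = "((bird_a:1,bird_b:1):2,((bird_c:1,bird_d:1):1,bird_e:2):1);"
)

# Hypothetical endemicity statuses, one per tip, keyed by tip label.
endemicity_status <- data.frame(
  endemicity_status = c(
    "not_present", "nonendemic", "endemic", "endemic", "not_present"
  ),
  row.names = phylo$tip.label
)

# Combine tree and tip data in a phylo4d object (phylobase).
phylod <- phylobase::phylo4d(
  phylobase::phylo4(phylo),
  tip.data = endemicity_status
)

# "min" extraction: follows the DAISIE assumptions (no back-colonisation).
island_tbl_min <- extract_island_species(
  phylod = phylod,
  extraction_method = "min"
)

# "asr" extraction: node states are reconstructed first (argument names
# assumed from the tutorial), then used to delimit island clades.
phylod_asr <- add_asr_node_states(phylod = phylod, asr_method = "parsimony")
island_tbl_asr <- extract_island_species(
  phylod = phylod_asr,
  extraction_method = "asr"
)
```

Both calls should return an Island_tbl object; comparing the two results is one way to see whether the choice of algorithm matters for a given data set.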

Installation

Install DAISIEprep from CRAN:

install.packages("DAISIEprep")

The development version of DAISIEprep can be installed from GitHub:

if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("joshwlambert/DAISIEprep")

Tutorial

See tutorial.

See frequently asked questions (FAQs) about DAISIE.

Help

To report a bug please open an issue or email at joshua.lambert@lshtm.ac.uk.

Contribute

The DAISIE team always welcomes contributions to any of its packages. If you would like to contribute to this package, please follow the contributing guidelines.

Code of Conduct

Please note that the DAISIEprep project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Version

1.0.1

License

GPL (>= 3)

Maintainer

Joshua W. Lambert

Last Published

October 30th, 2025

Functions in DAISIEprep (1.0.1)

add_outgroup

Add an outgroup species to a given phylogeny.
as_daisie_datatable

Converts the Island_tbl class to a data frame in the format of a DAISIE data table (see DAISIE R package for details). This can then be input into the DAISIEprep::create_daisie_data() function, which creates the list that is input into the DAISIE ML models.
benchmark

Performance analysis of the extract_island_species() function. Uses system.time() for timing for reasons explained here: https://radfordneal.wordpress.com/2014/02/02/inaccurate-results-from-microbenchmark/
bind_colonist_to_tbl

Takes an existing instance of an Island_tbl class and binds the information from an instance of an Island_colonist class to it
add_multi_missing_species

Calculates the number of missing species to be assigned to each island clade in the island_tbl object and assigns the missing species to them. If multiple genera are in an island clade and each has missing species, the number of missing species is summed. Currently the missing species are assigned to the genus that first matches with the missing species table; a more biological or stochastic assignment is in development.
any_outgroup

Checks whether the phylogeny has an outgroup that is not present on the island. This is critical when extracting data from the phylogeny so the stem age (colonisation time) is correct.
check_island_colonist

Checks the validity of the Island_colonist class
count_missing_species

Reads in the checklist of all species on an island, including those that are not in the phylogeny (represented by NA), and counts the number of species missing from the phylogeny for each genus
all_endemicity_status

All possible endemicity statuses
check_phylo_data

Checks whether a phylo4d object conforms to the requirements of the DAISIEprep package. If the function does not return anything, the data is ready to be used; if an error is returned, the data requires some pre-processing before DAISIEprep can be used
check_multi_island_tbl

Checks the validity of the Multi_island_tbl class
check_island_tbl

Checks the validity of the Island_tbl class
create_daisie_data

This is a wrapper function for DAISIE::DAISIE_dataprep(). It allows the final DAISIE data structure to be produced from within DAISIEprep. For detailed documentation see the help documentation in the DAISIE package (?DAISIE::DAISIE_dataprep).
extract_nonendemic

Extracts the information for a non-endemic species from a phylogeny (specifically phylo4d object from phylobase package) and stores it in an Island_colonist class
extract_multi_tip_species

Extracts the information for a species (endemic or non-endemic) which has multiple tips in the phylogeny (i.e. more than one sample per species) from a phylogeny (specifically phylo4d object from phylobase package) and stores it in an Island_colonist class
create_endemicity_status

Creates a data frame with the endemicity status (either 'endemic', 'nonendemic', 'not_present') of every species in the phylogeny using a phylogeny and a data frame of the island species and their endemicity (either 'endemic' or 'nonendemic') provided.
extract_endemic_singleton

Extracts the information for an endemic species from a phylogeny (specifically phylo4d object from phylobase package) and stores it in an Island_colonist class
extract_clade_name

Creates a name for a clade depending on whether all the species of the clade have the same genus name or whether the clade is composed of multiple genera, in which case it will create a unique clade name by concatenating the genus names
extract_endemic_clade

Extracts the information for an endemic clade (i.e. more than one species on the island more closely related to each other than to mainland species) from a phylogeny (specifically phylo4d object from phylobase package) and stores it in an Island_colonist class
create_test_phylod

Creates phylod objects.
default_params_doc

Documentation for function in the DAISIEprep package
coccyzus_phylod

A phylogenetic tree of Coccyzus species with endemicity status as tip states.
extract_stem_age_genus

Extracts the stem age from the phylogeny when a species is known to belong to a genus but is not itself in the phylogeny, and other members of the same genus are in the phylogeny
extract_stem_age_asr

Extracts the stem age from the phylogeny when a species is known to belong to a genus but is not itself in the phylogeny, and other members of the same genus are in the phylogeny, using the 'asr' extraction method
extract_island_species

Extracts the colonisation, diversification, and endemicity data from phylogenetic and endemicity data and stores it in an Island_tbl object
extract_asr_clade

Extracts an island clade based on the ancestral state reconstruction of the species presence on the island, therefore this clade can contain non-endemic species as well as endemic species.
extract_biogeobears_ancestral_states_probs

Extract ancestral state probabilities from BioGeoBEARS output
island_colonist

Constructor for Island_colonist
island_tbl

Constructor function for Island_tbl class
extract_stem_age

Extracts the stem age from the phylogeny when a species is known to belong to a genus but is not itself in the phylogeny, and other members of the same genus are in the phylogeny. The stem age can either be for the genus (or several genera) in the tree (stem = "genus") or use an extraction algorithm to find the stem of when the species colonised the island (stem = "island_presence"), either 'min' or 'asr' as in extract_island_species(). When stem = "island_presence" the reconstructed node states are used to determine the stem age.
extract_species_min

Extracts the colonisation, diversification, and endemicity data from phylogenetic and endemicity data and stores it in an Island_tbl object using the "min" algorithm, which extracts island species as the shortest time to the present.
columbiformes_phylod

A phylogenetic tree of Columbiformes species with endemicity status as tip states.
is_multi_tip_species

Checks whether a species represented in the tree has multiple tips and whether those tips form a monophyletic group (i.e. one species with multiple samples), all labelled with the same endemicity status
is_identical_island_tbl

Checks whether two Island_tbl objects are identical. If they are different, comparisons are made to report which components of the Island_tbls are different.
extract_species_asr

Extracts the colonisation, diversification, and endemicity data from phylogenetic and endemicity data and stores it in an Island_tbl object using the "asr" algorithm, which extracts island species given their ancestral states of either island presence or absence.
myiarchus_phylod

A phylogenetic tree of Myiarchus species with endemicity status as tip states.
multi_island_tbl

Constructor function for Multi_island_tbl class
unique_island_genera

Determines the unique endemic genera that are included in the island clades contained within the island_tbl object and stores them as a list, with each genus only occurring once, in the first island clade it appears in
read_performance

Reads in performance analysis results from inst/performance_data and formats the data ready for plotting
pyrocephalus_phylod

A phylogenetic tree of Pyrocephalus species with endemicity status as tip states.
extract_nonendemic_forced

Extract non-endemic colonist that is forced to be a singleton by user
is_back_colonisation

Checks whether species has undergone back-colonisation from
endemicity_to_sse_states

Convert endemicity to SSE states
is_duplicate_colonist

Determines if colonist has already been stored in Island_tbl class. This is used to stop endemic clades from being stored multiple times in the island table by checking if the endemicity status and branching times are identical.
get_sse_tip_states

Extract tip states from a phylod object
translate_status

Takes a string of the various ways the island species status can be written and returns a uniform all lower-case string of the same status to make handling statuses easier in other functions
sse_states_to_endemicity

Convert SSE states back to endemicity status
read_sensitivity

Reads in the results from the sensitivity analysis saved in the inst/sensitivity_data folder
get_endemic_species

Checks whether the focal species (given by its tip label in the species_label argument) is part of an endemic clade on the island and returns a vector of the endemic species, either a single species for a singleton or multiple species in an endemic clade.
write_biogeobears_input

Write input files for BioGeoBEARS
rm_island_colonist

Removes an island colonist from an Island_tbl object
rm_multi_missing_species

Loops through the genera that have missing species and removes the ones that are found in the missing genus list which have phylogenetic data. This is useful when wanting to know which missing species have not been assigned to the island_tbl using add_multi_missing_species().
progne_phylod

A phylogenetic tree of Progne species with endemicity status as tip states.
plot_colonisation

Plots a dot plot (Cleveland dot plot when include_crown_age = TRUE) of the stem and potentially crown ages of a community of island colonists.
plot_sensitivity

Plots
plant_phylo

A phylogenetic tree of plant species.
sensitivity

Runs a sensitivity analysis to test the influences of changing the data on the parameter estimates for the DAISIE maximum likelihood inference model
setophaga_phylod

A phylogenetic tree of Setophaga species with endemicity status as tip states.
extract_stem_age_min

Extracts the stem age from the phylogeny when a species is known to belong to a genus but is not itself in the phylogeny, and other members of the same genus are in the phylogeny, using the 'min' extraction method
multi_extract_island_species

Extracts the colonisation, diversification, and endemicity data from multiple phylod (phylo4d class from phylobase) objects (composed of phylogenetic and endemicity data) and stores each in an Island_tbl object; these are stored together in a Multi_island_tbl object.
finches_phylod

A phylogenetic tree of finch species with endemicity status as tip states.
mimus_phylod

A phylogenetic tree of Mimus species with endemicity status as tip states.
select_endemicity_status

Select endemicity status from ancestral states probabilities
plot_phylod

Plots the phylogenetic tree and its associated tip and/or node data
plot_performance

Plots performance results for a grouping variable (prob_on_island or prob_endemic).
round_up

Rounds numbers using the round up method, rather than the round to the nearest even number method used by the base function round.
rm_duplicate_island_species

Remove any duplicated species from the island_tbl after "asr" extraction
write_phylip_biogeo_file

Write biogeography input file for BioGeoBEARS
write_newick_file

Write tree input file for BioGeoBEARS
DAISIEprep-package

DAISIEprep: Extracts Phylogenetic Island Community Data from Phylogenetic Trees
add_island_colonist

Adds an island colonist (can be either a singleton lineage or an island clade) to the island community (island_tbl).
get_clade_name

Accessor functions for the data (slots) in objects of the Island_colonist class
Island_tbl-class

Defines the island_tbl class which is used when extracting information from the phylogenetic and island data to be used for constructing a daisie_data_tbl
Multi_island_tbl-class

Defines the Multi_island_tbl class, which holds multiple Island_tbl objects.
get_island_tbl

Accessor functions for the data (slots) in objects of the Island_tbl class
Island_colonist-class

Defines the island_tbl class which is used when extracting information from the phylogenetic and island data to be used for constructing a daisie_data_tbl
add_missing_species

Adds a specified number of missing species to an existing island_tbl at the colonist specified by the species_to_add_to argument given. The species given is located within the island_tbl data and missing species are assigned. This is to be used after extract_island_species() to input missing species.
add_asr_node_states

Fits a model of ancestral state reconstruction of island presence
any_polyphyly

Checks whether there are any species in the phylogeny that have multiple tips (i.e. multiple subspecies per species) and whether any of those tips are paraphyletic (i.e. whether their subspecies are more distantly related to each other than to other subspecies or species).
all_descendants_conspecific

Checks whether all species given in the descendants vector are the same species.
any_back_colonisation

Detects any cases where a non-endemic species, or a species not present on the island, has likely been on the island in the past, given that its ancestral state reconstruction indicates ancestral presence on the island, and so is likely a back-colonisation from the island to the mainland (or potentially a different island). This function is useful if using extraction_method = "min" in DAISIEprep::extract_island_species(), as that method may break up a single colonist into multiple colonists because of back-colonisation.
GalapagosTrees

Phylogenetic trees of the Galapagos bird lineages and sister species on the mainland.