Learn R Programming

PhylogeneticEM (version 1.0.0)

PhyloEM: Model Estimation with Detection of Shifts

Description

PhyloEM is the main function of the package. It uses maximum likelihood methods to fit a BM or an OU process for several traits evolving along a phylogenetic tree, with automatic shift detection on the branches of the tree. This function can handle missing data.

Usage

PhyloEM(phylo, Y_data, process = c("BM", "OU", "scOU", "rBM"), check_postorder = TRUE, independent = FALSE, K_max = max(floor(sqrt(length(phylo$tip.label))), 10), use_previous = FALSE, order = TRUE, method.selection = c("LINselect", "DDSE", "Djump"), C.BM1 = 0.1, C.BM2 = 2.5, C.LINselect = 1.1, method.variance = c("upward_downward", "simple"), method.init = "lasso", method.init.alpha = "estimation", method.init.alpha.estimation = c("regression", "regression.MM", "median"), methods.segmentation = c("lasso", "best_single_move"), alpha_grid = TRUE, nbr_alpha = 10, random.root = TRUE, stationary.root = TRUE, alpha = NULL, check.tips.names = TRUE, progress.bar = TRUE, estimates = NULL, save_step = FALSE, parallel_alpha = FALSE, Ncores = 3, impute_init_Rphylopars = FALSE, K_lag_init = 5, light_result = TRUE, ...)

Arguments

phylo
A phylogenetic tree of class phylo (from package ape).
Y_data
Matrix of data at the tips, size p x ntaxa. Each line is a trait, and each column is a tip. The column names are checked against the tip names of the tree.
process
The model used for the fit. One of "BM" (for a full BM model, univariate or multivariate); "OU" (for an OU with independent traits, univariate or multivariate); or "scOU" (for a "scalar OU" model, see details).
check_postorder
Re-order the tree in post-order. If the Upward-Downward algorithm is used, the tree need to be in post-order. Default to TRUE if the upward-downward is used, otherwise automatically set to FALSE.
independent
Are the trait assumed to be independent from one another? Default to FALSE. OU in a multivariate setting only works if TRUE.
K_max
The maximum number of shifts to be considered. Default to $max(|\sqrt ntaxa|, 10)$.
use_previous
Should the initialization for K+1 shifts use the estimation for $K$ shifts already obtained? Default to FALSE.
order
Should the estimations be done for K increasing (TRUE) or K decreasing (FALSE)? If use_previous=FALSE, this has no influence, exept if one initialization fails. Default to TRUE.
method.selection
Method selection to be used. Several ones can be used at the same time. One of "LINselect" for the Baraud Giraud Huet LINselect method; "DDSE" for the Slope Heuristic or "Djump" for the Jump Heuristic, last two based the Birgé Massart method.
C.BM1
Multiplying constant to be used for the BigeMassart1 method. Need to be positive. Default to 0.1.
C.BM2
Multiplying constant to be used for the BigeMassart2 method. Default to 2.5.
C.LINselect
Multiplying constant to be used for the LINselect method. Need to be greater than 1. Default to 1.1.
method.variance
Algorithm to be used for the moments computations at the E step. One of "simple" for the naive method; of "upward_downward" for the Upward Downward method (usually faster). Default to "upward_downward".
method.init
The initialization method. One of "lasso" for the LASSO base initialization method; or "default" for user-specified initialization values. Default to "lasso".
method.init.alpha
For OU model, initialisation method for the selection strength alpha. One of "estimation" for a cherry-based initialization, using nlrob; or "default" for user-specified initialization values. Default to "estimation".
method.init.alpha.estimation
If method.init.alpha="estimation", choice of the estimation(s) methods to be used. Choices among "regression", (method="M" is passed to nlrob); "regression.MM" (method="MM" is passed to nlrob) or "median" (nlrob is not used, a simple median is taken). Default to all of them.
methods.segmentation
For OU, method(s) used at the M step to find new candidate shifts positions. Choices among "lasso" for a LASSO-based algorithm; and "best_single_move" for a one-move at a time based heuristic. Default to both of them. Using only "lasso" might speed up the function a lot.
alpha_grid
Wether to use a grid for alpha values. Default to TRUE. This is the only available method for scOU. This method is not available for OU with multivariate traits. OU with univariate traits can take both TRUE or FALSE. If TRUE, a grid based on the branch length of the tree is automatically computed, using function find_grid_alpha.
nbr_alpha
If alpha_grid=TRUE, the number of alpha values on the grid. Default to 10.
random.root
Wether the root is assumed to be random (TRUE) of fixed (FALSE). Default to TRUE
stationary.root
Wether the root is assumed to be in the stationnary state. Default to TRUE.
alpha
If the estimation is done with a fixed alpha (either known, or on a grid), the possible value for alpha. Default to NULL.
check.tips.names
Wether to check the tips names of the tree against the column names of the data. Default to TRUE.
progress.bar
Wether to display a progress bar of the computations. Default to TRUE.
estimates
The result of a previous run of this same function. This function can be re-run for other model election method. Default to NULL.
save_step
If alpha_grid=TRUE, wether to save the intermediate results for each value of alpha (in a temporary file). Useful for long computations. Default to FALSE.
parallel_alpha
If alpha_grid=TRUE, wether to run the estimations with different values of alpha on separate cores. Default to FALSE. If TRUE, the log is written as a temporary file.
Ncores
If parallel_alpha=TRUE, number of cores to be used.
impute_init_Rphylopars
Wether to use Rphylopars-package for initialization. Default to FALSE.
K_lag_init
Number of extra shifts to be considered at the initialization step. Increases the accuracy, but can make computations quite slow of taken too high. Default to 5.
light_result
if TRUE (the default), the object returned is enlighted, without easilly computatble quantities. If FALSE, the object can be very heavy, but its subsequent manipulations can be faster (especially for plotting).
...
Further arguments to be passed to estimateEM, including tolerance parameters for stopping criterions, maximal number of iterations, etc.

Value

An object of class PhyloEM. Relevent quantities can be extracted from it using helper functions params_process.PhyloEM, imputed_traits.PhyloEM

Details

Several models can be used:
  • BM with fixed root, univariate or multivariate.
  • OU with fixed or stationary root, univariate or multivariate.

For the OU in the multivariate setting, two assumptions can be made:

  • Independent traits. This amounts to digonal rate and selection matrices.
  • "Scalar OU" (scOU): the rate matrix can be full, but the selection strength matrix is assumed to be scalar, i.e. all the traits are supposed to go to their optimum values with the same speed.

See Also

plot.PhyloEM, params_process.PhyloEM, imputed_traits.PhyloEM

Examples

Run this code
## Not run: 
# ## Load Data
# data(monkeys)
# ## Run method
# # Note: use more alpha values for better results.
# res <- PhyloEM(Y_data = monkeys$dat,        ## data
#                phylo = monkeys$phy,         ## phylogeny
#                process = "scOU",            ## scalar OU
#                random.root = TRUE,          ## root is stationary
#                stationary.root = TRUE,
#                K_max = 10,                  ## maximal number of shifts
#                nbr_alpha = 4,               ## number of alpha values
#                parallel_alpha = TRUE,       ## parallelize on alpha values
#                Ncores = 2)
# ## Plot selected solution (LINselect)
# plot(res) # three shifts
# ## Plot selected solution (DDSE)
# plot(res, method.selection = "DDSE") # no shift
# ## Extract and solution with 5 shifts
# params_5 <- params_process(res, K = 5)
# plot(res, params = params_5)
# ## Show all equivalent solutions
# eq_sol <- equivalent_shifts(monkeys$phy, params_5)
# plot(eq_sol)
# ## End(Not run)

Run the code above in your browser using DataLab