SelonHMMOptimize: Optimize parameters under the HMM SELON model

Description

Optimizes model parameters under the HMM SELON model

Usage

SelonHMMOptimize(nuc.data.path, n.partitions = NULL, phy,
  edge.length = "optimize", edge.linked = TRUE, nuc.model = "GTR",
  global.nucleotide.model = TRUE, diploid = TRUE, verbose = FALSE,
  n.cores = 1, max.tol = .Machine$double.eps^0.5, max.evals = 1e+06,
  cycle.stage = 12, max.restarts = 10, output.by.restart = TRUE,
  output.restart.filename = "restartResult", fasta.rows.to.keep = NULL)

Arguments

nuc.data.path

Path to the directory containing the gene-specific fasta files that contain the nucleotide data.

n.partitions

The number of partitions to analyze. The order is based on the Unix order of the fasta files in the directory.

phy

The phylogenetic tree on which the model parameters are optimized.

edge.length

Indicates whether or not edge lengths should be optimized. By default it is set to "optimize"; the other option is "fixed", which uses the user-supplied branch lengths.

edge.linked

A logical indicating whether a single set of edge lengths is shared across all genes (TRUE) or edge lengths are optimized separately for each gene (FALSE). By default, a single set of edge lengths is optimized for all genes.

nuc.model

Indicates what type of nucleotide model to use. There are three options: "JC", "GTR", or "UNREST".

global.nucleotide.model

A logical indicating whether the nucleotide model is assumed to be shared among all partitions.

diploid

A logical indicating whether or not the organism is diploid.

verbose

Logical indicating whether each iteration should be printed to the screen.

n.cores

The number of cores to run the analyses over.

max.tol

Supplies the relative optimization tolerance.

max.evals

Supplies the maximum number of iterations tried during optimization.

cycle.stage

Specifies the number of cycles per restart. Default is 12.

max.restarts

Supplies the number of random restarts.

output.by.restart

Logical indicating whether or not each restart is saved to a file. Default is TRUE.

output.restart.filename

Designates the file name for each random restart.

fasta.rows.to.keep

Indicates which rows to keep in the input fasta files.
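
Because n.partitions selects partitions by their position in the directory listing, it can help to check that order before running an analysis. The sketch below is an illustration only: the directory and file names are made up, and list.files() is used as a stand-in for the Unix listing order mentioned above, which may differ from it depending on locale.

```r
# Hypothetical directory and file names, created in a temporary location
# purely for illustration.
demo.dir <- file.path(tempdir(), "selon_partition_demo")
dir.create(demo.dir, showWarnings = FALSE)
fasta.names <- c("gene2.fasta", "gene10.fasta", "gene3.fasta")
invisible(file.create(file.path(demo.dir, fasta.names)))

# list.files() returns the names in alphabetical order. This is an
# assumption standing in for the listing order the function relies on.
partition.order <- list.files(demo.dir, pattern = "\\.fasta$")
print(partition.order)

# With n.partitions = 2, the first two files in this order would be the
# ones analyzed (under the assumption above).
n.partitions <- 2
print(partition.order[seq_len(n.partitions)])
```

Note that "gene10.fasta" sorts before "gene2.fasta" because names are compared character by character. Zero-padding the numbers (for example "gene02.fasta") is one way to make the listing order match the intended gene order.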

Details

SELON stands for SELection On Nucleotides. This function takes a user-supplied topology and a set of fasta-formatted sequences and optimizes the parameters in the SELON model. Selection is modeled as acting toward an optimal nucleotide at each site, which is determined simply by majority rule on the observed data. The strength of selection is then varied along sites based on a Taylor series, which scales the substitution rates. The model is still under development, but so far seems very promising.
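
The majority-rule choice of the optimal nucleotide can be sketched in a few lines of base R. This is a minimal illustration on made-up data, not the package's internal code: the helper majority.nucleotide and the toy alignment are hypothetical, and the tie-breaking rule (first state in the order A, C, G, T) is an arbitrary assumption.

```r
# Toy alignment (made-up data): rows are taxa, columns are sites.
alignment <- rbind(
  taxonA = c("A", "C", "G", "T"),
  taxonB = c("A", "C", "T", "T"),
  taxonC = c("G", "C", "T", "T"),
  taxonD = c("A", "T", "T", "C")
)

# Hypothetical helper: return the most frequent nucleotide in one column.
# Ties go to the first state in 'states', which is an arbitrary choice.
majority.nucleotide <- function(site, states = c("A", "C", "G", "T")) {
  counts <- table(factor(site, levels = states))
  names(counts)[which.max(counts)]
}

# Apply the helper to every site (column) of the alignment.
optimal.nucleotides <- apply(alignment, 2, majority.nucleotide)
print(optimal.nucleotides)
```

For this toy alignment the result is "A", "C", "T", "T", one optimal nucleotide per site.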