rphast (version 1.0)

phyloFit: Fit a Phylogenetic model to an alignment...

Description

Fit a phylogenetic model to an alignment.

Usage

phyloFit(msa, tree=NULL, subst.mod="REV", init.mod=NULL,
    no.freqs=FALSE, no.rates=FALSE, features=NULL, scale.only=FALSE,
    scale.subtree=NULL, nrates=1, alpha=1, rate.constants=NULL,
    init.random=FALSE, init.parsimony=FALSE, clock=FALSE, EM=FALSE,
    precision="HIGH", ninf.sites=50, quiet=FALSE)

Arguments

msa
An alignment object. May be altered if passed in as a pointer to C memory (see Note).
tree
A character string containing a Newick-formatted tree defining the topology. Required if the number of species is greater than 3, unless init.mod is specified. The topology must be rooted, although the root is ignored if the substitution model is reversible.
subst.mod
The substitution model to use. See subst.mods.
init.mod
An object of class tm used to initialize the model.
no.freqs
(Only applies when init.mod provided). If TRUE, do not estimate equilibrium frequencies; just use the ones from init.mod.
no.rates
(Only applies when init.mod provided). If TRUE, do not estimate transition rate parameters; just use the transition matrix in init.mod.
features
An object of type feat. If given, a separate model will be estimated for each feature type.
scale.only
A logical value. If TRUE, estimate only the scale of the tree. Branches will be held at initial values. Useful in conjunction with init.mod.
scale.subtree
A character string giving the name of a node in a tree. This option implies scale.only=TRUE. If given, estimate separate scale factors for subtree beneath identified node and the rest of the tree. The branch leading to the subtree is included in the su
nrates
An integer. The number of rate categories to use. Specifying a value greater than one causes the discrete gamma model for rate variation to be used, unless rate constants are specified.
alpha
A numeric value > 0, for use with "nrates". Initial value for alpha, the shape parameter of the gamma distribution.
rate.constants
A numeric vector. Implies nrates = length(rate.constants). Also implies EM=TRUE. Uses a non-parametric mixture model for rates, instead of a gamma distribution. The weight associated with each rate will be estimated. alpha may still be used to initia
init.random
A logical value. If TRUE, parameters will be initialized randomly.
init.parsimony
A logical value. If TRUE, branch lengths will be estimated based on parsimony counts for the alignments. Currently only works for models of order 0.
clock
A logical value. If TRUE, assume a molecular clock in estimation.
EM
A logical value. If TRUE, the model is fit using EM rather than the default BFGS quasi-Newton algorithm. Not available for all models/options.
precision
A character vector, one of "HIGH", "MED", or "LOW", denoting the level of precision to use in estimating model parameters. Affects convergence criteria for iterative algorithms: higher precision means more iterations and longer execution time.
ninf.sites
An integer. Require at least this many "informative" sites in order to estimate a model. An informative site is an alignment column with at least two non-gap and non-missing-data characters.
quiet
A logical value. If TRUE, do not report progress to screen.
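
To make the ninf.sites definition concrete, here is a minimal base-R sketch that counts informative columns in a small alignment stored as a character matrix. It does not use rphast. The function name count_informative_sites, the toy alignment, and the choice of "-", "N" and "*" as gap or missing-data characters are assumptions made for illustration.

```r
# Count columns with at least two characters that are neither gaps
# nor missing data. `aln` is a character matrix: one row per sequence,
# one column per alignment site.
# ASSUMPTION: "-", "N" and "*" are treated as gap/missing characters.
count_informative_sites <- function(aln, uninformative = c("-", "N", "*")) {
  is_informative <- apply(aln, 2, function(column) {
    sum(!(column %in% uninformative)) >= 2
  })
  sum(is_informative)
}

# Toy alignment (hypothetical data, not taken from the rphast examples).
aln <- rbind(
  hg18 = c("A", "C", "-", "G", "N"),
  mm9  = c("A", "-", "-", "G", "N"),
  rn4  = c("T", "-", "N", "-", "A")
)

n_informative <- count_informative_sites(aln)
n_informative
```

In this toy alignment only the first and fourth columns have two or more usable characters, so the count is 2, far below the default ninf.sites=50.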

Value

  • An object of class tm (tree model), or (if several models are computed, as is possible with the features or windows options), a list of objects of class tm.
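
Because the return value can be either a single tm object or a list of them, downstream code may want to treat both cases uniformly. The following is a minimal sketch, assuming only that single models carry the class tm as stated above. The helper name as_tm_list and the stand-in objects are hypothetical and do not come from rphast.

```r
# Wrap a single model in a list; leave a list of models unchanged.
# ASSUMPTION: a single fitted model inherits from class "tm".
as_tm_list <- function(fit) {
  if (inherits(fit, "tm")) list(fit) else fit
}

# Stand-in objects for illustration only; real ones come from phyloFit().
single  <- structure(list(likelihood = -100), class = "tm")
several <- list(a = single, b = single)

n_single  <- length(as_tm_list(single))
n_several <- length(as_tm_list(several))
```

With this wrapper, code such as lapply(as_tm_list(fit), function(m) m$likelihood) works whether or not the features argument was used.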

Examples

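# Extract the example files shipped with rphast into the working directory
exampleArchive <- system.file("extdata", "examples.zip", package="rphast")
files <- c("ENr334.maf", "ENr334.fa", "gencode.ENr334.gff", "rev.mod")
unzip(exampleArchive, files)

# Fit a model (default subst.mod="REV") to the alignment on a given topology
m <- read.msa("ENr334.maf")
mod <- phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)")
mod

# Refit, using the previous result to initialize the model
phyloFit(m, init.mod=mod)

# Compare the likelihood computed from the model with the stored value
likelihood.msa(m, mod)
mod$likelihood
print(mod$likelihood, digits=10)

# With features, the result is a list with one model per feature type
f <- read.feat("gencode.ENr334.gff")
mod <- phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)",
                features=f, quiet=TRUE)
names(mod)
mod$other
mod[["5'flank"]]

# Rate variation: discrete gamma with 3 categories, then fixed rate constants
phyloFit(m, init.mod=mod$AR, nrates=3, alpha=4.0)
phyloFit(m, init.mod=mod$AR, rate.constants=c(10, 5, 1))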
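
The nrates=3, alpha=4.0 call above requests a discrete gamma model with three rate categories. As a rough illustration of what such categories can look like, the base-R sketch below computes category rates using one common construction: equal-probability categories of a mean-one gamma distribution, each represented by its mean. Whether phyloFit uses exactly this construction is an assumption, and the function name discrete_gamma_rates is hypothetical.

```r
# Mean rate of each of `nrates` equal-probability categories of a
# gamma distribution with shape = alpha and rate = alpha (mean 1).
# ASSUMPTION: this is an illustrative construction, not rphast's code.
discrete_gamma_rates <- function(nrates, alpha) {
  stopifnot(nrates >= 1, alpha > 0)
  # Category boundaries at quantiles 0, 1/K, ..., 1 (0 and Inf at the ends).
  probs <- seq(0, 1, length.out = nrates + 1)
  cuts <- qgamma(probs, shape = alpha, rate = alpha)
  # The integral of x * f(x; alpha, alpha) up to a cut equals the CDF of a
  # gamma with shape alpha + 1 and rate alpha, evaluated at that cut.
  partial_means <- pgamma(cuts, shape = alpha + 1, rate = alpha)
  # Each category has probability 1/K, so divide by 1/K to get its mean.
  nrates * diff(partial_means)
}

rates <- discrete_gamma_rates(nrates = 3, alpha = 4.0)
rates
mean(rates)
```

The rates increase from the slowest to the fastest category and average to 1, so the overall scale of the tree is unchanged. Smaller values of alpha spread the categories further apart.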
