EnvNJ (version 0.1.3)

Whole Genome Phylogenies Using Sequence Environments

Description

Contains utilities for the analysis of protein sequences in a phylogenetic context. Allows the generation of phylogenetic trees based on protein sequences in an alignment-independent way. Two different methods have been implemented. One approach is based on the frequency analysis of n-grams, previously described in Stuart et al. (2002). The other approach is based on the species-specific neighborhood preference around amino acids. Features include the conversion of a protein set into a vector reflecting these neighborhood preferences, the computation of pairwise distances (dissimilarities) between these vectors, and the generation of trees based on the resulting distance matrices.
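
The n-gram route described above (sequences to frequency vectors, vectors to pairwise dissimilarities, dissimilarities to a tree) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the three sequences are made-up toy strings, the helper `ngram_counts` is hypothetical, cosine-based dissimilarity is just one possible metric, and average-linkage hierarchical clustering from base R stands in for whatever tree-building method the package applies.

```r
# Toy protein sequences for three hypothetical species (not package data).
seqs <- c(
  sp1 = "MTNIRKSHPLMKIINNAFIDLPAPSNISSWWNFGSLLG",
  sp2 = "MTNIRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLG",
  sp3 = "MAHAAQVGLQDATSPIMEELITFHDHALMIIFLICFLV"
)

# Hypothetical helper: count the overlapping n-grams of one sequence.
ngram_counts <- function(s, n = 2) {
  starts <- seq_len(nchar(s) - n + 1)
  table(substring(s, starts, starts + n - 1))
}

# One column of relative n-gram frequencies per species,
# defined over the union of all observed n-grams.
counts <- lapply(seqs, ngram_counts, n = 2)
vocab <- sort(unique(unlist(lapply(counts, names))))
freq <- sapply(counts, function(tb) {
  v <- setNames(numeric(length(vocab)), vocab)
  v[names(tb)] <- as.numeric(tb)
  v / sum(v)
})

# Pairwise cosines between species vectors, turned into dissimilarities.
norms <- sqrt(colSums(freq^2))
cosines <- crossprod(freq) / outer(norms, norms)
dis <- 1 - cosines

# Assumption: average-linkage clustering is used here only because it
# ships with base R; it is not claimed to be the package's method.
tree <- hclust(as.dist(dis), method = "average")
```

Calling `plot(tree)` draws the resulting dendrogram. In this toy data, `sp1` and `sp2` differ by a single residue, so they are joined first.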

Install

install.packages('EnvNJ')

Monthly Downloads

208

Version

0.1.3

License

GPL (>= 2)

Maintainer

Juan Aledo

Last Published

September 27th, 2021

Functions in EnvNJ (0.1.3)

msa.tree

Infer a Tree Based on an MSA
vect2tree

Convert a Set of Vectors into a Tree
msa.merge

Carry Out an MSA of a Set of Different Orthologous Proteins
metrics

Pairwise Vector Dissimilarities
vdis

Compute Pairwise Distances Between Vectors
ngram

Compute n-Gram Frequencies Vector
ncdnj

Compute a Distance Matrix Using Normalized Compression Distance
ngraMatrix

Compute n-Gram Frequencies Dataframe
otu.space

Compute the Matrix Representing the Species Vector Subspace
vtree

Build a Tree When Species Are Encoded by n-Dim Vectors
envfascpp

Convert Fasta Files into Environment Vectors
env.sp

Extract the Sequence Environments
fastaconc

Concatenate Fasta Files in a Single Multispecies Fasta File
bovids

13 orthologous mtDNA-encoded proteins of 11 bovine species.
envnj

Build Trees Based on the Environment Around the Indicated Amino Acid(s)
ncd

Compute Normalized Compression Distances
svdgram

Compute Phylogenetic Trees Using an n-Gram and SVD Approach
vcos

Compute Pairwise Cosines of the Angles Between Vectors
reyes

13 orthologous mtDNA-encoded proteins of 34 mammalian species.
otu.vector

Convert a Set of Sequence Environments into a Vector
d.phy2df

Convert a Phylip Distance Matrix into a DataFrame
cos2dis

Convert Cosines Between Vectors into Pairwise Dissimilarities
aa.comp

Amino Acid Composition
aa.at

Residue Found at the Requested Position
df2fasta

Convert Dataframe into Fasta File
env.extract

Sequence Environment Around a Given Position
env.fasta

Build Trees Based on the Environment Around the Indicated Amino Acid(s)
env.matrices

Environment Matrices
aaf

Compute the Frequency of Each Amino Acid in Each Species