ngram

prot

a character string corresponding to the primary structure of the protein.

k

a positive integer, between 1 and 5, indicating the length of the words (k-mers) to be counted.

Computes the n-gram frequencies vector for a given protein.
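As an illustration of what an n-gram frequency count involves, the following base R sketch counts the overlapping words of length k in a sequence. The helper name count_kmers and the example sequence are hypothetical and are not part of the package. The sketch reports only the words that actually occur, so the format and contents of the vector returned by ngram may well differ.

```r
# Hypothetical helper (not part of the package): count overlapping k-mers
# in a protein sequence given as a single character string.
count_kmers <- function(prot, k = 2) {
  stopifnot(is.character(prot), length(prot) == 1, k >= 1, k <= 5)
  n <- nchar(prot)
  # A sequence shorter than k contains no words of length k.
  if (n < k) return(table(character(0)))
  # Start positions of every overlapping window of width k.
  starts <- seq_len(n - k + 1)
  words <- substring(prot, starts, starts + k - 1)
  # Tabulate how many times each observed word occurs.
  table(words)
}

# Made-up example sequence, used only for illustration.
counts <- count_kmers("MKVLMKV", k = 2)
print(counts)
```

A sequence of length n yields n - k + 1 overlapping words, so the counts in this example sum to 6.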

Contains utilities for the analysis of protein sequences in a phylogenetic context.
Allows the generation of phylogenetic trees based on protein sequences in an alignment-independent way.
Two different methods have been implemented. One approach is based on the frequency analysis of n-grams,
previously described in Stuart et al. (2002) <doi:10.1093/bioinformatics/18.1.100>. The other approach is based on the species-specific neighborhood preference around amino acids. Features include the conversion of a protein set into a vector
reflecting these neighborhood preferences, pairwise distances (dissimilarity) between these vectors,
and the generation of trees based on these distance matrices.

Juan Aledo

ngram: Compute n-Gram Frequencies Vector
