This feature returns the 2-mer and 3-mer compositions of the protein sequence. First, every possible 2-mer and 3-mer over the 20 amino acids is enumerated (\(20^2 = 400\) 2-mers and \(20^3 = 8000\) 3-mers). From these, two vectors of length 400 and 8000 are created, with one position for each 2-mer or 3-mer.
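The enumeration step can be sketched in Python. This is a minimal illustration, not the package's code. The alphabetical amino-acid order `ACDEFGHIKLMNPQRSTVWY` and the helper name `kmer_index` are assumptions, and `hmm_SCSH` may order its vector positions differently.

```python
from itertools import product

# Assumption: the 20 standard amino acids in alphabetical one-letter order;
# hmm_SCSH's internal ordering of vector positions may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_index(k):
    """Hypothetical helper: map every possible k-mer to a vector position."""
    # product() yields k-mers in lexicographic order: "AA", "AC", ..., "YY".
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


two_mers = kmer_index(2)    # 20**2 = 400 entries
three_mers = kmer_index(3)  # 20**3 = 8000 entries
print(len(two_mers), len(three_mers))     # 400 8000
print(two_mers["AC"], three_mers["YYY"])  # 1 7999
```

Lexicographic order means the position of a k-mer can also be computed directly as a base-20 number. For example, `"AC"` sits at 0 × 20 + 1 = 1.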
Next, the protein sequence that corresponds to the HMM scores is extracted and placed into a bipartite graph together with the original protein sequence.
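The following minimal Python sketch shows one way to derive a sequence from per-position scores: take the highest-scoring amino acid at each position. This is a common choice for consensus sequences, but it is an assumption here. The description above does not say exactly how `hmm_SCSH` extracts the sequence. The helper `consensus_sequence` is hypothetical. The scores are also assumed to be oriented so that larger means more probable (for example, probabilities rather than raw negative-log scores).

```python
# Assumption: columns follow this alphabetical amino-acid order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def consensus_sequence(scores):
    """Hypothetical helper: return the highest-scoring residue at each position.

    `scores` has one row per sequence position and 20 columns (one per amino
    acid, in AMINO_ACIDS order). Larger values are assumed to be better.
    Ties go to the first (alphabetically earliest) amino acid.
    """
    seq = []
    for row in scores:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each row must contain exactly 20 scores")
        best = max(range(len(row)), key=lambda j: row[j])
        seq.append(AMINO_ACIDS[best])
    return "".join(seq)


# Three positions favouring A, Y and F respectively.
scores = [[0.0] * 20 for _ in range(3)]
scores[0][0] = 1.0   # A
scores[1][19] = 1.0  # Y
scores[2][4] = 0.5   # F
print(consensus_sequence(scores))  # AYF
```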
Every path of length 1 or 2 (that is, spanning two or three vertices) in this graph is found, and the residues at the vertices along each path are recorded as a 2-mer or a 3-mer, respectively.
For each 2-mer or 3-mer found along these paths, 1 is added to the position that corresponds to it in the length-400 (2-mer) or length-8000 (3-mer) vector created earlier. Both vectors are then returned.
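The graph and counting steps can be sketched together in Python. The graph structure here is an assumption. Each position holds two vertices: the residue from the protein sequence and the residue from the HMM-derived sequence. Every vertex at position i is joined to both vertices at position i + 1. Under this reading, each adjacent pair of positions contributes 4 two-mers and each window of three positions contributes 8 three-mers. The function name `scsh_vectors` and the alphabetical index order are likewise hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabetical order
IDX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def scsh_vectors(seq, consensus):
    """Sketch of 2-mer (length 400) and 3-mer (length 8000) path counts.

    Assumption: position i holds two vertices (seq[i] and consensus[i]),
    and every vertex at position i is joined to both vertices at i + 1.
    """
    if len(seq) != len(consensus):
        raise ValueError("sequences must have equal length")
    rows = (seq, consensus)
    v2 = [0] * 400
    v3 = [0] * 8000
    n = len(seq)
    # Paths of length 1 (one edge) span positions i, i+1 and give 2-mers.
    for i in range(n - 1):
        for a, b in product(rows, repeat=2):
            v2[IDX[a[i]] * 20 + IDX[b[i + 1]]] += 1
    # Paths of length 2 (two edges) span positions i, i+1, i+2 and give 3-mers.
    for i in range(n - 2):
        for a, b, c in product(rows, repeat=3):
            v3[IDX[a[i]] * 400 + IDX[b[i + 1]] * 20 + IDX[c[i + 2]]] += 1
    return v2, v3


v2, v3 = scsh_vectors("AC", "CA")
print(sum(v2), sum(v3))  # 4 0
```

For example, `scsh_vectors("AC", "CA")` adds one count each to AC, AA, CC and CA. If the graph instead only joins residues across the two sequences, only the alternating paths would be counted. Residues outside the 20 standard letters raise a `KeyError` in this sketch.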
Usage
hmm_SCSH(hmm)
Value
A vector of length 400 (2-mer counts).
A vector of length 8000 (3-mer counts).
Arguments
hmm
The name of a profile hidden Markov model file.
References
Mohammadi, A. M., Zahiri, J., Mohammadi, S., Khodarahmi, M., & Arab, S. S. (2022).
PSSMCOOL: a comprehensive R package for generating evolutionary-based descriptors of protein sequences from PSSM profiles.
Biology Methods and Protocols, 7(1).