SOCNumber: Sequence Order Coupling Number (SOCNumber)
Description
This function uses the Grantham and Schneider dissimilarity matrices to compute the dissimilarity between pairs of amino acids.
The distance d between the two amino acids of a pair is their separation in the sequence, and it ranges from 1 to nlag.
For each d, the function sums the dissimilarities of all amino acid pairs that are d positions apart. This sum is the value of tau for that d.
The feature vector contains the tau values for both matrices, so its length is nlag*2.
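The computation of tau for one sequence and one distance d can be sketched in R as below. This is a minimal sketch, not the package's implementation: the helper soc_tau and the 3-letter matrix toy are hypothetical, and toy holds made-up values rather than real Grantham or Schneider entries. It sums the raw dissimilarities as the description states; Chou's original coupling number squares each distance, and whether SOCNumber squares or normalizes the terms is not stated here.

```r
# Sequence-order coupling number tau_d for a single sequence (sketch).
# ASSUMPTION: `dissim` is a square numeric matrix whose row and column names
# are one-letter amino acid codes. Raw dissimilarities are summed, as in the
# description; no squaring or normalization is applied.
soc_tau <- function(seq, dissim, d) {
  aa <- strsplit(seq, "")[[1]]
  n <- length(aa)
  if (d < 1 || d >= n) stop("d must satisfy 1 <= d < sequence length")
  # Residue i is paired with residue i + d, for i = 1, ..., n - d,
  # so a sequence of length n has n - d pairs at distance d.
  pairs <- cbind(aa[1:(n - d)], aa[(1 + d):n])
  # A two-column character matrix indexes `dissim` through its dimnames.
  sum(dissim[pairs])
}

# Hypothetical toy matrix over A, C, D (made-up values, NOT Grantham/Schneider).
toy <- matrix(c(0, 1, 2,
                1, 0, 3,
                2, 3, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "C", "D"), c("A", "C", "D")))

soc_tau("ACDA", toy, 1)  # pairs AC, CD, DA: 1 + 3 + 2 = 6
soc_tau("ACDA", toy, 2)  # pairs AD, CA: 2 + 1 = 3
```

Because a sequence of length n has only n - d pairs at distance d, nlag has to stay below the length of the shortest sequence for every tau to be defined.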
Usage
SOCNumber(seqs, nlag = 30, label = c())
Arguments
seqs
is a FASTA file containing amino acid sequences, in which each sequence starts with a header line beginning
with a '>' character. Alternatively, seqs can be a string vector in which each element is a peptide or protein sequence.
nlag
is a numeric value specifying the maximum distance between two amino acids.
Distances can be 1, 2, ..., or nlag. Default is 30.
label
is an optional parameter. It is a vector whose length equals the number of sequences. It gives the class of
each entry (i.e., sequence).
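The two accepted forms of seqs carry the same information: a FASTA file is a series of records, each made of a '>' header line followed by one or more sequence lines. The minimal sketch below converts FASTA-formatted lines into a named string vector. read_fasta_lines is a hypothetical helper, not part of the package, and it assumes every header is followed by at least one sequence line.

```r
# Convert FASTA-formatted lines into a named character vector (sketch).
# ASSUMPTION: read_fasta_lines is a hypothetical helper for illustration;
# every '>' header must be followed by at least one sequence line.
read_fasta_lines <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]           # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)             # record index for every line
  # Concatenate the sequence lines that belong to the same record
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}

read_fasta_lines(c(">seq1", "MKV", "LAA", ">seq2", "GGS"))
# named vector: seq1 = "MKVLAA", seq2 = "GGS"
```

For a file on disk, the lines could come from base R's readLines, for example read_fasta_lines(readLines(path)), where path is a placeholder.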
Value
It returns a feature matrix. The number of rows equals the number of sequences and
the number of columns is nlag*2. For each distance d there are two values: one for the Grantham distance and one for the Schneider distance.
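The shape of the returned matrix can be illustrated with a small sketch. soc_matrix is hypothetical: m1 and m2 stand in for the Grantham and Schneider matrices (the example uses made-up toy values), and the column order (all distances for the first matrix, then all distances for the second) and the column names are illustrative choices that may differ from the package's actual layout.

```r
# Build an SOC-style feature matrix: one row per sequence, nlag * 2 columns (sketch).
# ASSUMPTIONS: m1/m2 stand in for the Grantham and Schneider matrices; the
# column order and names are illustrative, not the package's documented layout.
soc_matrix <- function(seqs, m1, m2, nlag) {
  # Sum of dissimilarities over all residue pairs that are d positions apart
  tau <- function(aa, m, d) {
    n <- length(aa)
    sum(m[cbind(aa[1:(n - d)], aa[(1 + d):n])])
  }
  rows <- lapply(seqs, function(s) {
    aa <- strsplit(s, "")[[1]]
    if (length(aa) <= nlag) stop("each sequence must be longer than nlag")
    lags <- seq_len(nlag)
    c(vapply(lags, function(d) tau(aa, m1, d), numeric(1)),
      vapply(lags, function(d) tau(aa, m2, d), numeric(1)))
  })
  out <- do.call(rbind, rows)
  colnames(out) <- c(paste0("Grantham.lag", seq_len(nlag)),
                     paste0("Schneider.lag", seq_len(nlag)))
  rownames(out) <- names(seqs)
  out
}

# Hypothetical toy matrices over A, C, D (made-up values).
toy1 <- matrix(c(0, 1, 2, 1, 0, 3, 2, 3, 0), nrow = 3, byrow = TRUE,
               dimnames = list(c("A", "C", "D"), c("A", "C", "D")))
toy2 <- toy1 * 10

fm <- soc_matrix(c(s1 = "ACDA", s2 = "DDCA"), toy1, toy2, nlag = 2)
dim(fm)  # 2 rows (sequences) x 4 columns (nlag * 2)
```

Here the row for s1 is 6, 3, 60, 30 and the row for s2 is 4, 5, 40, 50, which shows the two values per distance d, one from each matrix.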