This function uses the Grantham and Schneider dissimilarity matrices to compute the dissimilarity between amino acid pairs.
The two amino acids of a pair are d positions apart in the sequence, where d ranges from 1 to nlag.
For each d, the function sums the dissimilarities of all amino acid pairs that are d positions apart. This sum is the value of tau for that d.
The feature vector contains the tau values for both matrices. Thus, the length of the feature vector is equal to nlag*2.
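The computation described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: `tau_sketch` is a hypothetical helper, not the function documented here, and `toy_a` and `toy_b` are made-up dissimilarity matrices over a three-letter alphabet rather than the real Grantham and Schneider tables. The sketch follows the wording above and sums the raw dissimilarities. The order in which the two blocks of tau values are concatenated is also an assumption.

```r
# Toy dissimilarity matrices over a 3-letter alphabet.
# Hypothetical values, NOT the Grantham or Schneider entries.
aa <- c("A", "C", "D")
toy_a <- matrix(c(0, 1, 2,
                  1, 0, 3,
                  2, 3, 0),
                nrow = 3, byrow = TRUE, dimnames = list(aa, aa))
toy_b <- 2 * toy_a

# tau_sketch(): hypothetical helper for illustration only.
# For each lag d in 1..nlag, sum the dissimilarities between
# residue i and residue i + d over all valid positions i.
tau_sketch <- function(seq, dis, nlag) {
  res <- strsplit(seq, "")[[1]]
  n <- length(res)
  stopifnot(nlag < n)
  vapply(seq_len(nlag), function(d) {
    left <- res[seq_len(n - d)]
    right <- res[(d + 1):n]
    # Index the matrix by (row name, column name) pairs.
    sum(dis[cbind(left, right)])
  }, numeric(1))
}

# One block of nlag tau values per matrix, so nlag * 2 values in total.
features <- c(tau_sketch("ACDA", toy_a, nlag = 2),
              tau_sketch("ACDA", toy_b, nlag = 2))
length(features)  # 4, i.e. nlag * 2
```

For the sequence "ACDA" and lag 1, the pairs are (A,C), (C,D) and (D,A); for lag 2 they are (A,D) and (C,A). Each lag therefore covers one pair fewer than the previous one.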
Usage
SOCNumber(seqs, nlag = 30, label = c())
Arguments
seqs
is a FASTA file with amino acid sequences. Each sequence starts
with a '>' character. Alternatively, seqs can be a character vector in which each element is a peptide/protein sequence.
nlag
is a numeric value that sets the maximum distance between two amino acids.
Distances can be 1, 2, ..., or nlag. Default is 30.
label
is an optional parameter. It is a vector whose length is equal to the number of sequences. It gives the class of
each entry (i.e., sequence).
Value
It returns a feature matrix. The number of rows is equal to the number of sequences and
the number of columns is (nlag*2). For each distance d, there are two values: one for the Grantham matrix and one for the Schneider matrix.
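A minimal usage sketch follows. It assumes the package that provides SOCNumber is already attached, and the two peptide strings are made-up examples. The call uses only the arguments listed under Usage, and the guard simply skips the call when the function is not available.

```r
# Two made-up peptide sequences passed as a character vector.
seqs <- c("MKTAYIAKQRQISFVKSHFSRQ", "GSHMLEDPVAGTKQWERTYIPA")

# Run only if SOCNumber() is available in the current session.
if (exists("SOCNumber")) {
  mat <- SOCNumber(seqs = seqs, nlag = 5)
  # Per the Value section: one row per sequence, nlag * 2 columns.
  print(dim(mat))
}
```

Here nlag is kept smaller than the sequence length, so every distance from 1 to nlag has at least one amino acid pair.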