PSSMCOOL
This package includs all features that has been extracted from Position-Specific Scoring Matrix (PSSM) in POSSUM website: http://possum.erc.monash.edu/ and some of other features from several articles. each function in this package corresponds to one feature that has extracted from PSSM Matrix.
Install from GitHub : devtools::install_github("BioCool-Lab/PSSMCOOL")
1 Preface
Feature extraction or feature encoding is a fundamental step in the construction of high-quality machine learning-based models. Specifically, this is a key step for determining the effectiveness of the trained models in bioinformatics applications. In the last two decades, a variety of feature encoding schemes have been proposed in order to exploit useful patterns from protein sequences. Such schemes are often based on sequence information or physicochemical properties of amino acids. Although direct features derived from sequences themselves (such as amino acid compositions, dipeptide compositions, and counting of k-mers) are regarded as essential for training models, an increasing number of studies have shown that evolutionary information in the form of PSSM profiles is much more informative than sequence information alone. Accordingly, PSSM-based feature descriptors have been commonly used as indispensable primary features to construct models, filling a major gap in the current bioinformatics research. For example, PSSM-based feature descriptors have successfully improved the prediction performance of structural and functional properties of proteins across a wide spectrum of bioinformatics applications. These predictions can be applied for protein fold recognition and the prediction of protein structural classes, protein-protein interactions, protein subcellular localization, RNA-binding sites, and protein functions. But at the same time, there is no comprehensive, simple tool in the R programming language for extracting all of these features from the PSSM and displaying it in the output. PSSMCOOL package here is developed in R for these purposes. First, in Figure 1 which is a table that lists all of the features implemented in this package with their feature-lengths is brought. Then each one of these features will be explained in full detail.
PSSMCOOL Package is currently available on CRAN website:
https://CRAN.R-project.org/package=PSSMCOOL
for issues about this package :
https://github.com/BioCool-Lab/PSSMCOOL/issues
*feature vector length depends on the choice of parameter **these features produce Matrix of features which its dimension depends on choice of parameter
library(PSSMCOOL)
2 PSSM-AC
This feature, which stands for auto-covariance transformation, for column j, calculates the average of this column as shown in Figure 2, Then subtracts the resulting number from the elements on the rows i and i + g of this column, and finally multiplies them and calculates the sum by changing the variable i from 1 to L-g. Because the variable j changes between 1 and 20 and the variable g changes between 1 and 10, eventually a feature vector of length 200 will be obtained.
Usage of this feature in PSSMCOOL package:
w<-PSSMAC(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"))
head(w, n = 50)
## [1] 0.0092 0.0070 0.0181 0.0072 -0.0032 0.0054 -0.0068 -0.0065 -0.0015
## [10] -0.0032 0.0057 0.0102 0.0059 0.0087 0.0176 0.0072 0.0141 -0.0080
## [19] -0.0132 0.0109 0.0271 0.0064 0.0080 0.0107 0.0130 0.0024 0.0062
## [28] 0.0126 0.0108 0.0061 0.0159 -0.0024 0.0157 0.0102 -0.0052 0.0141
## [37] -0.0023 -0.0097 0.0244 0.0333 0.0094 0.0132 -0.0003 -0.0006 0.0158
## [46] 0.0068 0.0094 -0.0069 0.0085 -0.0109
3 DPC-PSSM
This feature stands for dipeptide composition, which multiplies the values that are located in two consecutive rows and two different columns. Having calculated these values for different rows and columns, they are summed. Next, for both columns, the sum is divided by L-1. Since the result depends on two different columns, eventually a feature vector of length 400, according to Figure 3 and following equation ,will be obtained.
In the above equation, 's, are the PSSM elements and 's are the 20x20 matrix elements, which by placing the rows of this matrix next to each other, DPC-PSSM feature vector of length 400 is obtained.
Usage of this feature in PSSMCOOL package:
ss<-DPC_PSSM(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"))
head(ss, n = 50)
## [1] 0.3773 0.2965 0.3371 0.3049 0.1816 0.4356 0.3395 0.2866 0.3109 0.2576
## [11] 0.2622 0.3150 0.2776 0.3187 0.2383 0.3846 0.2755 0.1791 0.2180 0.2808
## [21] 0.1528 0.1218 0.1387 0.1063 0.0794 0.1766 0.1235 0.1070 0.1410 0.1055
## [31] 0.1085 0.1235 0.1078 0.1354 0.0858 0.1631 0.1174 0.0823 0.0939 0.1186
## [41] 0.1241 0.0928 0.1133 0.0890 0.0416 0.1248 0.0982 0.0916 0.0916 0.0746
4 Trigram-PSSM
This feature vector is of length 8000, which is extracted from the PSSM. If we multiply elements available in three consecutive rows and three different columns of the PSSM by each other, and apply this to all rows (all three consecutive rows) and then sum these numbers, eventually one of the elements of feature vector with length 8000 corresponding to the three selected columns will be obtained. Because we have 20 different columns, The final feature vector will be of length 8000 = 20 * 20 * 20. Figure 4: shows these steps. For example, in this figure for three marked rows and columns, the numbers obtained from the intersection of these rows and columns marked with a blue dotted circle around them, are multiplied to each other.
Usage of this feature in PSSMCOOL package:
as<-trigrame_pssm(paste0(system.file("extdata",package="PSSMCOOL"),"/C7GSI6.txt.pssm"))
head(as, n = 50)
## [1] 6.8369 3.7511 4.1531 2.6852 2.6406 4.3763 2.8829 4.2257 3.7228 5.1379
## [11] 5.2524 4.4484 5.1002 3.8583 5.1569 7.5018 5.2072 2.4309 2.8760 5.6669
## [21] 3.5204 3.0895 2.7871 1.9459 0.7295 3.4388 2.1569 2.4319 3.2735 1.6680
## [31] 1.9269 3.1770 2.1466 2.0473 3.1579 4.1613 2.8492 1.5444 1.8998 1.9281
## [41] 3.3102 2.3229 2.7297 1.6388 1.3118 3.0535 1.9647 2.2095 2.7645 2.8273
5 Pse-PSSM
The length of this feature vector is 320. The first 20 numbers of this feature vector are the mean of 20 columns in PSSM, and the next values for each column are the mean squares of the difference between the elements of row i and i + lag in this column. Because the lag value varies between 1 and 15, the final feature vector will have a length of 320. Figure 5: and following equation shows the process of this function and the corresponding mathematical equation, respectively.
Usage of this feature in PSSMCOOL package:
v<-pse_pssm(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"))
head(v, n = 50)
## [1] 0.3773 0.2965 0.3371 0.3049 0.1816 0.4356 0.3395 0.2866 0.3109 0.2576
## [11] 0.2622 0.3150 0.2776 0.3187 0.2383 0.3846 0.2755 0.1791 0.2180 0.2808
## [21] 0.1636 0.1969 0.1304 0.2129 0.1649 0.1271 0.1766 0.1524 0.1649 0.1478
## [31] 0.1197 0.1964 0.1259 0.1896 0.1973 0.1181 0.0903 0.1807 0.1477 0.1717
## [41] 0.1693 0.1875 0.1729 0.2508 0.1585 0.1495 0.1832 0.1635 0.1666 0.1338
6 k-Separated-bigram-PSSM
This feature is almost identical to the DPC feature, and in fact, the DPC feature is part of this feature (for k = 1) and for two different columns, it considers rows that have distance k.
Usage of this feature in PSSMCOOL package:
w<-k_seperated_bigrame(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"),5)
head(w, n = 50)
## [1] 17.8176 13.2148 15.5094 14.4319 8.3346 22.1273 16.1736 13.4963 13.7240
## [10] 11.5597 11.8734 14.7134 12.4310 17.7145 11.2181 18.3770 12.4118 8.3759
## [19] 10.2801 13.0504 14.3991 12.7802 13.3367 12.7504 6.5913 16.3527 13.2403
## [28] 9.4264 10.8272 8.9290 7.9045 12.3299 7.7908 8.3621 10.7674 15.1463
## [37] 10.1184 6.4232 6.2908 10.3672 14.7014 13.7350 16.3146 14.6568 7.6190
## [46] 20.1827 15.6412 13.8415 13.7924 8.5747
7 EDP-EEDP-MEDP
In this group of features, in order to use uniform dimensions to show proteins of different lengths, in the first step, the average evolutionary score between adjacent residues is calculated using the following equations:
Where L is equal to the length of the protein, is the mean score of positions i and i-1, and is the mean score of positions i and i+1. The evolutionary difference formula (EDF) is then defined as follows:
represents the mean of evolutionary difference between the residues of a given protein sequence. According to EDF, a given protein can be expressed by a 20 x 20 matrix called ED-PSSM, which is defined by the equations
Using this ED-PSSM, the three features EDP, EEDP, and MEDP are defined by the equations following equation, EDP has a length of 20, EEDP has a length of 400, and the MEDP feature is obtained by merging these two feature vectors.
Figure 7: also shows the process of this work. It is noteworthy that in the following equation the value of is removed during the subtraction of and , and therefore this value is not shown in Figure 7.
Usage of this feature in PSSMCOOL package:
as<-EDP_MEDP(paste0(system.file("extdata",package="PSSMCOOL"),"/C7GS61.txt.pssm"))
head(as, n = 50)
## [1] 0.0470 0.0270 0.0327 0.0310 0.0275 0.0284 0.0307 0.0380 0.0325 0.0425
## [11] 0.0394 0.0267 0.0329 0.0373 0.0334 0.0567 0.0558 0.0383 0.0350 0.0485
## [21] 0.0266 0.0521 0.0464 0.0549 0.0455 0.0409 0.0429 0.0520 0.0574 0.0428
## [31] 0.0425 0.0419 0.0354 0.0564 0.0492 0.0423 0.0436 0.0762 0.0549 0.0359
## [41] 0.0515 0.0095 0.0211 0.0154 0.0123 0.0171 0.0174 0.0290 0.0149 0.0347
8 AB-PSSM
This feature consists of two types of feature vectors. At first, each protein sequence is divided into 20 equal parts, each of which is called a block, and in each block, the row vectors of the PSSM related to that block are added together. The resulting final vector is divided by the length of that block, which is equal to 5% of protein length. Finally, by placing these 20 vectors side by side, the first feature vector of length 400 is obtained. The second feature for each amino acid in each column is the average of the positive numbers in that column and for each block, and these 20 values, corresponding to 20 blocks, are placed next to each other, and therefore for each of the 20 types of amino acids, a vector of length 20 is obtained, and by placing these together,the second feature vector of length 400, is obtained. Figure 8 represents this process.
Usage of this feature in PSSMCOOL package:
zz<- AB_PSSM(system.file("extdata","C7GRQ3.txt.pssm",package="PSSMCOOL"))
head(zz[1], n = 50)
## [[1]]
## [1] 0.3595 0.1202 0.3442 0.2908 0.0545 0.3715 0.3510 0.0816 0.1202 0.5049
## [11] 0.3361 0.3861 0.4256 0.2303 0.1354 0.5385 0.5939 0.0278 0.1983 0.4214
## [21] 0.6790 0.3226 0.6670 0.4566 0.1837 0.4365 0.5000 0.3562 0.4365 0.3546
## [31] 0.2157 0.4615 0.3340 0.0664 0.4495 0.7409 0.6774 0.0327 0.0963 0.3460
## [41] 0.4246 0.0571 0.3928 0.3598 0.1810 0.1769 0.3844 0.1568 0.1325 0.1810
## [51] 0.2183 0.1967 0.1040 0.2167 0.5664 0.6090 0.4246 0.0095 0.1126 0.2028
## [61] 0.3562 0.3961 0.2456 0.5310 0.0379 0.6522 0.7260 0.0519 0.2773 0.3794
## [71] 0.2448 0.6587 0.1332 0.1951 0.2488 0.5583 0.6236 0.1798 0.1836 0.3659
## [81] 0.4784 0.1351 0.2863 0.3325 0.4861 0.3323 0.3744 0.0590 0.1585 0.4641
## [91] 0.4197 0.3057 0.3361 0.2889 0.0308 0.4194 0.4896 0.0191 0.1836 0.4511
## [101] 0.6639 0.2440 0.5651 0.6509 0.1202 0.5000 0.6774 0.3460 0.2591 0.2591
## [111] 0.2992 0.4750 0.1821 0.0963 0.2076 0.7159 0.6540 0.0425 0.1083 0.2591
## [121] 0.3725 0.1707 0.5000 0.6774 0.2623 0.4864 0.6654 0.3611 0.5583 0.1668
## [131] 0.1332 0.3628 0.1701 0.3242 0.4116 0.5635 0.4230 0.2857 0.5197 0.2087
## [141] 0.4615 0.1772 0.4929 0.5136 0.0914 0.5000 0.4750 0.3057 0.1788 0.1837
## [151] 0.1701 0.4631 0.1452 0.2053 0.3731 0.6139 0.6188 0.2013 0.1322 0.3476
## [161] 0.2461 0.1869 0.4197 0.2167 0.3073 0.3979 0.3941 0.1986 0.3708 0.5298
## [171] 0.3996 0.3390 0.2960 0.1684 0.0547 0.3947 0.5120 0.0659 0.1739 0.4951
## [181] 0.5586 0.3356 0.3976 0.1937 0.0447 0.4760 0.5003 0.1960 0.2209 0.3008
## [191] 0.3614 0.5616 0.1582 0.0547 0.1665 0.4763 0.3812 0.0142 0.0278 0.3627
## [201] 0.3760 0.3426 0.2342 0.0959 0.0346 0.3812 0.2154 0.0451 0.3520 0.0891
## [211] 0.1357 0.6648 0.2956 0.2079 0.3536 0.5045 0.4763 0.0278 0.2264 0.0491
## [221] 0.3073 0.0240 0.2523 0.0343 0.3199 0.1380 0.0282 0.0207 0.2364 0.4197
## [231] 0.2254 0.0233 0.1234 0.2630 0.0231 0.4732 0.5395 0.1674 0.2817 0.4977
## [241] 0.5884 0.3346 0.3812 0.3947 0.1923 0.5404 0.5216 0.4313 0.3193 0.4583
## [251] 0.3861 0.4880 0.3210 0.1273 0.2737 0.5651 0.5385 0.0327 0.0833 0.4534
## [261] 0.6654 0.2575 0.6269 0.5635 0.1572 0.5000 0.6405 0.6774 0.5770 0.2190
## [271] 0.2076 0.5000 0.2326 0.1468 0.5385 0.7560 0.6155 0.0474 0.1322 0.2960
## [281] 0.3210 0.4029 0.5071 0.3682 0.4511 0.2206 0.2542 0.2610 0.4145 0.2639
## [291] 0.2704 0.3611 0.2430 0.1635 0.1684 0.5838 0.3845 0.0327 0.1582 0.2618
## [301] 0.5169 0.1037 0.0756 0.1557 0.1458 0.0888 0.0418 0.3445 0.2564 0.2096
## [311] 0.1406 0.0717 0.1231 0.3712 0.0299 0.6062 0.3536 0.2284 0.3184 0.2779
## [321] 0.1866 0.2329 0.5026 0.5224 0.0884 0.2542 0.4347 0.2264 0.1582 0.0247
## [331] 0.0146 0.4253 0.0205 0.0059 0.1610 0.2759 0.3356 0.0040 0.0156 0.1360
## [341] 0.1701 0.1836 0.1869 0.0341 0.3225 0.2553 0.1133 0.0191 0.5418 0.3611
## [351] 0.4163 0.2128 0.2300 0.5189 0.0222 0.3562 0.3679 0.3881 0.6778 0.4246
## [361] 0.4864 0.2825 0.6524 0.7289 0.0425 0.4615 0.7611 0.7645 0.2342 0.0963
## [371] 0.0714 0.6654 0.1083 0.0628 0.3595 0.5000 0.3980 0.0964 0.1034 0.1023
## [381] 0.2007 0.2046 0.1458 0.1235 0.2044 0.1362 0.1407 0.0557 0.2154 0.6054
## [391] 0.4998 0.1505 0.5000 0.3558 0.0226 0.2677 0.2905 0.1722 0.3651 0.5821
9 AATP-TPC
In this feature, at first, a TPM matrix is constructed from the PSSM, which has represented by a vector corresponding to the following equation:
Where the components are as follows:
In the above equation, the numerator is the same as the equation related to DPC-PSSM feature without considering its coefficient. By placing these components together, a TPC feature vector of length 400 is obtained, and if we add the AAC feature vector of length 20 which is the average of columns of the PSSM to the beginning of this vector, AATP feature vector of length 420 is obtained.
Usage of this feature in PSSMCOOL package:
as<-AATP_TPCC(paste0(system.file("extdata",package="PSSMCOOL"),"/C7GQS7.txt.pssm"))
head(as, n = 50)
## [1] 0.3773 0.2965 0.3371 0.3049 0.1816 0.4356 0.3395 0.2866 0.3109 0.2576
## [11] 0.2622 0.3150 0.2776 0.3187 0.2383 0.3846 0.2755 0.1791 0.2180 0.2808
## [21] 0.0640 0.0510 0.0580 0.0445 0.0332 0.0739 0.0517 0.0448 0.0590 0.0442
## [31] 0.0454 0.0517 0.0451 0.0567 0.0359 0.0683 0.0492 0.0344 0.0393 0.0496
## [41] 0.0706 0.0528 0.0645 0.0507 0.0237 0.0710 0.0559 0.0521 0.0521 0.0425
10 CS-PSe-PSSM
This feature consists of a combination of several types of features, and in general, the obtained feature vector would be of length 700. Here all parts of this feature are described separately.
10.1 CSAAC
First, the consensus sequence is obtained from the PSSM according to the following equation, then from this consensus sequence next feature vectors are obtained
where is the index for the largest element in row i of PSSM, and the ith component in the consensus sequence equal to the 'th amino acid in the standard amino acid alphabet which the column names of PSSM are labeled by them. Now, using the following equation, the feature vector of length 20 is obtained:
Here shows the number of j-th amino acid occurrences in the consensus sequence
10.2 CSCM
This feature vector is obtained using the following equation from the consensus sequence:
Here shows the number of i-th amino acid occurrences in the consensus sequence and indicates the j-th position of the i-th amino acid in the consensus sequence.
10.3 Segmented PsePSSM features
Here the PSSM is divided into n segments, which corresponds to dividing the initial protein sequence into n segments, if n = 2:
Where L represents the length of the initial protein and indexed L's indicate the length of the first and second segments, respectively. Now, using the following equations, the feature vector components of length 200 are obtained
And if n = 3 then we have:
Therefore, using the following equations, the components of a feature vector with length 180 are obtained as follows:
In total, using the previous feature vector, a feature vector of length 380 is obtained for this group.
10.4 Segmented ACTPSSM features
In this group, using the previous equations and the following equations, the feature vector of length 280 could be obtained. when n=2:
when n=3:
If we connect all these feature vectors together, we will get a feature vector with a length of 700, which is reduced by PCA method and is used as input for the support vector machine classifier.
Usage of this feature in PSSMCOOL package:
A<-CS_PSe_PSSM(system.file("extdata", "C7GSI6.txt.pssm", package="PSSMCOOL"),"total")
head(A, n = 50)
## [1] 0.0833 0.0278 0.0694 0.0139 0.0556 0.0139 0.0000 0.0278 0.0139 0.1250
## [11] 0.0833 0.0417 0.0556 0.0556 0.1111 0.0833 0.0278 0.0139 0.0417 0.0556
## [21] 0.0350 0.0139 0.0299 0.0082 0.0401 0.0104 0.0000 0.0211 0.0012 0.0841
## [31] 0.0401 0.0335 0.0280 0.0411 0.0149 0.0327 0.0125 0.0121 0.0196 0.0358
## [41] 0.4777 0.2352 0.2671 0.1705 0.1904 0.2805 0.2288 0.2839 0.2522 0.3366
11 D-FPSSM/S-FPSSM
If we sum the numbers of each column in the PSSM, we get a feature vector of length 20 as follows:
If we remove the negative elements somehow of the PSSM and call the resulting new matrix FPSSM and then calculate this feature vector for FPSSM, the components of this vector will depend on the length of the original protein, so to eliminate this dependency, we normalize the components of this vector using the following equation:
Where, min and max represent the smallest and largest values of the previous vector components, respectively, and L represents the length of the original protein. The second feature vector with length 400 is obtained as follows:
If we name the columns of the FPSSM from to in the order from left to right, then is equal to the sum of those members in the j-th column in the FPSSM whose corresponding row amino acid is equal to . Figure 9 schematically shows these steps:
Usage of this feature in PSSMCOOL package:
q<-FPSSM(system.file("extdata","C7GQS7.txt.pssm",package="PSSMCOOL"),20)
head(q, n = 50)
## [[1]]
## [1] 62 65 48 87 105 124 61 73 75 49 63 74 48 116 83 53 21 80 55
## [20] 54
##
## [[2]]
## [1] 14 6 1 2 5 5 0 7 7 1 2 3 1 6 2 2 2 0 0 1 3 11 2 0 2
## [26] 0 0 0 3 0 1 1 1 0 0 1 1 0 1 0 9 1 11 12 0 3 1 14 0 1
## [51] 0 1 0 9 4 5 2 0 0 3 3 4 4 41 0 7 14 1 6 0 3 5 0 0 0
## [76] 3 0 0 0 0 0 0 0 0 87 0 0 0 0 0 0 0 0 5 0 2 0 0 0 0
## [101] 3 10 6 8 0 52 12 6 15 0 1 2 10 7 21 10 1 1 0 0 0 1 0 3 0
## [126] 4 5 4 0 0 0 0 0 0 1 0 0 1 0 0 6 0 7 2 1 4 2 23 2 2
## [151] 1 4 4 5 0 0 0 6 0 1 0 2 4 1 0 16 3 3 17 0 0 8 5 1 0
## [176] 0 0 0 0 0 1 0 0 0 1 0 0 0 0 16 9 6 4 4 0 0 0 1 0 9
## [201] 2 0 0 5 2 0 5 0 0 11 23 0 4 2 0 0 0 0 1 6 2 19 4 5 6
## [226] 7 10 3 7 0 0 40 0 0 0 5 2 6 2 0 0 0 0 0 0 0 0 0 0 0
## [251] 3 0 9 0 0 0 0 0 0 0 5 2 1 0 1 8 0 1 6 4 14 0 6 46 0
## [276] 0 1 12 11 4 5 1 1 7 0 5 0 10 3 0 0 0 0 1 52 1 0 7 1 0
## [301] 5 3 4 0 0 5 4 0 3 0 1 3 1 1 1 20 1 9 1 3 4 0 0 1 0
## [326] 1 1 1 1 0 0 0 0 5 1 3 11 2 2 0 0 0 0 0 0 0 0 0 0 2
## [351] 1 0 0 7 0 0 0 16 2 0 0 0 0 0 0 2 0 0 2 2 0 0 0 9 0
## [376] 0 0 16 30 3 0 5 3 0 0 5 4 0 3 10 4 1 3 8 1 1 0 3 4 24
12 SCSH2
To generate this feature vector, the consensus sequence corresponding to the protein sequence is extracted using the PSSM. Then, by placing these two sequences next to each other, a matrix with dimensions of 2 * L will be created. In the next step, each component in the upper row of this matrix is connected to two components in the lower row of this matrix, and thus a graph similar to a bipartite graph could be created. Now in this graph, each path of length 2 specifies a 3-mer and each path of length 1 denotes a 2-mer corresponding to these two sequences. Now if we consider a table consisting of two rows and 8000 columns so that the first row contains all possible 3-mers of 20 amino acids, then for every 3-mer obtained from this graph, we put number 1 below the corresponding cell With that 3-mer in the aforementioned table and 0 in other cells. so This gives us a vector of length 8000. For the 2-mers obtained from this graph, a vector of length 400 is obtained in a similar way. figures 10, 11 show these processes.
Usage of this feature in PSSMCOOL package:
zz<- scsh2(system.file("extdata","C7GRQ3.txt.pssm",package="PSSMCOOL"),2)
head(zz, n = 200)
## [1] 1 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0
## [38] 1 0 1 0 0 1 1 1 1 1 1 1 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 0 1 1 0 0 1 0 1
## [75] 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 1 1 0 0 0 1 0 1 1 0 0 0 1 1 1 1
## [112] 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 1 0 1 1 1 1 1
## [149] 1 1 0 1 1 1 0 1 1 1 1 0 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 0 1 1 1 1 1 1 0
## [186] 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1
13 RPSSM
The indices will show the standard 20 amino acids. If we assume that our primary protein has length L, each of the above columns is as follows:
Now, using the following equations, we merge the columns of the PSSM and obtain a matrix with dimensions :
Now using this new matrix we get a feature vector of length 10 as follows:
where
now using following equations; will create a feature vector of length 100 and by combining the feature vector of length 10 mentioned previously, the final feature vector of length 110 will be created.
Usage of this feature in PSSMCOOL package:
w<-rpssm(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"))
head(w, n = 50)
## [1] 0.0620 0.0528 0.0646 0.0674 0.0806 0.0810 0.0883 0.0824 0.0988 0.0950
## [11] 0.0621 0.0482 0.0640 0.0575 0.0896 0.1044 0.0888 0.0752 0.1037 0.1039
## [21] 0.0787 0.0618 0.0731 0.0671 0.0982 0.1050 0.0912 0.0805 0.1200 0.1085
## [31] 0.0559 0.0589 0.0685 0.0373 0.0523 0.0643 0.0633 0.0727 0.0819 0.0853
## [41] 0.0693 0.0864 0.1022 0.0645 0.0493 0.0570 0.0811 0.1101 0.0714 0.0823
14 CC-PSSM
Usage of this feature in PSSMCOOL package:
aa<-pssm_cc(system.file("extdata","C7GQS7.txt.pssm",package="PSSMCOOL"),18)
head(aa, n = 50)
## [1] 0.0150 0.0009 -0.0080 -0.0197 0.0277 -0.0021 0.0034 0.0222 -0.0033
## [10] -0.0087 0.0161 -0.0024 -0.0144 0.0161 0.0050 -0.0056 -0.0024 -0.0087
## [19] -0.0165 0.0004 -0.0035 0.0101 0.0057 -0.0256 -0.0114 -0.0009 -0.0179
## [28] 0.0027 -0.0024 -0.0127 -0.0113 -0.0155 -0.0056 -0.0020 0.0040 -0.0121
## [37] -0.0061 0.0026 -0.0045 0.0110 0.0048 0.0076 -0.0167 -0.0103 -0.0117
## [46] 0.0049 -0.0052 -0.0062 0.0077 -0.0117
15 Discrete cosine transform
Usage of this feature in PSSMCOOL package:
as<-Discrete_Cosine_Transform(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"))
head(as, n = 50)
## [1] 3.7586 -0.8904 -1.8115 0.9043 1.3727 -1.3320 -0.1494 0.4772 0.3272
## [10] 0.2038 -0.3245 -0.9033 1.0473 0.2255 -1.2477 0.3651 1.1561 -0.8085
## [19] -0.7821 1.1143 9.4695 -1.3575 -0.4747 0.3296 1.2757 0.6152 -0.9732
## [28] 1.0494 -0.5279 0.2443 -0.7019 -0.8720 -1.4273 0.1932 -0.0210 0.5049
## [37] -0.8403 -1.4087 0.6963 -0.1914 6.2212 -0.6490 -0.3823 0.3074 0.6949
## [46] -0.7493 -0.3240 1.9688 -0.0276 0.2205
16 Discrete Wavelet Transform
Usage of this feature in PSSMCOOL package:
as<-dwt_PSSM(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"))
head(as, n = 50)
## [1] -1.2032 2.1352 0.0547 0.4630 -0.9770 2.0423 0.0309 0.4129 -0.7134
## [10] 2.3055 0.0884 0.4222 -0.6967 1.7063 0.0818 0.4345 -0.7764 1.0817
## [19] 0.0411 0.3327 -1.2524 3.9385 0.0943 0.5317 -0.8283 2.3753 0.0723
## [28] 0.4349 -0.8160 2.0562 0.0561 0.4058 -0.7764 2.6698 0.0822 0.4252
## [37] -0.6910 2.2393 0.0964 0.3850 -0.7962 2.7484 0.0878 0.4019 -0.6288
## [46] 1.9567 0.0677 0.4157 -0.7807 1.9298
17 Disulfide_PSSM
Usage of this feature in PSSMCOOL package:
aq<-disulfid(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"))
head(aq[,1:50])
## 1 2 3 4 5 6 7 8 9 10 11
## 1 c1c2 0.2689 0.982 0.7311 0.1192 0.018 0.9526 0.7311 0.0474 0.8808 0.0474
## 2 c1c3 0.2689 0.982 0.7311 0.1192 0.018 0.9526 0.7311 0.0474 0.8808 0.0474
## 3 c1c4 0.2689 0.982 0.7311 0.1192 0.018 0.9526 0.7311 0.0474 0.8808 0.0474
## 4 c1c5 0.2689 0.982 0.7311 0.1192 0.018 0.9526 0.7311 0.0474 0.8808 0.0474
## 5 c1c6 0.2689 0.982 0.7311 0.1192 0.018 0.9526 0.7311 0.0474 0.8808 0.0474
## 6 c1c7 0.2689 0.982 0.7311 0.1192 0.018 0.9526 0.7311 0.0474 0.8808 0.0474
## 12 13 14 15 16 17 18 19 20 21 22 23
## 1 0.1192 0.7311 0.5 0.1192 0.1192 0.7311 0.5 0.018 0.1192 0.0474 0.5 0.2689
## 2 0.1192 0.7311 0.5 0.1192 0.1192 0.7311 0.5 0.018 0.1192 0.0474 0.5 0.2689
## 3 0.1192 0.7311 0.5 0.1192 0.1192 0.7311 0.5 0.018 0.1192 0.0474 0.5 0.2689
## 4 0.1192 0.7311 0.5 0.1192 0.1192 0.7311 0.5 0.018 0.1192 0.0474 0.5 0.2689
## 5 0.1192 0.7311 0.5 0.1192 0.1192 0.7311 0.5 0.018 0.1192 0.0474 0.5 0.2689
## 6 0.1192 0.7311 0.5 0.1192 0.1192 0.7311 0.5 0.018 0.1192 0.0474 0.5 0.2689
## 24 25 26 27 28 29 30 31 32 33 34 35
## 1 0.8808 0.1192 0.0474 0.1192 0.0474 0.0474 0.2689 0.5 0.5 0.1192 0.5 0.8808
## 2 0.8808 0.1192 0.0474 0.1192 0.0474 0.0474 0.2689 0.5 0.5 0.1192 0.5 0.8808
## 3 0.8808 0.1192 0.0474 0.1192 0.0474 0.0474 0.2689 0.5 0.5 0.1192 0.5 0.8808
## 4 0.8808 0.1192 0.0474 0.1192 0.0474 0.0474 0.2689 0.5 0.5 0.1192 0.5 0.8808
## 5 0.8808 0.1192 0.0474 0.1192 0.0474 0.0474 0.2689 0.5 0.5 0.1192 0.5 0.8808
## 6 0.8808 0.1192 0.0474 0.1192 0.0474 0.0474 0.2689 0.5 0.5 0.1192 0.5 0.8808
## 36 37 38 39 40 41 42 43 44 45 46 47
## 1 0.2689 0.5 0.2689 0.9526 0.982 0.8808 0.2689 0.1192 0.2689 0.9526 0.0474 0.5
## 2 0.2689 0.5 0.2689 0.9526 0.982 0.8808 0.2689 0.1192 0.2689 0.9526 0.0474 0.5
## 3 0.2689 0.5 0.2689 0.9526 0.982 0.8808 0.2689 0.1192 0.2689 0.9526 0.0474 0.5
## 4 0.2689 0.5 0.2689 0.9526 0.982 0.8808 0.2689 0.1192 0.2689 0.9526 0.0474 0.5
## 5 0.2689 0.5 0.2689 0.9526 0.982 0.8808 0.2689 0.1192 0.2689 0.9526 0.0474 0.5
## 6 0.2689 0.5 0.2689 0.9526 0.982 0.8808 0.2689 0.1192 0.2689 0.9526 0.0474 0.5
## 48 49 50
## 1 0.8808 0.0474 0.2689
## 2 0.8808 0.0474 0.2689
## 3 0.8808 0.0474 0.2689
## 4 0.8808 0.0474 0.2689
## 5 0.8808 0.0474 0.2689
## 6 0.8808 0.0474 0.2689
18 DP-PSSM
Usage of this feature in PSSMCOOL package:
ss<-DP_PSSM(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"))
head(ss, n = 50)
## [1] 1.0739 -0.4023 1.1132 -0.5915 0.9459 -0.5134 1.3519 -0.6148 2.1035
## [10] -0.7794 1.1112 -0.5070 0.9216 -0.5843 1.3479 -0.5950 1.2737 -0.5028
## [19] 1.1349 -0.6704 1.3017 -0.6891 1.1585 -0.5086 0.9554 -0.5457 1.4547
## [28] -0.7166 1.5252 -0.6998 1.0084 -0.3219 0.6664 -0.4227 1.3648 -0.7855
## [37] 1.0723 -0.6254 1.1957 -0.6171 1.7783 -1.7591 1.5965 -1.6100 1.1498
## [46] -1.5052 1.4737 -1.5524 1.5066 -1.6937
19 DFMCA-PSSM
Usage of this feature in PSSMCOOL package:
as<-DFMCA_PSSM(system.file("extdata", "C7GQS7.txt.pssm", package="PSSMCOOL"),7)
head(as, n = 50)
## [1] 1.5671 1.0027 1.3320 1.1677 0.5471 2.3738 1.3395 1.1099 1.2298 1.0010
## [11] 1.1077 1.0803 0.9034 1.5549 0.8586 1.5949 0.8651 0.4748 0.5141 1.0397
## [21] 1.0206 1.1935 1.0423 0.7185 1.5056 1.1396 1.0414 1.0555 0.9881 0.9916
## [31] 1.0804 0.9890 1.1909 0.9159 1.4816 1.0687 0.6386 0.7624 1.0960 1.0048
## [41] 0.8881 0.4023 1.2723 1.0321 0.7996 0.8973 0.5799 0.5536 0.9635 0.6525