This function calculates the profile-based protein representation
derived from the PSSM (position-specific scoring matrix). The feature
vector is based on the PSSM computed by extractPSSM.
extractPSSMFeature(pssmmat)
pssmmat: The PSSM computed by extractPSSM.
A numeric vector which has 20 x N named elements, where N is the size of the window (number of rows of the PSSM).
For a given sequence, the PSSM feature represents the log-likelihood of the substitution of the 20 types of amino acids at that position in the sequence.
Each PSSM feature value in the vector represents the degree of conservation of a given amino acid type. Each value is normalized to the interval (0, 1) by the logistic transformation 1/(1+e^(-x)), where x is the raw PSSM score.
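The normalization described above can be sketched in a few lines of base R. This is only an illustration of the stated formula 1/(1+e^(-x)), not the package's actual implementation. The matrix `toy_pssm`, its made-up scores, its 20 x 3 orientation, and the helper `logistic` are all assumptions introduced for this example.

```r
# Hypothetical toy PSSM: 20 amino acid types by N = 3 positions.
# The scores are random integers for illustration, not real PSI-BLAST output.
aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
set.seed(1)
toy_pssm <- matrix(
  sample(-5:5, 20 * 3, replace = TRUE),
  nrow = 20,
  dimnames = list(aa, paste0("pos", 1:3))
)

# Logistic transformation from the text: maps any real score into (0, 1).
logistic <- function(x) 1 / (1 + exp(-x))

# Flatten the matrix (column by column) and normalize every score.
toy_feature <- logistic(as.vector(toy_pssm))

length(toy_feature)  # 20 x N elements
range(toy_feature)   # all values lie strictly between 0 and 1
```

A raw score of 0 maps to 0.5, positive scores map above 0.5, and negative scores map below it, so the ordering of the original scores is preserved.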
Ye, Xugang, Guoli Wang, and Stephen F. Altschul. "An assessment of substitution scores for protein profile-profile comparison." Bioinformatics 27.24 (2011): 3356--3363.
Rangwala, Huzefa, and George Karypis. "Profile-based direct kernels for remote homology detection and fold recognition." Bioinformatics 21.23 (2005): 4239--4247.
# NOT RUN {
# This example requires the NCBI BLAST+ command-line tools on the PATH
if (Sys.which("makeblastdb") == "" || Sys.which("psiblast") == "") {
  cat("Cannot find makeblastdb or psiblast. Please install NCBI Blast+\n")
} else {
  # Query sequence shipped with protr
  x = readFASTA(system.file(
    "protseq/P00750.fasta", package = "protr"))[[1]]
  # Copy the example database to a temporary file
  dbpath = tempfile("tempdb", fileext = ".fasta")
  invisible(file.copy(from = system.file(
    "protseq/Plasminogen.fasta", package = "protr"), to = dbpath))
  # Compute the PSSM, then derive the normalized feature vector from it
  pssmmat = extractPSSM(seq = x, database.path = dbpath)
  pssmfeature = extractPSSMFeature(pssmmat)
  head(pssmfeature)
}
# }