This function calculates the Pseudo Amino Acid Composition (PseAAC) descriptor (dimension: 20 + lambda; 50 with the default lambda = 30).
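For orientation, the PseAAC definition in Chou (2001), cited below, can be summarized as follows. This is our reading of the paper, not a transcription of protr's source, and protr may differ in details such as frequency normalization and rounding. Here L is the sequence length, P the number of selected properties, H_j^0(a) the raw value of property j for amino acid type a, R_i the residue at position i, and f_u the normalized frequency of amino acid type u. The first 20 components come from the composition and the last lambda components from the sequence-order factors theta_k, which is where the dimension 20 + lambda comes from. When the f_u sum to 1, the 20 + lambda components also sum to 1.

```latex
H_j(a) = \frac{H_j^0(a) - \frac{1}{20}\sum_{b=1}^{20} H_j^0(b)}
              {\sqrt{\frac{1}{20}\sum_{c=1}^{20}\Big[H_j^0(c) - \frac{1}{20}\sum_{b=1}^{20} H_j^0(b)\Big]^2}}

\Theta(r, s) = \frac{1}{P}\sum_{j=1}^{P}\big[H_j(s) - H_j(r)\big]^2

\theta_k = \frac{1}{L-k}\sum_{i=1}^{L-k}\Theta(R_i, R_{i+k}), \qquad k = 1, \dots, \lambda

X_u =
\begin{cases}
  \dfrac{f_u}{\sum_{i=1}^{20} f_i + w\sum_{k=1}^{\lambda}\theta_k}, & 1 \le u \le 20 \\[1ex]
  \dfrac{w\,\theta_{u-20}}{\sum_{i=1}^{20} f_i + w\sum_{k=1}^{\lambda}\theta_k}, & 20 < u \le 20 + \lambda
\end{cases}
```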
extractPAAC(
  x,
  props = c("Hydrophobicity", "Hydrophilicity", "SideChainMass"),
  lambda = 30,
  w = 0.05,
  customprops = NULL
)
Returns a named numeric vector of length 20 + lambda.
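To show where the 20 + lambda values come from, here is a minimal from-scratch sketch of Chou's type-1 PseAAC in base R. It is not protr's implementation: the function name paac_sketch, the output names, and the use of normalized frequencies (so the result sums to 1) are our assumptions, and its numbers are not guaranteed to match extractPAAC() exactly. It reads property values in the same layout as customprops (an AccNo column plus the 20 amino acid columns); the test sequence is made up.

```r
# Minimal sketch of Chou's type-1 PseAAC (hypothetical helper, not protr code).
# props: data frame in the customprops layout (AccNo + 20 amino acid columns).
paac_sketch <- function(x, props, lambda = 30, w = 0.05) {
  aa <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
  res <- strsplit(x, "")[[1]]
  n <- length(res)
  if (!all(res %in% aa)) stop("sequence contains non-standard residues")
  if (lambda >= n) stop("lambda must be smaller than the sequence length")

  # Standardize each property over the 20 amino acids:
  # zero mean and unit population standard deviation.
  raw <- as.matrix(props[, aa])
  std <- t(apply(raw, 1, function(h) {
    (h - mean(h)) / sqrt(sum((h - mean(h))^2) / 20)
  }))
  colnames(std) <- aa

  # Correlation function: squared difference of the standardized property
  # values of residues i and j, averaged over all properties.
  corr <- function(i, j) mean((std[, res[i]] - std[, res[j]])^2)

  # Sequence-order correlation factors theta_1, ..., theta_lambda.
  theta <- vapply(seq_len(lambda), function(k) {
    mean(vapply(seq_len(n - k), function(i) corr(i, i + k), numeric(1)))
  }, numeric(1))

  # Normalized amino acid frequencies (sum to 1) and the final descriptor.
  f <- as.numeric(table(factor(res, levels = aa))) / n
  denom <- sum(f) + w * sum(theta)
  out <- c(f / denom, w * theta / denom)
  names(out) <- c(paste0("aa.", aa), paste0("theta.", seq_len(lambda)))
  out
}

# Usage with the MyProp1 values from the examples and a made-up sequence.
vals <- c(A = 0.62, R = -2.53, N = -0.78, D = -0.9, C = 0.29, E = -0.74,
          Q = -0.85, G = 0.48, H = -0.4, I = 1.38, L = 1.06, K = -1.5,
          M = 0.64, F = 1.19, P = 0.12, S = -0.18, T = -0.05, W = 0.81,
          Y = 0.26, V = 1.08)
props1 <- data.frame(AccNo = "MyProp1", as.list(vals))
out <- paac_sketch("MKVLAAGIVGLLLAGCSSH", props1, lambda = 5)
print(round(out, 4))
```

The weight w only scales the lambda sequence-order terms relative to the composition terms; larger values shift more of the total away from the first 20 components.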
x: A character vector containing the input protein sequence.
props: A character vector specifying the properties used. Property IDs from the AAindex database and names defined in customprops may also be given (see the examples). Three properties are used by default:
'Hydrophobicity': hydrophobicity values of the 20 amino acids
'Hydrophilicity': hydrophilicity values of the 20 amino acids
'SideChainMass': side-chain masses of the 20 amino acids
lambda: The lambda parameter of the PseAAC descriptor, i.e. the number of sequence-order correlation factors; default is 30.
w: The weighting factor for the sequence-order correlation factors; default is 0.05.
customprops: An n x 21 named data frame containing n customized properties, one property per row. The columns must appear in the order 'AccNo', 'A', 'R', 'N', 'D', 'C', 'E', 'Q', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', and must be named exactly like this. The AccNo column contains the property names. Users must then explicitly list these property names in the props argument. See the examples below for a demonstration. The default value for customprops is NULL.
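Because the column layout of customprops must match exactly, it can help to check a custom data frame before calling extractPAAC(). The helper below, check_customprops, is hypothetical and not part of protr; it only verifies the column names, their order, and that the property columns are numeric, which is our reading of the requirements above.

```r
# Hypothetical helper (not part of protr): checks that a data frame has the
# customprops layout before it is passed to extractPAAC().
check_customprops <- function(df) {
  required <- c("AccNo", "A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
                "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
  if (!is.data.frame(df)) stop("customprops must be a data frame")
  if (!identical(names(df), required)) {
    stop("columns must be named, in order: ", paste(required, collapse = ", "))
  }
  # Every amino acid column must hold numeric property values.
  if (!all(vapply(df[-1], is.numeric, logical(1)))) {
    stop("amino acid columns must be numeric")
  }
  invisible(TRUE)
}

# A well-formed frame with one made-up property, built programmatically.
aa <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
good <- data.frame(AccNo = "MyProp", setNames(as.list(seq_along(aa) / 10), aa))
check_customprops(good)

# Swapping two columns breaks the required order, so the check fails;
# selecting the columns by name, bad[, c("AccNo", aa)], restores the order.
bad <- good[, c(1, 3, 2, 4:21)]
try(check_customprops(bad))
```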
Nan Xiao <https://nanx.me>
Kuo-Chen Chou. Prediction of Protein Cellular Attributes Using Pseudo-Amino Acid Composition. Proteins: Structure, Function, and Genetics, 2001, 43: 246-255.
Kuo-Chen Chou. Using Amphiphilic Pseudo Amino Acid Composition to Predict Enzyme Subfamily Classes. Bioinformatics, 2005, 21: 10-19.
C. Tanford. Journal of the American Chemical Society, 1962, 84: 4240-4246. (The hydrophobicity data)
T. P. Hopp and K. R. Woods. Proceedings of the National Academy of Sciences USA, 1981, 78: 3824-3828. (The hydrophilicity data)
CRC Handbook of Chemistry and Physics, 66th ed. CRC Press, Boca Raton, Florida, 1985. (The side-chain mass data)
R. M. C. Dawson, D. C. Elliott, W. H. Elliott, and K. M. Jones. Data for Biochemical Research, 3rd ed. Clarendon Press, Oxford, 1986. (The side-chain mass data)
See extractAPAAC for the amphiphilic pseudo amino acid composition descriptor.
library("protr")

x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
extractPAAC(x)

myprops <- data.frame(
  AccNo = c("MyProp1", "MyProp2", "MyProp3"),
  A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
  N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
  C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
  Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
  H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
  L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
  M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
  P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
  T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
  Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
)

# use 3 default properties, 4 properties from the
# AAindex database, and 3 customized properties
extractPAAC(
  x,
  customprops = myprops,
  props = c(
    "Hydrophobicity", "Hydrophilicity", "SideChainMass",
    "CIDH920105", "BHAR880101",
    "CHAM820101", "CHAM820102",
    "MyProp1", "MyProp2", "MyProp3"
  )
)