
Rcpi (version 1.8.0)

extractProtPSSM: Compute PSSM (Position-Specific Scoring Matrix) for a given protein sequence

Description

Compute the PSSM (Position-Specific Scoring Matrix) for a given protein sequence.

Usage

extractProtPSSM(seq, start.pos = 1L, end.pos = nchar(seq), psiblast.path = NULL, makeblastdb.path = NULL, database.path = NULL, iter = 5, silent = TRUE, evalue = 10L, word.size = NULL, gapopen = NULL, gapextend = NULL, matrix = "BLOSUM62", threshold = NULL, seg = "no", soft.masking = FALSE, culling.limit = NULL, best.hit.overhang = NULL, best.hit.score.edge = NULL, xdrop.ungap = NULL, xdrop.gap = NULL, xdrop.gap.final = NULL, window.size = NULL, gap.trigger = 22L, num.threads = 1L, pseudocount = 0L, inclusion.ethresh = 0.002)

Arguments

seq
Character string containing the input protein sequence.
start.pos
Optional integer denoting the start position of the fragment window. Default is 1, i.e. the first amino acid of the given sequence.
end.pos
Optional integer denoting the end position of the fragment window. Default is nchar(seq), i.e. the last amino acid of the given sequence.
psiblast.path
Character string indicating the path of the psiblast program. If NCBI BLAST+ is already installed on the operating system, the path will be detected automatically.
makeblastdb.path
Character string indicating the path of the makeblastdb program. If NCBI BLAST+ is already installed on the operating system, the path will be detected automatically.
database.path
Character string indicating the path of a reference database (a FASTA file).
iter
Number of PSI-BLAST iterations to perform. Default is 5.
silent
Logical. Whether to suppress the PSI-BLAST console output. Default is TRUE. (Suppression may not work with some Windows versions or some PSI-BLAST versions.)
evalue
Expectation value (E) threshold for saving hits. Default is 10.
word.size
Word size for the wordfinder algorithm. An integer >= 2.
gapopen
Integer. Cost to open a gap.
gapextend
Integer. Cost to extend a gap.
matrix
Character string. The scoring matrix name (default is 'BLOSUM62').
threshold
Minimum word score such that the word is added to the BLAST lookup table. A real value >= 0.
seg
Character string. Filter the query sequence with SEG ('yes', 'window locut hicut', or 'no' to disable). Default is 'no'.
soft.masking
Logical. Apply filtering locations as soft masks? Default is FALSE.
culling.limit
An integer >= 0. If the query range of a hit is enveloped by that of at least this many higher-scoring hits, delete the hit. Incompatible with best.hit.overhang and best.hit.score.edge.
best.hit.overhang
Best Hit algorithm overhang value (a real value >= 0 and <= 0.5; recommended value: 0.1). Incompatible with culling.limit.
best.hit.score.edge
Best Hit algorithm score edge value (a real value >= 0 and <= 0.5; recommended value: 0.1). Incompatible with culling.limit.
xdrop.ungap
X-dropoff value (in bits) for ungapped extensions.
xdrop.gap
X-dropoff value (in bits) for preliminary gapped extensions.
xdrop.gap.final
X-dropoff value (in bits) for final gapped alignment.
window.size
An integer >= 0. Multiple-hits window size; use 0 to specify the 1-hit algorithm.
gap.trigger
Number of bits to trigger gapping. Default is 22.
num.threads
Integer. Number of threads (CPUs) to use in the BLAST search. Default is 1.
pseudocount
Integer. Pseudo-count value used when constructing PSSM. Default is 0.
inclusion.ethresh
E-value inclusion threshold for pairwise alignments. Default is 0.002.
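Most of the arguments above mirror options of the NCBI BLAST+ psiblast command-line program. The sketch below shows one plausible way to check the mutually exclusive filtering options (culling.limit versus the two Best Hit values) and to translate a subset of the R arguments into psiblast flags. The helper build_psiblast_args() is hypothetical and not part of Rcpi; the flag names are psiblast options, but Rcpi's internal command construction may differ.

```r
# Hypothetical helper (not part of Rcpi): validate a subset of the
# extractProtPSSM() arguments and build matching psiblast flags.
build_psiblast_args <- function(query, db, out.pssm,
                                iter = 5, evalue = 10L,
                                matrix = "BLOSUM62", seg = "no",
                                culling.limit = NULL,
                                best.hit.overhang = NULL,
                                best.hit.score.edge = NULL,
                                num.threads = 1L,
                                inclusion.ethresh = 0.002) {
  # culling.limit cannot be combined with either Best Hit option
  if (!is.null(culling.limit) &&
      (!is.null(best.hit.overhang) || !is.null(best.hit.score.edge))) {
    stop("culling.limit is incompatible with best.hit.overhang and best.hit.score.edge")
  }
  # Best Hit values must lie in [0, 0.5], as documented above
  for (v in list(best.hit.overhang, best.hit.score.edge)) {
    if (!is.null(v) && (v < 0 || v > 0.5)) {
      stop("Best Hit values must lie in [0, 0.5]")
    }
  }
  # Always-present flags; numeric values are coerced to character by c()
  args <- c("-query", query, "-db", db,
            "-num_iterations", iter,
            "-evalue", evalue,
            "-matrix", matrix,
            "-seg", seg,
            "-num_threads", num.threads,
            "-inclusion_ethresh", inclusion.ethresh,
            "-out_ascii_pssm", out.pssm)
  # Optional flags are appended only when the user sets them
  if (!is.null(culling.limit)) {
    args <- c(args, "-culling_limit", culling.limit)
  }
  if (!is.null(best.hit.overhang)) {
    args <- c(args, "-best_hit_overhang", best.hit.overhang)
  }
  if (!is.null(best.hit.score.edge)) {
    args <- c(args, "-best_hit_score_edge", best.hit.score.edge)
  }
  as.character(args)
}

# Example: default settings plus a culling limit
build_psiblast_args("query.fasta", "tempdb", "query.pssm", culling.limit = 2)
```

Checking the incompatible combinations in R, before any external program is launched, gives an immediate and readable error instead of a psiblast failure message.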

Value

The original PSSM, a numeric matrix which has end.pos - start.pos + 1 columns and 20 named rows.
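The number of columns follows directly from the fragment window: for start.pos = 5 and end.pos = 14 the matrix has 14 - 5 + 1 = 10 columns. The sketch below checks this documented shape on a mock matrix of random integers standing in for real output. The helper check_pssm_shape() is hypothetical, and the row labels assume the amino acid order of PSI-BLAST's ASCII PSSM output, which may not match the row order Rcpi actually returns.

```r
# Hypothetical helper (not part of Rcpi): check the shape documented in
# the Value section, i.e. 20 named rows and end.pos - start.pos + 1 columns.
check_pssm_shape <- function(pssm, start.pos, end.pos) {
  is.matrix(pssm) && is.numeric(pssm) &&
    nrow(pssm) == 20L &&
    !is.null(rownames(pssm)) &&
    ncol(pssm) == end.pos - start.pos + 1L
}

# Assumed row labels, in the order PSI-BLAST uses in its ASCII PSSM output
aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

prot <- "MDAMKRGLCCVLLLCGAVFVSPS"  # toy sequence, 23 residues
start.pos <- 5L
end.pos <- 14L
stopifnot(start.pos >= 1L, end.pos <= nchar(prot))

# Mock matrix of random integer scores standing in for real output
set.seed(1)
width <- end.pos - start.pos + 1L
mock <- matrix(sample(-5:5, 20L * width, replace = TRUE),
               nrow = 20L, dimnames = list(aa, NULL))

dim(mock)                                   # 20 10
check_pssm_shape(mock, start.pos, end.pos)  # TRUE
```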

Details

This function computes the PSSM (Position-Specific Scoring Matrix) derived by PSI-BLAST for a given protein sequence or peptide. For each position in the sequence, the PSSM gives the log-likelihood score of substituting each of the 20 amino acid types at that position. Note that the output values are not normalized.
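To make the log-likelihood interpretation concrete, the sketch below computes log-odds scores for a single alignment column from toy residue counts, using a flat pseudocount and uniform background frequencies. This is a simplified illustration of the idea, not PSI-BLAST's procedure: PSI-BLAST additionally applies sequence weighting, uses its own pseudocount scheme and non-uniform background frequencies, and reports scaled integer scores. Because the output of extractProtPSSM() is not normalized, the last lines apply the logistic function 1 / (1 + exp(-x)), one transformation often used in the literature; no claim is made that Rcpi uses it.

```r
# Simplified log-odds scoring for ONE alignment column (illustration only)
aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Toy counts of residues observed at this position across homologs
counts <- setNames(integer(20), aa)
counts[c("L", "I", "V")] <- c(6L, 3L, 1L)

pseudo <- 1                   # flat pseudocount added to every residue
background <- rep(1 / 20, 20) # uniform background frequencies (assumption)

# Observed frequencies with pseudocounts, then log-odds in bits
q <- (counts + pseudo) / sum(counts + pseudo)
score <- log2(q / background)

# Conserved residues score positive, unobserved residues score negative
round(score[c("L", "I", "V", "W")], 2)

# Logistic squashing of the raw scores into the open interval (0, 1)
normalized <- 1 / (1 + exp(-score))
```

With 10 observed residues plus 20 pseudocounts, leucine has frequency 7/30 against a background of 1/20, giving log2(140/30), about 2.22 bits, while an unobserved residue such as tryptophan scores log2(2/3), about -0.58 bits.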

References

Altschul, Stephen F., et al. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Research 25.17 (1997): 3389-3402.

Ye, Xugang, Guoli Wang, and Stephen F. Altschul. "An assessment of substitution scores for protein profile-profile comparison." Bioinformatics 27.24 (2011): 3356-3363.

Rangwala, Huzefa, and George Karypis. "Profile-based direct kernels for remote homology detection and fold recognition." Bioinformatics 21.23 (2005): 4239-4247.

See Also

extractProtPSSMFeature, extractProtPSSMAcc

Examples


library("Rcpi")
# Requires NCBI BLAST+ (psiblast and makeblastdb) to be installed
x = readFASTA(system.file('protseq/P00750.fasta', package = 'Rcpi'))[[1]]
dbpath = tempfile('tempdb', fileext = '.fasta')
invisible(file.copy(from = system.file('protseq/Plasminogen.fasta', package = 'Rcpi'), to = dbpath))
pssmmat = extractProtPSSM(seq = x, database.path = dbpath)
dim(pssmmat)  # 20 x 562 (20 amino acids; P00750 has length 562)
