Parse fasta header (from UniProt) to extract different annotation fields
.parseFastaHeader(
header,
delim = "|",
databaseSign = c("sp", "tr", "generic", "conta", "synt", "gi"),
UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV="),
asList = FALSE,
silent = FALSE,
callFrom = NULL,
debug = FALSE
)
Depending on argument asList, this function returns either a) a matrix with columns 'db', 'uniqueIdentifier', 'entryName', 'proteinName' and further columns depending on argument UniprSep,
or b) a list with a matrix from primary parsing (argument delim) and a matrix from further parsing (argument UniprSep)
header: (character) fasta-header
delim: (character) delimiter (ie primary separator)
databaseSign: (character) characters at the beginning, right after the '>' (typically specifying the database of origin); they will be excluded from the sequence-header
UniprSep: (character) separators for further separating entry-fields if tableOut=TRUE; a space is assumed in addition to each of these separators;
see also UniProt-FASTA-headers
asList: (logical) if asList=TRUE, the function returns a list with two matrices, one for primary parsing and
another for further parsing (using UniprSep); otherwise everything is combined into a single matrix
silent: (logical) suppress messages
callFrom: (character) allows easier tracking of messages produced
debug: (logical) supplemental messages for debugging
.parseFastaHeader(">sp|P00760|TRY1_BOVIN Serine protease 1 OS=Bos taurus OX=9913 GN=PRSS1 PE=1")
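The two parsing stages described above (primary split on delim, then further split on the UniprSep tags) can be illustrated with a minimal base-R sketch. This is not the package's implementation; the variable names (parts, rest, pos, found, values) and the choice to name the extra fields 'OS', 'OX', etc. are assumptions made for illustration, and the sketch assumes at least one of the tags is present in the header.

```r
# Illustrative base-R sketch only; not the code used by .parseFastaHeader()
header <- ">sp|P00760|TRY1_BOVIN Serine protease 1 OS=Bos taurus OX=9913 GN=PRSS1 PE=1"

# Primary parsing: drop the leading '>' and split on the delimiter "|"
parts <- strsplit(sub("^>", "", header), "|", fixed = TRUE)[[1]]

# The third field holds the entry name, then the description and tagged fields
entryName <- sub(" .*$", "", parts[3])
rest <- sub("^[^ ]+ ", "", parts[3])

# Further parsing: find each tag, assuming a space precedes it
tags <- c("OS=", "OX=", "GN=", "PE=", "SV=")
pos <- vapply(tags, function(tg) {
  as.integer(regexpr(paste0(" ", tg), rest, fixed = TRUE))
}, integer(1))

# Keep only the tags that occur (here 'SV=' is absent), ordered by position
found <- sort(pos[pos > 0])

# The protein name is everything before the first tag
proteinName <- substr(rest, 1, min(found) - 1)

# Each value runs from just after its tag to just before the next tag
starts <- found + nchar(names(found)) + 1
ends <- c(found[-1] - 1, nchar(rest))
values <- substring(rest, starts, ends)
names(values) <- sub("=$", "", names(found))

# Combine both stages into one named vector
parsed <- c(db = parts[1], uniqueIdentifier = parts[2],
            entryName = entryName, proteinName = proteinName, values)
print(parsed)
```

Tags that do not occur in the header are simply dropped in this sketch; how the package itself reports missing fields is not shown here.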