
wrProteo (version 1.5.0)

readFasta2: Read file of protein sequences in fasta format

Description

Read a FASTA-formatted file of protein sequences (e.g. from UniProt) and extract the sequences and their names. If tableOut=TRUE, the output is organized as a matrix in which the meta-annotation (e.g. GeneName, OrganismName, ProteinName) is separated into individual columns.
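To make the basic idea concrete, here is a minimal base-R sketch of reading a FASTA file into a named character vector. It assumes a plain FASTA layout where each record starts with a '>' header line followed by one or more sequence lines. The helper name readFastaSketch is hypothetical and this is not the wrProteo implementation: for simplicity it keeps the whole header as the name, whereas readFasta2 strips the database sign and names the sequences by their UniProt ID.

```r
# Hypothetical helper, NOT part of wrProteo: read a FASTA file into a named
# character vector (names = header line without '>', values = sequences).
readFastaSketch <- function(filename) {
  lines <- readLines(filename, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]      # drop empty lines
  isHeader <- startsWith(lines, ">")
  recordId <- cumsum(isHeader)               # record number of every line
  headers <- sub("^>", "", lines[isHeader])
  # concatenate all sequence lines that belong to the same record
  seqs <- vapply(seq_along(headers), function(i)
    paste(lines[recordId == i & !isHeader], collapse = ""), character(1))
  names(seqs) <- headers
  seqs
}

# invented toy record (accession, entry name and sequence are made up)
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">sp|P0TOY1|TOY1_HUMAN Toy protein OS=Homo sapiens OX=9606 GN=TOY1 PE=1 SV=1",
             "MKTAYIAK", "QRQISFVK"), tmp)
res <- readFastaSketch(tmp)
res
```

Sequence lines split over several rows are joined per record via the cumulative count of header lines, which avoids an explicit loop over the file.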

Usage

readFasta2(
  filename,
  delim = "|",
  databaseSign = c("sp", "tr", "generic", "gi"),
  tableOut = FALSE,
  UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV="),
  cleanCols = TRUE,
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)

Arguments

filename

(character) name of the FASTA file to be read

delim

(character) delimiter used in the header line

databaseSign

(character) characters at the beginning of the header, right after the '>' (typically specifying the database of origin); they will be excluded from the sequence header

tableOut

(logical) toggle between returning a named character vector or a matrix with enhanced parsing of the FASTA header. The resulting matrix will contain the columns 'database', 'uniqueIdentifier', 'entryName', 'proteinName', 'sequence' and further columns depending on argument UniprSep

UniprSep

(character) separators used to further split the entry fields if tableOut=TRUE; see also the UniProt FASTA header format

cleanCols

(logical) if tableOut=TRUE, remove columns in which all entries are NA

silent

(logical) suppress messages

callFrom

(character) allows easier tracking of message(s) produced

debug

(logical) supplemental messages for debugging
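How delim and UniprSep interact with a typical UniProt header can be sketched in base R. This is an illustrative assumption of the parsing logic, not wrProteo's code: the helper parseHeaderSketch is hypothetical, it assumes the layout "db|accession|ENTRY_NAME protein name OS=... OX=... GN=... PE=... SV=...", and the names it gives to the tag fields ("OS=", "GN=", ...) may differ from the column names readFasta2 actually uses.

```r
# Hypothetical helper, NOT part of wrProteo: split one UniProt-style FASTA
# header into database, identifier, entry name, protein name and tag fields.
parseHeaderSketch <- function(header, delim = "|",
                              UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV=")) {
  header <- sub("^>", "", header)
  parts <- strsplit(header, delim, fixed = TRUE)[[1]]   # db, accession, rest
  rest <- paste(parts[-(1:2)], collapse = delim)
  entryName <- sub(" .*$", "", rest)                    # first word of the rest
  descr <- sub("^[^ ]+ ?", "", rest)                    # text after entryName
  # protein name = text before the first ' XX=' tag (or the whole text)
  tagStart <- vapply(UniprSep, function(tg)
    as.integer(regexpr(paste0(" ", tg), descr, fixed = TRUE)), integer(1))
  firstTag <- if (any(tagStart > 0)) min(tagStart[tagStart > 0]) else nchar(descr) + 1L
  proteinName <- substr(descr, 1, firstTag - 1)
  # each tag value: text after the tag up to the next ' XX=' tag or the end
  tagValues <- vapply(UniprSep, function(tg) {
    if (!grepl(tg, descr, fixed = TRUE)) return(NA_character_)
    sub(paste0("^.*?", tg, "(.*?)( [A-Z]{2}=.*)?$"), "\\1", descr, perl = TRUE)
  }, character(1))
  c(database = parts[1], uniqueIdentifier = parts[2], entryName = entryName,
    proteinName = proteinName, tagValues)
}

# invented toy header
h <- parseHeaderSketch(">sp|P0TOY1|TOY1_HUMAN Toy protein OS=Homo sapiens OX=9606 GN=TOY1 PE=1 SV=1")
h
```

Tags missing from a header come back as NA, which is the situation the cleanCols argument addresses when whole columns end up empty.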

Value

Depending on 'tableOut', either a simple character vector of sequences named by UniProt ID, or a matrix with the columns 'database', 'uniqueIdentifier', 'entryName', 'proteinName', 'sequence' and further columns depending on argument UniprSep
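A short base-R sketch of working with the matrix form of the return value follows. The matrix below is a hand-built toy stand-in with invented values, and its tag column names 'GN' and 'PE' are assumptions; it only shows the all-NA column removal that cleanCols is described to perform and a typical follow-up step (sequence lengths named by identifier).

```r
# Toy stand-in for a tableOut=TRUE result; all values are invented and the
# real column names for the UniprSep fields may differ from 'GN' and 'PE'.
tab <- rbind(
  c(database = "sp", uniqueIdentifier = "P0TOY1", entryName = "TOY1_HUMAN",
    proteinName = "Toy protein 1", GN = "TOY1", PE = NA, sequence = "MKTAYIAKQR"),
  c(database = "tr", uniqueIdentifier = "A0TOY2", entryName = "TOY2_HUMAN",
    proteinName = "Toy protein 2", GN = NA, PE = NA, sequence = "MSTNPKPQRK"))

# drop columns in which every entry is NA (the behaviour described for cleanCols)
tab <- tab[, colSums(!is.na(tab)) > 0, drop = FALSE]
colnames(tab)

# sequence length per protein, named by identifier
seqLen <- nchar(tab[, "sequence"])
names(seqLen) <- tab[, "uniqueIdentifier"]
seqLen
```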

See Also

scan or read.fasta from the package seqinr

Examples

# tiny example with common contaminants
library(wrProteo)
path1 <- system.file('extdata', package='wrProteo')
fiNa <- "conta1.fasta"
fasta1 <- readFasta2(file.path(path1, fiNa))
str(fasta1)
## now read and further separate the annotation fields
fasta2 <- readFasta2(file.path(path1, fiNa), tableOut=TRUE)
str(fasta2)
