snvToPepFasta: Single nucleotide variant (SNV) to peptide workflow

Description

This is a wrapper for the whole workflow: it applies SNV mutations to transcripts, digests the mutated sequences into small peptides, and writes the result to a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).

Usage

snvToPepFasta(tbl, glst, mymart, myarchive, spath, tpath, width = 60,
intermediate = FALSE, target = "K|R", exception = "P")

Arguments

tbl

Data.frame of ANNOVAR annotated SNVs.

glst

Data.frame of gene names, with a column named Genes.

mymart

Mart from which the Ensembl transcript ID (ENST) is retrieved via biomaRt.

myarchive

Logical that indicates whether an archive mart is given (default FALSE).

spath

Character string giving the path to the human Ensembl peptide database in FASTA format.

tpath

Character string giving the path to which the mutated and digested sequences are written in FASTA format.

width

Width of the sequence in the result (default 60).

intermediate

Logical, TRUE if you would like to have intermediate output, FALSE if not (default).

target

Character string, pattern to be matched before the cleavage site (default "K|R").

exception

Character string, pattern that prevents a cleavage when it is found directly after the cleavage site (default "P").
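
The patterns accepted by target and exception are restricted to single amino acids separated by "|" (see Details). The following is a minimal sketch of such a check, assuming exactly that rule. The helper is_valid_pattern() is hypothetical and is not part of PepPrep; the package may validate its input differently.

```r
# Hypothetical helper (not part of PepPrep): checks whether a pattern
# consists only of single, valid amino acid letters separated by "|".
is_valid_pattern <- function(pattern) {
  amino_acids <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]
  # Split literally on "|" so that it is not treated as a regex operator
  parts <- strsplit(pattern, "|", fixed = TRUE)[[1]]
  length(parts) > 0 && all(nchar(parts) == 1) && all(parts %in% amino_acids)
}

is_valid_pattern("K|R")         # default target
is_valid_pattern("A|R|W|H")     # valid: single amino acids only
is_valid_pattern("Z|F|A|D")     # invalid: Z is not in the amino acid list
is_valid_pattern("AR|NDC|STW")  # invalid: more than one letter per alternative
```

Running such a check before calling snvToPepFasta() could help catch a malformed pattern early.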

Value

If you set intermediate to TRUE, you will get the following output:

aachanges

A data.frame like tbl, with new columns that describe the amino acid changes.

transcripts

Data.frame containing: ensemble_transcript_id, nmid and pname.

mutfasta

Character vector that contains FASTA headers and peptide sequences.

mutlog

Character vector that contains log entries of errors reported during mutation (mutateProtToPep()).

Otherwise, a character vector giving the location of the FASTA file, or an error message.

Details

The RefSeq mRNA ID (NM_ID) is used by biomaRt to query the Ensembl transcript ID (ENST). http://www.ncbi.nlm.nih.gov/refseq/

The header of the FASTA file will look like this:
>ENST|description| originalAminoacid->mutatedAminoacid_positionAminoacid ...

If the annotated change does not fit the ENST, it will look like this:
wrong: originalAminoacid->mutatedAminoacid_positionAminoacid

If the ENST matches two or more NM_IDs, there will be a counter in the header:
>ENSTxcounter|...

Trypsination rule: cut after K and R, except when followed by P. You can use target and exception to set other rules for digestion. Each alternative in the patterns for target and exception is restricted to one amino acid.

Amino acids: ARNDCQEGHILKMFPSTWYV
Valid patterns: A|R|W|H, P|S
Invalid patterns: Z|F|A|D, AR|NDC|STW

The analysis is based on Ensembl protein data: http://www.ensembl.org/index.html

The SNV annotation has to look like ANNOVAR output: http://www.openbioinformatics.org/annovar/
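
To illustrate the digestion rule, here is a minimal sketch of cutting after a target residue unless it is followed by an exception residue. The function digest_sketch() and the example sequence are hypothetical and are not part of PepPrep; the package's own digestion may differ in detail.

```r
# Hypothetical sketch (not PepPrep's implementation): cleave a peptide
# sequence after every target residue that is not followed by an
# exception residue.
digest_sketch <- function(sequence, target = "K|R", exception = "P") {
  # Match a target residue that is NOT followed by an exception residue
  site <- sprintf("(%s)(?!%s)", target, exception)
  # Mark each cleavage site with a space, then split on that marker
  marked <- gsub(site, "\\1 ", sequence, perl = TRUE)
  strsplit(marked, " ", fixed = TRUE)[[1]]
}

# Default trypsin rule: the R followed by P is not cleaved
digest_sketch("MAKGRPLKTR")
# "MAK"   "GRPLK" "TR"

# Custom rule: cut after K only, unless followed by T
digest_sketch("MAKGRPLKTR", target = "K", exception = "T")
# "MAK"     "GRPLKTR"
```

Marking the cleavage sites first and splitting on a fixed separator avoids relying on zero-width matches in strsplit(), which can behave unexpectedly in R.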

Examples


#load data and set arguments

#data.frame with SNVs
tbl <- system.file("extdata", "ExampleData.RData", package="PepPrep")
load(tbl)

glst <- data.frame(Genes="CAP1", stringsAsFactors=FALSE)

#peptide sequence
spath <- system.file("extdata", "ExampleHomo_sapiens.GRCh37.70.pep.all.fa", package="PepPrep")

#where to write the result and how to write
tpath <- paste0(getwd(), "/myTest_snvToPep.fasta")
width <- 60

#biomaRt settings
mymart <- "ensembl"
myarchive <- FALSE

#call workflow
test <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width)
test2 <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width, intermediate = TRUE)
