PepPrep (version 1.1.0)

snvToPepFasta: Single nucleotide variant (SNV) to peptide workflow

Description

This is a wrapper for the whole workflow: it applies the SNV mutations to the transcripts, digests the mutated sequences into small peptides, and writes the result to a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).

Usage

snvToPepFasta(tbl, glst, mymart, myarchive, spath, tpath, width = 60,
intermediate = FALSE, target = "K|R", exception = "P")

Arguments

tbl
Data.frame of ANNOVAR annotated SNVs.
glst
Data.frame of gene names, with column Genes.
mymart
Mart from which the Ensembl transcript ID (ENST) is retrieved via biomaRt.
myarchive
Logical that indicates whether an archive mart is given (default FALSE).
spath
Character string giving the path to the human Ensembl peptide database in FASTA format.
tpath
Character string giving the path where to write the mutated and digested sequences in FASTA format.
width
Width of the sequence in the result (default 60).
intermediate
Logical, TRUE if you would like to have intermediate output, FALSE if not (default).
target
Character string, pattern to be matched before the cleavage site (default "K|R").
exception
Character string, pattern that prevents cleavage when it is found directly after the cleavage site (default "P").

Value

If you set intermediate to TRUE, you will get the following output:

  • aachanges: A data.frame like tbl, with new columns that describe the amino acid changes.
  • transcripts: Data.frame containing ensemble_transcript_id, nmid and pname.
  • mutfasta: Character vector that contains FASTA headers and peptide sequences.
  • mutlog: Character vector that contains log entries of errors reported during mutation (mutateProtToPep()).

Otherwise, just a character vector stating where to find the FASTA file, or an error message.
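
Because the return value differs between the two modes, calling code may want to check what it received. The following is a minimal sketch, not part of PepPrep: describe_result() is a hypothetical helper, and it assumes that the intermediate output is a named list with the four components listed above and that FASTA header lines start with ">".

```r
# Hypothetical helper, not part of PepPrep.
# Assumption: the intermediate output is a named list with the components
# aachanges, transcripts, mutfasta and mutlog.
describe_result <- function(res) {
  expected <- c("aachanges", "transcripts", "mutfasta", "mutlog")
  if (is.list(res) && all(expected %in% names(res))) {
    # FASTA header lines start with ">"
    n_headers <- sum(startsWith(res$mutfasta, ">"))
    sprintf("intermediate output: %d FASTA header(s), %d log line(s)",
            n_headers, length(res$mutlog))
  } else {
    # Otherwise a character vector: the FASTA location or an error message
    paste("plain output:", paste(res, collapse = " "))
  }
}
```

With the objects from the Examples section, this could be called as describe_result(test) or describe_result(test2).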

Details

The RefSeq mRNA ID (NM_ID) is used by biomaRt to query the Ensembl transcript ID (ENST). http://www.ncbi.nlm.nih.gov/refseq/

The header of the FASTA file will look like this:
>ENST|description| originalAminoacid->mutatedAminoacid_positionAminoacid ...
If the annotated change does not fit the ENST, it will look like:
wrong: originalAminoacid->mutatedAminoacid_positionAminoacid
If the ENST matches two or more NM_IDs, there will be a counter in the header:
>ENSTxcounter|...

Trypsination rule: cut after K and R, except when followed by P. You can use target and exception to set other rules for digestion. Each alternative in the patterns for target and exception is restricted to a single amino acid.
Amino acids: ARNDCQEGHILKMFPSTWYV
valid patterns: A|R|W|H, P|S
invalid patterns: Z|F|A|D, AR|NDC|STW

The analysis is based on Ensembl protein data: http://www.ensembl.org/index.html
The SNV annotation has to look like ANNOVAR output: http://www.openbioinformatics.org/annovar/
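
To make the digestion rule and the pattern restriction concrete, here is a minimal base R sketch. It is not PepPrep's implementation: is_valid_pattern() and digest() are hypothetical helpers written only to illustrate the rule "cut after target unless followed by exception" and the requirement that each alternative is one amino acid from the list above.

```r
# Hypothetical helpers, not part of PepPrep.
amino_acids <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]

# TRUE if every "|"-separated alternative is exactly one listed amino acid.
is_valid_pattern <- function(pattern) {
  parts <- strsplit(pattern, "|", fixed = TRUE)[[1]]
  length(parts) > 0 && all(parts %in% amino_acids)
}

# Cut a sequence after every residue matching `target`,
# unless the following residue matches `exception`.
digest <- function(sequence, target = "K|R", exception = "P") {
  stopifnot(is_valid_pattern(target), is_valid_pattern(exception))
  residues <- strsplit(sequence, "")[[1]]
  n <- length(residues)
  if (n < 2) return(sequence)
  is_target <- grepl(paste0("^(", target, ")$"), residues)
  # Does the residue after position i match the exception? (last one: no)
  next_is_exception <- c(grepl(paste0("^(", exception, ")$"), residues[-1]), FALSE)
  cut_after <- which(is_target & !next_is_exception)
  cut_after <- cut_after[cut_after < n]  # no cut after the final residue
  substring(sequence, c(1, cut_after + 1), c(cut_after, n))
}

digest("MKRPLKAA")
# [1] "MK"   "RPLK" "AA"

is_valid_pattern("A|R|W|H")     # TRUE
is_valid_pattern("Z|F|A|D")     # FALSE, Z is not in the amino acid list
is_valid_pattern("AR|NDC|STW")  # FALSE, alternatives longer than one amino acid
```

In the example, no cut is made after the R because it is followed by P, which is the default exception.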

Examples

#load data and set arguments

#data.frame with SNVs (loading the file provides testtbl, which is used below)
tbl <- system.file("extdata", "ExampleData.RData", package="PepPrep")
load(tbl)

glst <- data.frame(Genes="CAP1", stringsAsFactors=FALSE)

#peptide sequence
spath <- system.file("extdata", "ExampleHomo_sapiens.GRCh37.70.pep.all.fa", package="PepPrep")

#where to write the result and how to write
tpath <- file.path(getwd(), "myTest_snvToPep.fasta")
width <- 60

#biomaRt settings
mymart <- "ensembl"
myarchive <- FALSE

#call workflow
test <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width)
test2 <- snvToPepFasta(testtbl, glst, mymart, myarchive, spath, tpath, width, intermediate = TRUE)
