compHomToPepFasta: Comparison of proteins and creating homologous peptides workflow

Description

This is a wrapper that searches for pairs of protein sequences by UniProt EntryName, digests both sequences with trypsin, finds the homologous parts, removes duplicates, builds a new sequence from them and writes the result to a FASTA file that can be used for further analysis (e.g. comparison with mass spectrometry results).

Usage

compHomToPepFasta(path_o1, path_o2, path, width = 60, 
intermediate = FALSE, target = "K|R", exception = "P")

Arguments

path_o1

Character string indicating the path to a UniProt proteome FASTA database for the first organism.

path_o2

Character string indicating the path to a UniProt proteome FASTA database for the second organism.

path

Character string indicating the path where to write the resulting FASTA file.

width

Width of the sequence in the result (default 60).

intermediate

Logical, TRUE if you would like to have intermediate output, FALSE if not (default).

target

Character string, pattern to be matched before the cleavage site (default "K|R").

exception

Character string, pattern that prevents a cleavage when it is found directly behind the cleavage site (default "P").
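
To illustrate the width argument, the sketch below wraps a sequence into lines of at most width characters, the usual layout of FASTA sequence lines. It assumes that this is what width controls; wrap_sequence is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper (not part of PepPrep): break a sequence into
# chunks of at most `width` characters, one chunk per output line.
wrap_sequence <- function(sequence, width = 60) {
  starts <- seq(1, nchar(sequence), by = width)
  # substring() truncates the last chunk at the end of the sequence
  substring(sequence, starts, starts + width - 1)
}

sequence <- strrep("ACDEFGHIKL", 13)  # artificial 130-residue sequence
lines <- wrap_sequence(sequence, width = 60)
nchar(lines)  # two full lines of 60 residues and one of 10
```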

Value

If you set intermediate to TRUE you will get the following output:

tbl

A data.frame that contains the protein pairs, the header and the homologous sequence.

fasta

Character vector of the resulting FASTA file.

Otherwise the result is just a character vector stating where to find the FASTA file, or an error message.

Details

Searching pairs of protein sequences by UniProt EntryName in both organisms:

Org1: Human
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3

Org2: Mouse
>sp|Q9CQV8-2|1433B_MOUSE Isoform Short of 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab
>sp|Q9CQV8|1433B_MOUSE 14-3-3 protein beta/alpha OS=Mus musculus GN=Ywhab PE=1 SV=3

Pairs:
P31946|1433B_HUMAN Q9CQV8-2|1433B_MOUSE
P31946|1433B_HUMAN Q9CQV8|1433B_MOUSE

Digesting both sequences with trypsin:

Org1:
>sp|P31946|1433B_HUMAN ...
MTMDKSELVQKAKLAEQAERYDDMAAAMK...

Org2:
>sp|Q9CQV8-2|1433B_MOUSE ...
MDKSELVQKAKLAEQAERYDDMAAAMK...

Find homologous parts, remove duplicates, build a new sequence out of them:

Homolog Org1Org2:
>sp|P31946|1433B_HUMAN ... org2:sp|Q9CQV8-2|1433B_MOUSE ...
SELVQKAKLAEQAERYDDMAAAMK...

Write the result into a FASTA file that can be used for further analysis (e.g. compare to mass spectrometry results).
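
The digestion and comparison steps can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: digest_sketch is a hypothetical helper, "homologous parts" is assumed to mean peptides that occur in both digests, and only the visible (truncated) sequence fragments from the example are used.

```r
# Hypothetical helper (not part of PepPrep): cleave after every `target`
# residue unless the next residue matches `exception`.
digest_sketch <- function(sequence, target = "K|R", exception = "P") {
  pattern <- paste0("(", target, ")(?!", exception, ")")
  # mark each cleavage site with a space, then split on the spaces
  marked <- gsub(pattern, "\\1 ", sequence, perl = TRUE)
  strsplit(marked, " ", fixed = TRUE)[[1]]
}

# visible (truncated) sequence fragments from the example above
org1 <- "MTMDKSELVQKAKLAEQAERYDDMAAAMK"
org2 <- "MDKSELVQKAKLAEQAERYDDMAAAMK"

pep1 <- digest_sketch(org1)
pep2 <- digest_sketch(org2)

# Assumption: "homologous parts" are the peptides present in both digests.
# intersect() also removes duplicates and keeps the order of pep1.
shared <- intersect(pep1, pep2)
homolog <- paste(shared, collapse = "")
homolog
```

With these fragments the shared peptides are SELVQK, AK, LAEQAER and YDDMAAAMK, so homolog reproduces the start of the homologous sequence shown above; the N-terminal peptides MTMDK and MDK differ and are dropped.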

You can use target and exception to set other rules for digestion. Each alternative in the patterns for target and exception is restricted to one amino acid.

Amino acids: ARNDCQEGHILKMFPSTWYV
Valid patterns: A|R|W|H, P|S
Invalid patterns: Z|F|A|D, AR|NDC|STW

UniProt, the source of the proteomes: http://www.uniprot.org/
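
The pattern rule can be checked with a few lines of base R. The sketch below is an illustration under the assumption that a pattern is valid when every alternative between the "|" separators is exactly one of the 20 listed amino acids; is_valid_pattern is a hypothetical helper, not a function of the package.

```r
# The 20 amino acid letters listed above
amino_acids <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]

# Hypothetical helper (not part of PepPrep): TRUE if every alternative
# in the pattern is a single amino acid letter.
is_valid_pattern <- function(pattern) {
  parts <- strsplit(pattern, "|", fixed = TRUE)[[1]]
  length(parts) > 0 && all(parts %in% amino_acids)
}

patterns <- c("A|R|W|H", "P|S", "Z|F|A|D", "AR|NDC|STW")
valid <- vapply(patterns, is_valid_pattern, logical(1))
valid  # TRUE for the first two patterns, FALSE for the last two
```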

Examples

# load the package, load data and set arguments
library(PepPrep)

# UniProt proteome FASTA databases
# (just a small example with two proteins each)
path_o1 <- system.file("extdata", "ExampleHumanProt.fasta", package = "PepPrep")
path_o2 <- system.file("extdata", "ExampleMouseProt.fasta", package = "PepPrep")

# where to write the result and how to format it
path <- file.path(getwd(), "myTest_compHomToPep.fasta")
width <- 60

# call workflow with the default output
test <- compHomToPepFasta(path_o1, path_o2, path, width)
# call workflow again, this time requesting the intermediate output
test <- compHomToPepFasta(path_o1, path_o2, path, width, intermediate = TRUE)
