Takes a ThermoFisher MSF file and finds the location of each peptide within its corresponding protein sequence. In cases where a single peptide maps to multiple locations within a protein sequence, only the first location is reported. If a peptide maps ambiguously to multiple proteins, all locations are reported with data from each peptide-protein combination on a separate row.
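The "first location only" rule can be illustrated with a minimal base R sketch. This is not the package's implementation; `locate_peptide()` and the sequences are hypothetical and exist only to show how a first-match lookup behaves when a peptide occurs more than once in a protein.

```r
# Hypothetical helper (not part of the package): find the first occurrence
# of a peptide within a protein sequence and return 1-based start/end.
locate_peptide <- function(peptide, protein) {
  # regexpr() reports only the first match; it returns -1 when there is none.
  start <- as.integer(regexpr(peptide, protein, fixed = TRUE))
  if (start == -1L) {
    return(c(start = NA_integer_, end = NA_integer_))
  }
  c(start = start, end = start + nchar(peptide) - 1L)
}

# Made-up protein sequence in which "MKTAY" occurs twice.
protein <- "MKTAYIAKQRMKTAYLV"

# Only the first of the two occurrences is reported.
locate_peptide("MKTAY", protein)
```

A peptide that maps to several proteins would, under the same assumptions, simply be looked up once per protein, giving one start/end pair per peptide-protein combination.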
map_peptides(msf_file, min_conf = "High", prot_regex = "")
msf_file: A file path to a ThermoFisher MSF file.
min_conf: "High", "Medium", or "Low". The minimum peptide confidence level to retrieve from the MSF file.
prot_regex: Regular expression whose first group matches a protein name or ID in the protein description. The regex must contain exactly one group. The protein description is typically generated from the FASTA reference file that was used for the database search.
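As a rough illustration of a single-group regular expression applied to a protein description, the base R sketch below pulls out the text captured by the first group. The helper `extract_protein_id()`, the example header, and the pattern are all assumptions made for illustration; they are not taken from the package, and the actual description format depends on the FASTA file used in the search.

```r
# Hypothetical helper (not part of the package): return the text captured
# by the first group of `pattern`, or NA if the pattern does not match.
extract_protein_id <- function(description, pattern) {
  # regexec() returns the full match followed by each captured group.
  m <- regmatches(description, regexec(pattern, description))[[1]]
  if (length(m) < 2L) NA_character_ else m[2]
}

# Made-up FASTA-style description; real descriptions may look different.
desc <- ">sp|P12345|EXAMPLE_HUMAN Example protein OS=Homo sapiens"

# The pattern has exactly one group, which captures the accession.
extract_protein_id(desc, "^>sp\\|([A-Z0-9]+)\\|")
```

A pattern of the same shape could then be passed as `prot_regex`, provided it matches the descriptions stored in the MSF file at hand.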
A data frame containing the start and end positions (relative to the parent protein sequence) of each peptide in the database, with the following columns:
a unique peptide ID
a unique spectrum ID
unique protein group ID to which this peptide maps
protein description from reference database used to assign peptides to protein groups, parsed according to prot_regex
amino acid sequence (does not show post-translational modifications)
PEP score
Q-value score
parent protein sequence
start position of peptide within protein sequence
end position of peptide within protein sequence
# NOT RUN {
map_peptides(parsemsf_example("test_db.msf"))
# }