.parseFastaHeader

Parse fasta header (from <a href="https://www.uniprot.org">UniProt</a>) to extract different annotation fields

Data analysis of proteomics experiments by mass spectrometry is supported by this collection of functions mostly dedicated to the analysis of (bottom-up) quantitative (XIC) data.
Fasta-formatted proteomes (eg from UniProt Consortium <doi:10.1093/nar/gky1049>) can be read with automatic parsing and multiple annotation types (like species origin, abbreviated gene names, etc) extracted.
Initial results from multiple software for protein (and peptide) quantitation can be imported (to a common format):
MaxQuant (Tyanova et al 2016 <doi:10.1038/nprot.2016.136>), Dia-NN (Demichev et al 2020 <doi:10.1038/s41592-019-0638-x>),
Fragpipe (da Veiga et al 2020 <doi:10.1038/s41592-020-0912-y>), ionbot (Degroeve et al 2021 <doi:10.1101/2021.07.02.450686>),
MassChroq (Valot et al 2011 <doi:10.1002/pmic.201100120>),
OpenMS (Strauss et al 2021 <doi:10.1038/nmeth.3959>), ProteomeDiscoverer (Orsburn 2021 <doi:10.3390/proteomes9010015>),
Proline (Bouyssie et al 2020 <doi:10.1093/bioinformatics/btaa118>), AlphaPept (preprint Strauss et al <doi:10.1101/2021.07.23.453379>)
and Wombat-P (Bouyssie et al 2023 <doi:10.1021/acs.jproteome.3c00636>.
Meta-data provided by initial analysis software and/or in sdrf format can be integrated to the analysis.
Quantitative proteomics measurements frequently contain multiple NA values, due to physical absence of given peptides in some samples, limitations in sensitivity or other reasons.
Help is provided to inspect the data graphically to investigate the nature of NA-values via their respective replicate measurements
and to help/confirm the choice of NA-replacement algorithms.
Meta-data in sdrf-format (Perez-Riverol et al 2020 <doi:10.1021/acs.jproteome.0c00376>) or similar tabular formats can be imported and included.
Missing values can be inspected and imputed based on the concept of NA-neighbours or other methods.
Dedicated filtering and statistical testing using the framework of package 'limma' <doi:10.18129/B9.bioc.limma> can be run, enhanced by multiple rounds of NA-replacements to provide robustness towards rare stochastic events.
Multi-species samples, as frequently used in benchmark-tests (eg Navarro et al 2016 <doi:10.1038/nbt.3685>, Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>), can be run with special options considering
such sub-groups during normalization and testing. Subsequently, ROC curves (Hand and Till 2001 <doi:10.1023/A:1010920819831>) can be constructed to compare multiple analysis approaches.
As detailed example the data-set from Ramus et al 2016 <doi:10.1016/j.jprot.2015.11.011>) quantified by MaxQuant, ProteomeDiscoverer,
and Proline is provided with a detailed analysis of heterologous spike-in proteins.

Wolfgang Raffelsberger

wrProteo

Proteomics Data Analysis Functions

.parseFastaHeader function

<dl><dt>header</dt>
<dd>(character) fasta-header</dd>
<dt>delim</dt>
<dd>(character) delimeter (ie primary separator)</dd>
<dt>databaseSign</dt>
<dd>(character) characters at beginning right after the '&gt;' (typically specifying the data-base-origin), they will be excluded from the sequance-header</dd>
<dt>UniprSep</dt>
<dd>(character) separators for further separating entry-fields if <code>tableOut=TRUE</code>; with these delimeter fields a space is assumed in addition to the separators;
see also <a href="https://www.uniprot.org/help/fasta-headers">UniProt-FASTA-headers</a></dd>
<dt>asList</dt>
<dd>(logical) if <code>asList=TRUE</code>,the function returns a list with two matrixes, one for primary parsing and 
another matrix for further parsing (using <code>UniprSep</code>), otherwise all will be combined in single matrix</dd>
<dt>silent</dt>
<dd>(logical) suppress messages</dd>
<dt>callFrom</dt>
<dd>(character) allows easier tracking of messages produced</dd>
<dt>debug</dt>
<dd>(logical) supplemental messages for debugging</dd></dl>

Arguments

Parse fasta header (from <a href='https://www.uniprot.org'>UniProt</a>) to extract different annotation fields

Parse Fasta Header — .parseFastaHeader

<dl>

<dt>header</dt>
<dd>(character) fasta-header</dd>


<dt>delim</dt>
<dd>(character) delimeter (ie primary separator)</dd>


<dt>databaseSign</dt>
<dd>(character) characters at beginning right after the '&gt;' (typically specifying the data-base-origin), they will be excluded from the sequance-header</dd>


<dt>UniprSep</dt>
<dd>(character) separators for further separating entry-fields if <code>tableOut=TRUE</code>; with these delimeter fields a space is assumed in addition to the separators;
see also <a href='https://www.uniprot.org/help/fasta-headers'>UniProt-FASTA-headers</a></dd>


<dt>asList</dt>
<dd>(logical) if <code>asList=TRUE</code>,the function returns a list with two matrixes, one for primary parsing and 
another matrix for further parsing (using <code>UniprSep</code>), otherwise all will be combined in single matrix</dd>


<dt>silent</dt>
<dd>(logical) suppress messages</dd>


<dt>callFrom</dt>
<dd>(character) allows easier tracking of messages produced</dd>


<dt>debug</dt>
<dd>(logical) supplemental messages for debugging</dd>

</dl>

.parseFastaHeader: Parse Fasta Header

Description

Usage

Value

Arguments

See Also

Examples