A Synapter object logs every operation that is applied to
it. When displayed with show or when the name of the instance
is typed at the R console, the original input file names, all
operations and the resulting size of the respective data are
displayed. This allows the user to trace the effect of the respective
operations.
Loading the data{
The data and analysis container, technically
an instance (object) of class Synapter, is created
with the Synapter constructor. This function opens four dialog
boxes for the user to point to the input files,
namely (in that order) the identification final peptide csv file, the
quantitation final peptide csv file, the quantitation Pep3D csv file (as exported
from the PLGS software) and the fasta file used for peptide
identification.
The files are read and the data is stored in the newly
created Synapter instance. The file names can also be
specified as a named list with names 'identpeptide', 'quantpeptide'
and 'quantpep3d' respectively.
The final peptide files are filtered
to retain peptides whose matchType is
PepFrag1 or PepFrag2, i.e. unmodified
round 1 and 2 peptide identifications. Other types, like
NeutralLoss_NH3, NeutralLoss_H20, InSource,
MissedCleavage or VarMod, are not considered in the rest
of the analysis. The quantitation Pep3D data is filtered to retain
entries with Function equal to 1 and unique quantitation spectrum ids,
i.e. a single entry for the multiple charge states or isotopes of an EMRT
(exact mass-retention time feature).
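The two filters described above can be sketched on toy data frames in base R. This is only an illustration, not the package's internal code: the column name spectrumID is hypothetical, and keeping the first occurrence of each spectrum id is one possible reading of "unique".

```r
# Toy stand-in for a final peptide table (values are made up).
pep <- data.frame(
  sequence  = c("AAK", "CCR", "DDK", "EEK"),
  matchType = c("PepFrag1", "VarMod", "PepFrag2", "InSource"),
  stringsAsFactors = FALSE
)
# Retain only unmodified round 1 and round 2 identifications.
kept <- pep[pep$matchType %in% c("PepFrag1", "PepFrag2"), ]

# Toy stand-in for Pep3D data; 'spectrumID' is a hypothetical column name.
pep3d <- data.frame(
  spectrumID = c(1, 1, 2, 3),
  Function   = c(1, 1, 1, 2)
)
# Keep Function == 1, then one row per spectrum id (first occurrence).
pep3d <- pep3d[pep3d$Function == 1, ]
pep3d <- pep3d[!duplicated(pep3d$spectrumID), ]
```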
Then, p-values for Regular peptides are computed based on
the score distributions of the Regular and Random database types,
as described in Käll et al.,
2008a. Only unique peptide sequences are taken into account:
in case of duplicated peptides, only one entry is kept.
Empirical p-values are adjusted using the Bonferroni
and Benjamini and Hochberg, 1995 procedures (multtest package)
and q-values are computed using the qvalue package
(Storey JD and Tibshirani R., 2003 and Käll et
al., 2008b). Only Regular entries are stored in the
resulting data for subsequent analysis.
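As a rough sketch of the idea, empirical p-values can be derived from decoy scores and then adjusted. The scores below are invented, the p-value definition (share of Random scores at least as high as a Regular score) is a simplifying assumption, and p.adjust from the stats package is used here in place of multtest; q-value estimation is not shown.

```r
# Invented scores, for illustration only.
regular <- c(9.1, 7.4, 6.8, 5.2, 4.9)                  # Regular (target) peptides
random  <- c(6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5)   # Random (decoy) peptides

# Assumed definition: fraction of decoy scores >= each target score.
pval <- sapply(regular, function(s) mean(random >= s))

# Multiple testing adjustment (stats::p.adjust swapped in for multtest).
bonf <- p.adjust(pval, method = "bonferroni")
bh   <- p.adjust(pval, method = "BH")

data.frame(score = regular, pval, bonf, bh)
```

Bonferroni is the more conservative of the two adjustments, so its adjusted values are never smaller than the BH ones.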
The data tables can be exported as csv spreadsheets with the
writeIdentPeptides and writeQuantPeptides methods.
}
Filtering identification and quantitation peptide{
The first step of the analysis aims to match reliable peptides.
The final peptide datasets are
filtered based on the FDR (BH is default) using the
filterQuantPepScore and filterIdentPepScore
methods. Several plots are provided to illustrate peptide score
densities (from which p-values are estimated, plotPepScores;
use getPepNumbers to see how many peptides were available) and
q-values (plotFdr).
Peptides matching to multiple proteins in the fasta file (non-unique
tryptic identification and quantitation peptides) can be
discarded with the filterUniqueDbPeptides method. One can
also filter on the peptide length using filterPeptideLength.
Another filtering criterion is mass accuracy. Error tolerance
quantiles (in ppm, parts per million) can be visualised with the
plotPpmError method. The values can be retrieved with
getPpmErrorQs. Filtering is then done separately for
identification and quantitation peptide data using
filterIdentPpmError and filterQuantPpmError
respectively. The previous plotting functions can be used again to
visualise the resulting distribution.
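For readers unfamiliar with ppm errors, the following base R sketch computes them for a few invented masses, summarises their quantiles and applies a tolerance. The variable names and the 10 ppm threshold are assumptions chosen for illustration; this is not the code behind the methods named above.

```r
# Invented theoretical and observed peptide masses.
theoretical <- c(500.2500, 750.4000, 1000.5000, 1200.6000)
observed    <- c(500.2525, 750.3985, 1000.5300, 1200.6012)

# Mass error in parts per million.
ppm <- 1e6 * (observed - theoretical) / theoretical

# Quantiles of the absolute error help in choosing a tolerance.
quantile(abs(ppm), probs = c(0.5, 0.9, 0.95))

# Keep peptides within an assumed tolerance of 10 ppm.
tol  <- 10
keep <- abs(ppm) <= tol
```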
Filtering can also be performed at the level of protein false
positive rate, as computed by the PLGS application
(protein.falsePositiveRate column), which counts the
percentage of decoy proteins that have been identified prior to the
regular protein of interest. This can be done with the
filterIdentProtFpr and filterQuantProtFpr methods.
Note that this field is erroneously called a false positive rate in
the PLGS software and the associated manuscript; it is a false
discovery rate.
}
Merging identification and quantitation peptides{
Common and reliable identification and quantitation peptides are
then matched based on their sequences and merged using the
mergePeptides method.
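Conceptually, this amounts to an inner join on the peptide sequence. A minimal base R sketch on toy tables follows; the column names are hypothetical and the package's own merging may differ in detail.

```r
# Toy identification and quantitation peptide tables.
ident <- data.frame(sequence = c("AAK", "CCR", "DDK"),
                    rt.ident = c(10.2, 25.1, 40.7),
                    stringsAsFactors = FALSE)
quant <- data.frame(sequence = c("CCR", "DDK", "EEK"),
                    rt.quant = c(25.9, 41.3, 55.0),
                    stringsAsFactors = FALSE)

# Inner join: only sequences present in both tables are kept.
merged <- merge(ident, quant, by = "sequence")
```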
}
Retention time modelling{
Systematic differences between the retention times of identification
and quantitation features are modelled by
fitting a local regression (see the loess function for
details), using the modelRt method. The smoothing parameter,
which determines the proportion of neighbouring data points used for each local fit, is
controlled by the span parameter that can be set in the above
method.
The effect of this parameter can be observed with the plotRt
method, specifying what = "data" as parameter. The resulting
model can then be visualised with the same method, specifying
what = "model" and up to 3 numbers of standard
deviations to plot. A histogram of retention time differences can
be produced with the plotRtDiffs method.
The identification and quantitation features can also be plotted with the plotFeatures method.
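The modelling step can be illustrated with the loess function directly. The data below are simulated, the span value is arbitrary, and using the standard deviation of the residuals as the retention time deviation is an assumption of this sketch rather than a description of modelRt.

```r
set.seed(42)
# Simulated identification retention times and ident/quant differences.
rt.ident <- seq(10, 100, length.out = 200)
rt.diff  <- 0.5 + 0.3 * sin(rt.ident / 15) + rnorm(200, sd = 0.1)

# Local regression; a smaller span follows the data more closely.
fit <- loess(rt.diff ~ rt.ident, span = 0.3)

# Predicted shift at a few retention times.
pred <- predict(fit, newdata = data.frame(rt.ident = c(20, 50, 80)))

# Tolerance band of nsd standard deviations around the model
# (residual standard deviation used as a simple stand-in).
rt.sd <- sd(residuals(fit))
nsd   <- 2
lower <- pred - nsd * rt.sd
upper <- pred + nsd * rt.sd
```

Refitting with a different span and comparing the residual spread gives a feel for how strongly the parameter affects the model.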
}
Grid search to optimise matching tolerances{
Matching of identification peptides and quantitation EMRTs is done
within a mass tolerance in parts per million (ppm) and the modelled
retention time +/- a certain number of standard deviations.
To help in the choice of these two parameters, a grid search over a
set of possible values is performed and performance metrics are
recorded, to guide in the selection of a 'best' pair of parameters.
The following metrics are computed:
(1) the percentage of identification
peptides that matched a single quantitation EMRT (called prcntTotal),
(2) the percentage of identification peptides used in the retention time
model that matched the quantitation EMRT corresponding to the
correct quantitation peptide in the ident/quant pair of the model
(called prcntModel)
and
(3) the details of the matching of the features used for
modelling (accessible with getGridDetails) and the
corresponding details grid that reports the percentage of
correct unique assignments.
The detailed grid results specify the number of non
matched identification peptides (0), the number of correctly (1) or
wrongly (-1) uniquely matched identification peptides, and the number of
identification peptides that matched 2 or more peptides including
(2+) or excluding (2-) the correct quantitation equivalent.
See the next section for additional details about how matching is performed.
The search is performed with the searchGrid method, possibly
on a subset of the data (see Methods and Examples sections for
further details).
The parameters used for matching can be set manually with
setPpmError and setRtNsd respectively, or using
setBestGridParams to apply best parameters as defined using
the grid search. See example and method documentation for details.
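The principle of the grid search can be sketched in base R on a handful of invented features. Everything here is illustrative: the data, the fixed retention time standard deviation rt.sd, the helper n_matches and the choice of the highest percentage of unique matches as "best" are assumptions of the sketch, not the behaviour of searchGrid or setBestGridParams.

```r
# Invented identification peptides and quantitation EMRTs.
ident <- data.frame(mass = c(500.25, 750.40, 1000.50), rt = c(20, 50, 80))
emrt  <- data.frame(mass = c(500.252, 750.401, 750.403, 1000.53),
                    rt   = c(20.1, 50.2, 50.9, 80.0))
rt.sd <- 0.5  # assumed retention time standard deviation from the model

# Hypothetical helper: number of EMRTs inside the ppm / rt window of peptide i.
n_matches <- function(i, ppm, nsd) {
  dppm <- 1e6 * abs(emrt$mass - ident$mass[i]) / ident$mass[i]
  drt  <- abs(emrt$rt - ident$rt[i])
  sum(dppm <= ppm & drt <= nsd * rt.sd)
}

# All combinations of candidate tolerances.
grid <- expand.grid(ppm = c(5, 10, 40), nsd = c(1, 2))

# Percentage of identification peptides matching exactly one EMRT.
grid$prcntTotal <- mapply(function(ppm, nsd) {
  n <- sapply(seq_len(nrow(ident)), n_matches, ppm = ppm, nsd = nsd)
  100 * mean(n == 1)
}, grid$ppm, grid$nsd)

best <- grid[which.max(grid$prcntTotal), ]
```

Widening the windows first recovers more unique matches and then starts creating ambiguous ones, which is why the metric is evaluated over a grid rather than simply maximising the tolerances.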
}
Identification transfer: matching identification peptides and quantitation EMRTs{
The identification peptide - quantitation EMRT matching, termed
identification transfer, is performed using the best parameters, as
defined above with a grid search, or using user-defined parameters.
Matching is considered successful when one and only one EMRT is
found in the mass tolerance/retention time window defined by the
error ppm and number of retention time standard deviations
parameters. The values of uniquely matched EMRTs are reported in the
final matched dataframe that can be exported (see below). If,
however, no EMRT or more than one EMRT is matched, 0 or the number of
matches is reported, respectively.
As identification peptides are matched to 'close' EMRTs one at a time
and independently of each other, it is possible for several peptides to
be matched to the same EMRT. Such cases are reported as -1 in the results
dataframes.
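The reporting convention above (0, 1, number of matches, or -1) can be reproduced on toy data as follows. The data, tolerances and variable names are invented, and a fixed retention time tolerance replaces the model-based window for brevity.

```r
# Invented identification peptides and quantitation EMRTs.
ident <- data.frame(mass = c(500.25, 500.251, 750.40, 900.00),
                    rt   = c(20, 20.1, 50, 70))
emrt  <- data.frame(mass = c(500.2505, 750.401, 750.403),
                    rt   = c(20.05, 50.2, 50.3))
ppm    <- 10   # assumed mass tolerance
rt.tol <- 0.5  # assumed retention time tolerance

# Indices of the EMRTs falling in each peptide's window.
hits <- lapply(seq_len(nrow(ident)), function(i) {
  which(1e6 * abs(emrt$mass - ident$mass[i]) / ident$mass[i] <= ppm &
          abs(emrt$rt - ident$rt[i]) <= rt.tol)
})
n <- lengths(hits)

# EMRT index for unique matches, NA otherwise.
matched <- ifelse(n == 1, sapply(hits, `[`, 1), NA)

# Unique matches that point to an EMRT claimed by another peptide get -1.
dup <- !is.na(matched) &
  (duplicated(matched) | duplicated(matched, fromLast = TRUE))
status <- n
status[dup] <- -1
```

In this toy case the first two peptides both claim the first EMRT (-1), the third has two candidate EMRTs (2) and the fourth has none (0).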
The results can be assessed using the plotEMRTtable (or
getEMRTtable to retrieve the values) and performance
methods. The former shows the number of identification peptides assigned to
none (0), exactly one (1) or several (2 or more) EMRTs.
The latter method reports the matched identification peptides and the number of
(q-value and protein FPR filtered) identification and quantitation peptides.
Matched EMRT and quantitation peptide numbers are then compared
by calculating the synapter enrichment (100 * ( synapter - quant ) / quant)
and Venn counts.
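As a worked example of the enrichment formula, with invented counts and assuming that synapter stands for the number of uniquely matched EMRTs and quant for the number of quantitation peptides:

```r
# Enrichment in percent, as given by the formula above.
enrichment <- function(synapter, quant) 100 * (synapter - quant) / quant

# Invented counts: 1500 matched EMRTs versus 1200 quantitation peptides.
enrichment(synapter = 1500, quant = 1200)  # 25
```

A positive value means that identification transfer yielded more quantified peptides than the quantitation run alone.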
}
Exporting and saving data{
The merged identification and quantitation peptides can be exported
to csv using the writeMergedPeptides method. Similarly, the
matched identification peptides and quantitation EMRTs are exported
with writeMatchedEMRTs.
Complete Synapter instances can be serialised with
save, as any R object, and reloaded with load for
further analysis.
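The save and load round trip works as for any other R object. The sketch below uses a plain list as a stand-in for a Synapter instance and a temporary file; both are assumptions made so that the example runs without input data.

```r
# Stand-in object; a real session would use a Synapter instance here.
obj <- list(note = "stand-in for a Synapter instance")

f <- tempfile(fileext = ".rda")
save(obj, file = f)   # serialise to disk

rm(obj)               # simulate a fresh session
load(f)               # restores a variable named 'obj'
```

Note that load restores the object under the name it had when it was saved.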
}