A Synapter object logs every operation that is applied to it. When
displayed with show, or when the name of the instance is typed at the
R console, the original input file names, all operations and the
resulting size of the respective data are displayed. This allows the
user to trace the effect of each operation.
Loading the data{
The data and analysis container, technically an instance (object) of
class Synapter, is created with the Synapter constructor. This
function opens four dialog boxes for the user to point to the input
files, namely (and in that order) the identification final peptide csv
file, the quantitation final peptide csv file, the quantitation Pep3D
csv file (as exported from the PLGS software) and the fasta file used
for peptide identification.
The files are read and the data is stored in the newly created
Synapter instance. The file names can also be specified as a named
list with names 'identpeptide', 'quantpeptide' and 'quantpep3d'
respectively.
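As a minimal sketch of the named list described above: the file names
below are hypothetical placeholders, and only the three list names
given in the text are used. The check with file.exists is an
illustrative addition, not part of the package.

```r
# Hypothetical file names; replace with the files exported from PLGS.
inputfiles <- list(
  identpeptide = "ident_final_peptide.csv",
  quantpeptide = "quant_final_peptide.csv",
  quantpep3d   = "quant_Pep3D.csv"
)

# Check which of the listed files are present before building the object.
found <- vapply(inputfiles, file.exists, logical(1))
print(found)
```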
The final peptide files are filtered to retain peptides whose
matchType is PepFrag1 or PepFrag2, corresponding to unmodified round 1
and 2 peptide identifications. Other types, like NeutralLoss_NH3,
NeutralLoss_H20, InSource, MissedCleavage or VarMod, are not
considered in the rest of the analysis. The quantitation Pep3D data is
filtered to retain entries with Function equal to 1 and unique
quantitation spectrum ids, i.e. unique entries for multiple charge
states or isotopes of an EMRT (exact mass-retention time feature).
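The two filters above can be illustrated in base R on toy tables. This
is only a sketch: the column name spectrumID and the choice to keep the
first row of each duplicated id are assumptions made for illustration,
not a description of the package internals.

```r
# Toy final peptide table; column names are assumptions for illustration.
peptides <- data.frame(
  sequence  = c("AAK", "CCR", "DDK", "EEK"),
  matchType = c("PepFrag1", "VarMod", "PepFrag2", "InSource"),
  stringsAsFactors = FALSE
)
# Retain only unmodified round 1 and round 2 identifications.
kept <- peptides[peptides$matchType %in% c("PepFrag1", "PepFrag2"), ]

# Toy Pep3D table: keep Function == 1, then one row per spectrum id.
pep3d <- data.frame(
  Function   = c(1, 1, 2, 1),
  spectrumID = c(10, 10, 11, 12)
)
pep3d <- pep3d[pep3d$Function == 1, ]
pep3d <- pep3d[!duplicated(pep3d$spectrumID), ]
```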
Then, p-values for Regular peptides are computed based on the score
distributions of the Regular and Random database types, as described
in Käll et al., 2008a. Only unique peptide sequences are taken into
account: in case of duplicated peptides, only one entry is kept.
Empirical p-values are adjusted using the Bonferroni and Benjamini and
Hochberg, 1995 methods (multtest package) and q-values are computed
using the qvalue package (Storey JD and Tibshirani R., 2003 and Käll
et al., 2008b). Only Regular entries are stored in the resulting data
for subsequent analysis.
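The idea of an empirical p-value derived from the Random score
distribution can be sketched on simulated scores. This is a simplified
illustration under stated assumptions: the scores are simulated, the +1
correction is a common convention rather than something taken from the
text, and base R's p.adjust stands in for the multtest package.
q-values are not computed here.

```r
set.seed(1)
# Simulated scores: Random (decoy) hits score lower on average.
random_scores  <- rnorm(1000, mean = 5, sd = 1)
regular_scores <- c(rnorm(50, mean = 5, sd = 1), rnorm(50, mean = 9, sd = 1))

# Empirical p-value: share of Random scores at least as high as each
# Regular score (with a +1 correction so no p-value is exactly zero).
pval <- vapply(
  regular_scores,
  function(s) (sum(random_scores >= s) + 1) / (length(random_scores) + 1),
  numeric(1)
)

# Multiple testing adjustment with the two methods named in the text.
adjusted <- data.frame(
  pval       = pval,
  Bonferroni = p.adjust(pval, method = "bonferroni"),
  BH         = p.adjust(pval, method = "BH")
)
head(adjusted[order(adjusted$pval), ])
```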
The data tables can be exported as csv spreadsheets with the
writeIdentPeptides and writeQuantPeptides methods.
}
Filtering identification and quantitation peptides{
The first step of the analysis aims to match reliable peptides.
The final peptide datasets are filtered based on the FDR (BH is
default) using the filterQuantPepScore and filterIdentPepScore
methods. Several plots are provided to illustrate peptide score
densities (from which p-values are estimated, plotPepScores; use
getPepNumbers to see how many peptides were available) and q-values
(plotFdr).
Peptides matching to multiple proteins in the fasta file (non-unique
tryptic identification and quantitation peptides) can be discarded
with the filterUniqueDbPeptides method. One can also filter on the
peptide length using filterPeptideLength.
Another filtering criterion is mass accuracy. Error tolerance
quantiles (in ppm, parts per million) can be visualised with the
plotPpmError method. The values can be retrieved with getPpmErrorQs.
Filtering is then done separately for identification and quantitation
peptide data using filterIdentPpmError and filterQuantPpmError
respectively. The previous plotting functions can be used again to
visualise the resulting distribution.
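A mass error in ppm is the difference between observed and theoretical
mass, relative to the theoretical mass and scaled by one million. The
sketch below uses made-up masses and a hypothetical 10 ppm tolerance;
the variable names are illustrative and do not correspond to columns
of the package data.

```r
# Toy masses in Da; 'theoretical' and 'observed' are illustrative names.
theoretical <- c(800.4000, 1200.6000, 1500.7500, 2000.9000)
observed    <- c(800.4040, 1200.5940, 1500.7950, 2000.9020)

# Relative mass error in parts per million.
ppm <- 1e6 * (observed - theoretical) / theoretical

# Error quantiles, then keep features within a +/- 10 ppm tolerance.
quantile(abs(ppm), probs = c(0.5, 0.9, 1))
keep <- abs(ppm) <= 10
```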
Filtering can also be performed at the level of protein false
positive rate, as computed by the PLGS application
(protein.falsePositiveRate column), which counts the percentage of
decoy proteins that have been identified prior to the regular protein
of interest. This can be done with the filterIdentProtFpr and
filterQuantProtFpr methods.
Note that this field is erroneously called a false positive rate in
the PLGS software and the associated manuscript; it is a false
discovery rate.
}
Merging identification and quantitation peptides{
Common and reliable identification and quantitation peptides are
then matched based on their sequences and merged using the
mergePeptides method.
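Conceptually, merging on sequence is an inner join of the two peptide
tables. A minimal base R sketch on toy data follows; the column names
are hypothetical and this is not the implementation of mergePeptides.

```r
ident <- data.frame(sequence = c("AAK", "CCR", "DDK"),
                    ident_rt = c(10.2, 25.1, 40.3),
                    stringsAsFactors = FALSE)
quant <- data.frame(sequence = c("CCR", "DDK", "EEK"),
                    quant_rt = c(25.6, 40.9, 55.0),
                    stringsAsFactors = FALSE)

# Inner join: only sequences present in both tables are retained.
merged <- merge(ident, quant, by = "sequence")
merged
```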
}
Retention time modelling{
Systematic differences between the retention times of identification
and quantitation features are modelled by fitting a local regression
(see the loess function for details), using the modelRt method. The
smoothing parameter, or number of neighbour data points used for the
local fit, is controlled by the span parameter that can be set in the
above method.
The effect of this parameter can be observed with the plotRt method,
specifying what = "data" as parameter. The resulting model can then
be visualised with the same method, specifying what = "model" and up
to 3 numbers of standard deviations to plot. A histogram of retention
time differences can be produced with the plotRtDiffs method.
See also the plotFeatures method.
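The role of the span parameter can be explored directly with the loess
function from the stats package. The sketch below uses simulated
retention times and an arbitrary span of 0.25; it illustrates local
regression in general and is not the code run by modelRt.

```r
set.seed(42)
# Simulated merged peptides: the quantitation run drifts smoothly
# relative to the identification run.
ident_rt <- sort(runif(200, min = 10, max = 100))
rt_diff  <- 0.5 * sin(ident_rt / 15) + rnorm(200, sd = 0.1)  # quant - ident

# Local regression; a smaller span follows the data more closely.
fit <- loess(rt_diff ~ ident_rt, span = 0.25)

# Predicted shift and its standard error at new identification
# retention times.
pred <- predict(fit, newdata = data.frame(ident_rt = c(20, 50, 80)), se = TRUE)
pred$fit
pred$se.fit
```

Refitting with a larger span (for example 0.75) and comparing the
predictions is a quick way to see how strongly the smoothing affects
the modelled shift.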
}
Grid search to optimise matching tolerances{
Matching of identification peptides and quantitation EMRTs is done
within a mass tolerance in parts per million (ppm) and the modelled
retention time +/- a certain number of standard deviations.
To help in the choice of these two parameters, a grid search over a
set of possible values is performed and performance metrics are
recorded to guide the selection of a 'best' pair of parameters.
The following metrics are computed:
(1) the percentage of identification peptides that matched a single
quantitation EMRT (called prcntTotal),
(2) the percentage of identification peptides used in the retention
time model that matched the quantitation EMRT corresponding to the
correct quantitation peptide in the ident/quant pair of the model
(called prcntModel) and
(3) the details of the matching of the features used for modelling
(accessible with getGridDetails) and the corresponding details grid
that reports the percentage of correct unique assignments.
The detailed grid results specify the number of non-matched
identification peptides (0) and the number of correctly (1) or
wrongly (-1) uniquely matched identification peptides. The number of
identification peptides that matched 2 or more peptides including
(2+) or excluding (2-) the correct quantitation equivalent is also
available.
See the next section for additional details about how matching is
performed.
The search is performed with the searchGrid method, possibly on a
subset of the data (see Methods and Examples sections for further
details).
The parameters used for matching can be set manually with setPpmError
and setRtNsd respectively, or using setBestGridParams to apply best
parameters as defined using the grid search. See example and method
documentation for details.
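The principle of the grid search can be sketched in base R: for each
pair of mass tolerance (ppm) and number of retention time standard
deviations (nsd), count how many identification peptides find exactly
one EMRT in their window. The data, the helper count_matches and the
column prcnt_unique are hypothetical and only loosely mirror the
prcntTotal metric; this is not the searchGrid implementation.

```r
# Toy data: three identification peptides and five quantitation EMRTs.
ident <- data.frame(mass = c(1000.000, 1500.000, 2000.000),
                    rt   = c(20.0, 40.0, 60.0),  # modelled retention time
                    sd   = c(0.2, 0.2, 0.2))     # local rt standard deviation
emrt  <- data.frame(mass = c(1000.004, 1500.010, 1500.012, 2000.030, 1200.000),
                    rt   = c(20.1, 40.3, 40.5, 60.1, 30.0))

# Number of EMRTs inside the ppm / rt window of each identification peptide.
count_matches <- function(ppm, nsd) {
  vapply(seq_len(nrow(ident)), function(i) {
    ppm_err <- 1e6 * abs(emrt$mass - ident$mass[i]) / ident$mass[i]
    rt_err  <- abs(emrt$rt - ident$rt[i])
    sum(ppm_err <= ppm & rt_err <= nsd * ident$sd[i])
  }, numeric(1))
}

# Evaluate every (ppm, nsd) pair and record the share of unique matches.
grid <- expand.grid(ppm = c(5, 10, 20), nsd = c(1, 2, 3))
grid$prcnt_unique <- mapply(function(ppm, nsd) {
  100 * mean(count_matches(ppm, nsd) == 1)
}, grid$ppm, grid$nsd)

# Pair of parameters with the highest share of unique matches.
best <- grid[which.max(grid$prcnt_unique), ]
best
```

In this toy example a window that is too narrow misses true matches,
while one that is too wide lets a second EMRT in, so the best pair sits
in between.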
}
Identification transfer: matching identification peptides and quantitation EMRTs{
The identification peptide - quantitation EMRT matching, termed
identification transfer, is performed using the best parameters, as
defined above with a grid search, or using user-defined parameters.
Matching is considered successful when one and only one EMRT is
found in the mass tolerance/retention time window defined by the
error ppm and number of retention time standard deviations
parameters. The values of uniquely matched EMRTs are reported in the
final matched dataframe that can be exported (see below). If,
however, no EMRT or more than one EMRT is matched, 0 or the number of
matches is reported.
As identification peptides are matched to 'close' EMRTs one at a time,
it is possible for several peptides to be matched to the same EMRT
independently. Such cases are reported as -1 in the results
dataframes.
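The reporting rules above (0, 1, number of matches, or -1 for an EMRT
claimed by several peptides) can be sketched as follows. The data,
tolerances and variable names are hypothetical, and the retention time
tolerance is given directly rather than derived from a model; this is
an illustration of the rules, not the package code.

```r
ident <- data.frame(id   = c("p1", "p2", "p3", "p4"),
                    mass = c(1000.000, 1000.002, 1500.000, 2000.000),
                    rt   = c(20.0, 20.05, 40.0, 60.0))
emrt  <- data.frame(mass = c(1000.001, 1500.003, 1500.004),
                    rt   = c(20.02, 40.1, 39.9))
ppm    <- 10   # hypothetical mass tolerance
rt_tol <- 0.5  # hypothetical rt tolerance (stands in for nsd * sd)

# Indices of EMRTs falling in each peptide's mass / retention time window.
hits <- lapply(seq_len(nrow(ident)), function(i) {
  which(1e6 * abs(emrt$mass - ident$mass[i]) / ident$mass[i] <= ppm &
        abs(emrt$rt - ident$rt[i]) <= rt_tol)
})

# 0 = no match, 1 = unique match, n = number of EMRTs when ambiguous.
n_hits <- lengths(hits)
status <- n_hits

# EMRT index for uniquely matched peptides, NA otherwise.
unique_emrt <- ifelse(n_hits == 1,
                      vapply(hits, function(h) h[1], integer(1)), NA)

# Unique matches that share an EMRT with another peptide are flagged -1.
dup_emrt <- unique_emrt[duplicated(unique_emrt) & !is.na(unique_emrt)]
status[unique_emrt %in% dup_emrt] <- -1

data.frame(id = ident$id, status = status)
```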
The results can be assessed using the plotEMRTtable (or getEMRTtable
to retrieve the values) and performance methods. The former shows the
number of identification peptides assigned to none (0), exactly 1 (1)
or more (>= 2) EMRTs.
The latter reports the number of matched identification peptides and
the number of (q-value and protein FPR filtered) identification and
quantitation peptides.
Matched EMRT and quantitation peptide numbers are then compared by
calculating the synapter enrichment (100 * ( synapter - quant ) / quant)
and Venn counts.
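As a worked example of the enrichment formula, with made-up counts
chosen only for illustration:

```r
# Hypothetical counts, for illustration only.
quant    <- 2000  # quantitation peptides
synapter <- 2600  # matched EMRTs

# Relative gain, in percent, of matched EMRTs over quantitation peptides.
enrichment <- 100 * (synapter - quant) / quant
enrichment
```

With these numbers the enrichment is 30, i.e. 30 percent more matched
EMRTs than quantitation peptides.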
}
Exporting and saving data{
The merged identification and quantitation peptides can be exported
to csv using the writeMergedPeptides method. Similarly, the matched
identification peptides and quantitation EMRTs are exported with
writeMatchedEMRTs.
Complete Synapter instances can be serialised with save, as any R
object, and reloaded with load for further analysis.
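The save and load round trip works as for any R object. In the sketch
below a plain list stands in for a Synapter instance and the file is
written to a temporary location; both are assumptions made to keep the
example self-contained.

```r
# Stand-in for a Synapter instance; any R object is handled the same way.
obj <- list(name = "analysis", ppm = 10, nsd = 2)

path <- tempfile(fileext = ".rda")
save(obj, file = path)

# Remove the object, then restore it under its original name.
rm(obj)
loaded <- load(path)  # returns the names of the restored objects
loaded
obj$ppm
```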
}