Synapter: Class "Synapter"

Description

A reference class to store, manage and process Synapt G2 data to combine identification and quantitation results.

The data, intermediate and final results are stored together in an ad hoc container called a class. For the analysis of a set of 3 data files, namely an identification peptide file, a quantitation peptide file and a quantitation Pep3D file, such a container is created and populated, updated according to the user's instructions and used to display and export results.

The functionality of the synapter package implemented in the Synapter class is described in the Details section below. Documentation for the individual methods is provided in the Methods section. Finally, a complete example of an analysis is provided in the Examples section, at the end of this document.

See also papers by Shliaha et al. for details about ion mobility separation and the manuscript describing the synapter methodology.

Usage

Synapter(filenames, master) ## creates an instance of class 'Synapter'

Arguments

filenames

A named list of file names to be loaded. The names must be 'identpeptide', 'quantpeptide', 'quantpep3d' and 'fasta'. If missing, dialog boxes pop up to select the four files manually. identpeptide can be a csv final peptide file (from PLGS) or a saved "MasterPeptides" data object, as created by makeMaster, when working with master peptide data. To serialise the "MasterPeptides" instance, use the saveRDS function and the rds file extension.

master

A logical that defines if the identification file is a master file. See makeMaster for details about this strategy.

Details

A Synapter object logs every operation that is applied to it. When displayed with show or when the name of the instance is typed at the R console, the original input file names, all operations and the resulting size of the respective data are displayed. This allows the user to trace the effect of each operation.

Loading the data

The data and analysis container, technically an instance (or object) of class Synapter, is created with the Synapter constructor. This function opens four dialog boxes for the user to point to the input files, namely (and in that order) the identification final peptide csv file, the quantitation final peptide csv file and the quantitation Pep3D csv file (as exported from the PLGS software) and the fasta file used for peptide identification. The files are read and the data is stored in the newly created Synapter instance. The file names can also be specified as a named list with names 'identpeptide', 'quantpeptide', 'quantpep3d' and 'fasta' respectively.

The final peptide files are filtered to retain peptides with a matchType of PepFrag1 or PepFrag2, corresponding to unmodified round 1 and 2 peptide identifications. Other types, like NeutralLoss_NH3, NeutralLoss_H2O, InSource, MissedCleavage or VarMod, are not considered in the rest of the analysis. The quantitation Pep3D data is filtered to retain entries with Function equal to 1 and unique quantitation spectrum ids, i.e. unique entries for multiple charge states or isotopes of an EMRT (exact mass-retention time feature).

Then, p-values for Regular peptides are computed based on the score distributions of the Regular and Random database types, as described in Käll et al., 2008a. Only unique peptide sequences are taken into account: in case of duplicated peptides, only one entry is kept. Empirical p-values are adjusted using the Bonferroni and the Benjamini and Hochberg, 1995 methods (multtest package) and q-values are computed using the qvalue package (Storey JD and Tibshirani R., 2003 and Käll et al., 2008b). Only Regular entries are stored in the resulting data for subsequent analysis.

The data tables can be exported as csv spreadsheets with the writeIdentPeptides and writeQuantPeptides methods.
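As a rough illustration of the matchType filtering and empirical p-value steps described above, here is a minimal base R sketch. It is not the package's implementation: the data frame, the dbType column name and all scores are made up for illustration, the p-value is simply the fraction of Random (decoy) scores at least as high as each Regular score, and base R's p.adjust stands in for the multtest adjustment.

```r
## Toy final peptide table; column 'dbType' and all values are hypothetical.
pep <- data.frame(
  matchType = c("PepFrag1", "PepFrag2", "VarMod", "PepFrag1", "InSource",
                "PepFrag1", "PepFrag2", "PepFrag1", "PepFrag2", "PepFrag1"),
  dbType    = c("Regular", "Regular", "Regular", "Regular", "Regular",
                "Random", "Random", "Random", "Random", "Random"),
  score     = c(7.9, 6.1, 6.5, 4.4, 5.0, 5.0, 4.1, 3.2, 2.5, 1.0),
  stringsAsFactors = FALSE)

## Retain only unmodified round 1 and 2 identifications.
keep <- pep[pep$matchType %in% c("PepFrag1", "PepFrag2"), ]

## Split scores by database type.
regular <- keep$score[keep$dbType == "Regular"]
random  <- keep$score[keep$dbType == "Random"]

## Empirical p-value: fraction of decoy scores >= each regular score.
pval <- sapply(regular, function(s) mean(random >= s))

## Multiple testing adjustment (assumption: p.adjust instead of multtest).
res <- data.frame(score      = regular,
                  pval       = pval,
                  Bonferroni = p.adjust(pval, method = "bonferroni"),
                  BH         = p.adjust(pval, method = "BH"))
res
```

With so few decoy scores, several p-values are exactly 0; a real analysis would rely on much larger score distributions.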

Filtering identification and quantitation peptides

The first step of the analysis aims to match reliable peptides. The final peptide datasets are filtered based on the FDR (BH is the default) using the filterQuantPepScore and filterIdentPepScore methods. Several plots are provided to illustrate peptide score densities (from which p-values are estimated, plotPepScores; use getPepNumbers to see how many peptides were available) and q-values (plotFdr).

Peptides matching multiple proteins in the fasta file (non-unique tryptic identification and quantitation peptides) can be discarded with the filterUniqueDbPeptides method. One can also filter on the peptide length using filterPeptideLength.

Another filtering criterion is mass accuracy. Error tolerance quantiles (in ppm, parts per million) can be visualised with the plotPpmError method. The values can be retrieved with getPpmErrorQs. Filtering is then done separately for identification and quantitation peptide data using filterIdentPpmError and filterQuantPpmError respectively. The previous plotting functions can be used again to visualise the resulting distribution.

Filtering can also be performed at the level of the protein false positive rate, as computed by the PLGS application (protein.falsePositiveRate column), which counts the percentage of decoy proteins that have been identified prior to the regular protein of interest. This can be done with the filterIdentProtFpr and filterQuantProtFpr methods. Note that this field is erroneously called a false positive rate in the PLGS software and the associated manuscript; it is a false discovery rate.
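To make the mass accuracy criterion concrete, the following base R sketch computes a relative mass error in ppm and applies a tolerance. The helper ppm_error, the column names and the masses are hypothetical and only illustrate the idea; the package methods named above operate on the Synapter instance itself.

```r
## Relative mass error in parts per million (ppm).
ppm_error <- function(observed, theoretical) {
  1e6 * (observed - theoretical) / theoretical
}

## Toy peptide masses; all values are made up for illustration.
pep <- data.frame(
  sequence    = c("AAGK", "LLSR", "VVTK", "GGHR"),
  theoretical = c(1000, 1500, 2000, 2500),
  observed    = c(1000.005, 1499.982, 2000.05, 2500.0075),
  stringsAsFactors = FALSE)
pep$ppm <- ppm_error(pep$observed, pep$theoretical)

## Quantiles of the absolute error, to help choose a tolerance.
quantile(abs(pep$ppm), probs = c(0.5, 0.9, 1))

## Keep peptides within a +/- 20 ppm tolerance.
filtered <- pep[abs(pep$ppm) <= 20, ]
filtered$sequence
```

In this toy table only VVTK, with an error of about 25 ppm, falls outside the tolerance.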

Merging identification and quantitation peptides

Common and reliable identification and quantitation peptides are then matched based on their sequences and merged using the mergePeptides method.
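Conceptually, merging on sequence resembles an inner join of the two peptide tables. The sketch below uses base R's merge on toy data frames; the column names and values are assumptions made for illustration and do not reflect the internal layout used by mergePeptides.

```r
## Toy identification and quantitation peptide tables (hypothetical columns).
ident <- data.frame(sequence = c("AAGK", "LLSR", "VVTK"),
                    ident.rt = c(10.2, 25.1, 40.3),
                    stringsAsFactors = FALSE)
quant <- data.frame(sequence = c("LLSR", "VVTK", "GGHR"),
                    quant.rt = c(25.6, 40.9, 55.0),
                    stringsAsFactors = FALSE)

## Inner join: only sequences present in both tables are retained.
merged <- merge(ident, quant, by = "sequence")
merged
```

Only the two peptides found in both runs (LLSR and VVTK) remain, each with its identification and quantitation retention time side by side.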

Retention time modelling

Systematic differences between the retention times of identification features and quantitation features are modelled by fitting a local regression (see the loess function for details), using the modelRt method. The smoothing parameter, which controls the proportion of neighbouring data points used for each local fit, is the span parameter that can be set in the above method.

The effect of this parameter can be observed with the plotRt method, specifying what = "data" as a parameter. The resulting model can then be visualised with the same method, specifying what = "model" and up to 3 numbers of standard deviations to plot. A histogram of retention time differences can be produced with the plotRtDiffs method.

Features can also be visualised with the plotFeatures method (see the Examples section).
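The idea of modelling retention time differences with a local regression can be sketched with base R's loess on simulated data. Everything below is an assumption made for illustration: the simulated shift, the span value, and in particular the use of the residual standard deviation to build the retention time window, which is not necessarily how modelRt derives its standard deviations.

```r
set.seed(1)  # reproducible toy data

## Simulated retention times (minutes): a smooth systematic shift plus noise.
ident.rt <- seq(10, 100, length.out = 200)
delta.rt <- 0.5 * sin(ident.rt / 20) + rnorm(200, sd = 0.05)

## Local regression of the retention time difference on the identification
## retention time; 'span' is the proportion of points used for each local fit.
fit <- loess(delta.rt ~ ident.rt, span = 0.3)

## Predicted shift for new identification retention times.
new.rt <- c(30, 60)
shift <- predict(fit, newdata = data.frame(ident.rt = new.rt))

## Retention time window: prediction +/- nsd residual standard deviations
## (assumption: residual sd used as the measure of spread).
nsd <- 2
res.sd <- sd(residuals(fit))
rt.window <- data.frame(ident.rt = new.rt,
                        lower = new.rt + shift - nsd * res.sd,
                        upper = new.rt + shift + nsd * res.sd)
rt.window
```

A smaller span follows local fluctuations more closely, while a larger one gives a smoother curve; plotting the fit for a few span values is a quick way to see the trade-off.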

Grid search to optimise matching tolerances

Matching of identification peptides and quantitation EMRTs is done within a mass tolerance in parts per million (ppm) and the modelled retention time +/- a certain number of standard deviations. To help in the choice of these two parameters, a grid search over a set of possible values is performed and performance metrics are recorded, to guide the selection of a 'best' pair of parameters.

The following metrics are computed: (1) the percentage of identification peptides that matched a single quantitation EMRT (called prcntTotal), (2) the percentage of identification peptides used in the retention time model that matched the quantitation EMRT corresponding to the correct quantitation peptide in the ident/quant pair of the model (called prcntModel) and (3) the details of the matching of the features used for modelling (accessible with getGridDetails), together with the corresponding details grid that reports the percentage of correct unique assignments. The detailed grid results specify the number of non-matched identification peptides (0), the number of correctly (1) or wrongly (-1) uniquely matched identification peptides, and the number of identification peptides that matched 2 or more peptides including (2+) or excluding (2-) the correct quantitation equivalent.

See the next section for additional details about how matching is performed. The search is performed with the searchGrid method, possibly on a subset of the data (see the Methods and Examples sections for further details).

The parameters used for matching can be set manually with setPpmError and setRtNsd respectively, or using setBestGridParams to apply the best parameters as defined by the grid search. See the Examples section and the method documentation for details.
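The grid search can be pictured as trying every combination of ppm tolerance and number of standard deviations and recording how many identification peptides match exactly one EMRT. The following base R sketch computes a prcntTotal-like value on toy data; count_matches, the column names and all values are hypothetical, and searchGrid itself records more metrics than shown here.

```r
## Toy identification peptides with predicted retention time and its sd.
ident <- data.frame(mass    = c(1000, 1500, 2000, 2500),
                    pred.rt = c(20, 35, 50, 65),
                    sd      = 0.5)
## Toy quantitation EMRTs (made-up masses and retention times).
emrt <- data.frame(mass = c(1000.004, 1500.012, 2000.006, 2000.030, 2500.040),
                   rt   = c(20.3, 35.8, 50.2, 51.2, 65.1))

## Number of EMRTs inside each peptide's mass/retention time window.
count_matches <- function(ident, emrt, ppm, nsd) {
  sapply(seq_len(nrow(ident)), function(i) {
    mass.ok <- abs(emrt$mass - ident$mass[i]) / ident$mass[i] * 1e6 <= ppm
    rt.ok   <- abs(emrt$rt - ident$pred.rt[i]) <= nsd * ident$sd[i]
    sum(mass.ok & rt.ok)
  })
}

## Evaluate every (ppm, nsd) pair: percentage of uniquely matched peptides.
grid <- expand.grid(ppm = c(5, 10, 20), nsd = c(1, 2, 3))
grid$prcntTotal <- mapply(
  function(p, n) 100 * mean(count_matches(ident, emrt, p, n) == 1),
  grid$ppm, grid$nsd)

grid
best <- grid[which.max(grid$prcntTotal), ]
best
```

In this toy example the widest window is not the best one: at 20 ppm and 3 standard deviations one peptide picks up a second EMRT and is no longer uniquely matched, which is the kind of trade-off the grid is meant to expose.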

Identification transfer: matching identification peptides and quantitation EMRTs

The identification peptide - quantitation EMRT matching, termed identification transfer, is performed using the best parameters, as defined above with a grid search, or using user-defined parameters.

Matching is considered successful when one and only one EMRT is found in the mass tolerance/retention time window defined by the ppm error and number of retention time standard deviations parameters. The values of uniquely matched EMRTs are reported in the final matched data frame that can be exported (see below). If, however, no EMRT or more than one EMRT is matched, 0 or the number of matches is reported.

As identification peptides are matched individually and serially to 'close' EMRTs, it is possible for several peptides to be matched to the same EMRT independently. Such cases are reported as -1 in the results data frames.

The results can be assessed using the plotEMRTtable (or getEMRTtable to retrieve the values) and performance methods. The former shows the number of identification peptides assigned to none (0), exactly 1 (1) or more than one (2 or more) EMRTs. The latter method reports the matched identification peptides and the number of (q-value and protein FPR filtered) identification and quantitation peptides. Matched EMRT and quantitation peptide numbers are then compared by calculating the synapter enrichment (100 * ( synapter - quant ) / quant) and Venn counts.
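The reporting convention for matches (0, 1, the number of matches, or -1 for an EMRT claimed by several peptides) and the enrichment formula can be illustrated with a small base R sketch. The hits list, the peptide names and the counts are invented for this example and are not output of findEMRTs or performance.

```r
## For each identification peptide, ids of the EMRTs found in its window
## (toy values; pepD and pepE both point to EMRT 14).
hits <- list(pepA = 11L,
             pepB = integer(0),
             pepC = c(12L, 13L),
             pepD = 14L,
             pepE = 14L)

## Number of EMRTs per peptide: 0 (none), 1 (unique) or more.
code <- lengths(hits)

## Unique matches that share their EMRT with another peptide are flagged -1.
unique.emrt <- unlist(hits[code == 1])
shared <- unique.emrt[duplicated(unique.emrt) |
                      duplicated(unique.emrt, fromLast = TRUE)]
code[names(shared)] <- -1L
code
table(code)

## Synapter enrichment, with made-up peptide counts.
synapter <- 1200  # identifications after identification transfer
quant    <- 1000  # identifications in the quantitation run alone
enrichment <- 100 * (synapter - quant) / quant
enrichment
```

Here pepA is the only clean unique match, and the made-up counts give an enrichment of 20 percent.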

Exporting and saving data

The merged identification and quantitation peptides can be exported to csv using the writeMergedPeptides method. Similarly, the matched identification peptides and quantitation EMRTs are exported with writeMatchedEMRTs.

Complete Synapter instances can be serialised with save, as any R object, and reloaded with load for further analysis.
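A minimal sketch of the save/load round trip follows, using a plain list as a stand-in for a Synapter instance and temporary files instead of real paths; both are assumptions made so that the snippet runs without the package or any data. The saveRDS/readRDS pair is shown alongside for comparison.

```r
## Stand-in for a Synapter instance (any R object is handled the same way).
obj <- list(name = "toy analysis", values = 1:3)

## save() stores the object together with its name ...
f <- tempfile(fileext = ".rda")
save(obj, file = f)
rm(obj)
## ... and load() restores it under that same name.
load(f)
obj$name

## saveRDS()/readRDS() store a single object without its name,
## so the result can be assigned to any variable.
g <- tempfile(fileext = ".rds")
saveRDS(obj, file = g)
obj2 <- readRDS(g)
identical(obj, obj2)
```

The practical difference is that load recreates the variable under its saved name, whereas readRDS returns the object so the caller chooses the name.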

References

Käll L, Storey JD, MacCoss MJ, Noble WS Posterior error probabilities and false discovery rates: two sides of the same coin. J Proteome Res. 2008a Jan; 7:(1)40-4

Bonferroni single-step adjusted p-values for strong control of the FWER.

Benjamini Y. and Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B., 1995, Vol. 57: 289-300.

Storey JD and Tibshirani R. Statistical significance for genome-wide experiments. Proceedings of the National Academy of Sciences, 2003, 100: 9440-9445.

Käll L, Storey JD, MacCoss MJ, Noble WS Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J Proteome Res. 2008b Jan; 7:(1)29-34

Improving qualitative and quantitative performance for MSE-based label free proteomics, N.J. Bond, P.V. Shliaha, K.S. Lilley and L. Gatto, Journal of Proteome Research, 2013, in press.

The Effects of Travelling Wave Ion Mobility Separation on Data Independent Acquisition in Proteomics Studies, P.V. Shliaha, N.J. Bond, L. Gatto and K.S. Lilley, Journal of Proteome Research, 2013, in press.

Examples

library(synapter) ## always needed

## (1) Construction - to create your own data objects
## (interactive: opens dialog boxes to select the input files, hence not run here)
## synapterTiny <- Synapter()

## let's use synapterTiny, shipped with the package
synapterTinyData() ## loads/prepares the data
synapterTiny ## show object

## (2) Filtering
## (2.1) Peptide scores and FDR

## visualise/explore peptide id scores
plotPepScores(synapterTiny)
getPepNumbers(synapterTiny)

## filter data
filterUniqueDbPeptides(synapterTiny) ## keeps unique proteotypic peptides
filterPeptideLength(synapterTiny, l = 7) ## default length is 7

## visualise before FDR filtering
plotFdr(synapterTiny)

setPepScoreFdr(synapterTiny, fdr = 0.01) ## optional
filterQuantPepScore(synapterTiny, fdr = 0.01) ## specifying FDR
filterIdentPepScore(synapterTiny) ## FDR not specified, using previously set value

## (2.2) Mass tolerance
getPpmErrorQs(synapterTiny)
plotPpmError(synapterTiny, what="Ident")
plotPpmError(synapterTiny, what="Quant")

setIdentPpmError(synapterTiny, ppm = 20) ## optional
filterQuantPpmError(synapterTiny, ppm = 20)
## setQuantPpmError(synapterTiny, ppm = 20) ## set quant ppm threshold below
filterIdentPpmError(synapterTiny, ppm=20)

filterIdentProtFpr(synapterTiny, fpr = 0.01)
filterQuantProtFpr(synapterTiny, fpr = 0.01)

getPpmErrorQs(synapterTiny) ## to be compared with previous output

## (3) Merge peptide sequences
mergePeptides(synapterTiny)

## (4) Retention time modelling
plotRt(synapterTiny, what="data")
setLowessSpan(synapterTiny, 0.05)
modelRt(synapterTiny) ## the actual modelling
getRtQs(synapterTiny)
plotRtDiffs(synapterTiny)
## plotRtDiffs(synapterTiny, xlim=c(-1, 1), breaks=500) ## pass parameters to hist()
plotRt(synapterTiny, what="model") ## using default nsd 1, 3, 5
plotRt(synapterTiny, what="model", nsd=0.5) ## better focus on model

plotFeatures(synapterTiny, what="all")
setRtNsd(synapterTiny, 3)     ## RtNsd and PpmError are used for detailed plot
setPpmError(synapterTiny, 10) ## if not set manually, default values are set automatically
plotFeatures(synapterTiny, what="some", xlim=c(36,44), ylim=c(1161.4, 1161.7))
## best plotting to svg for zooming

set.seed(1) ## only for reproducibility of this example

## (5) Grid search to optimise EMRT matching parameters
searchGrid(synapterTiny,
           ppms = 7:10,  ## default values are 5, 7, ..., 20
           nsds = 1:3,   ## default values are 0.5, 1,  ..., 5
           subset = 0.2) ## default is 1
## alternatively, use 'n = 1000' to use exactly
## 1000 randomly selected features for the grid search
getGrid(synapterTiny)  ## print the grid
getGridDetails(synapterTiny)  ## grid details
plotGrid(synapterTiny, what = "total")   ## plot the grid for total matching
plotGrid(synapterTiny, what = "model")   ## plot the grid for matched modelled feature
plotGrid(synapterTiny, what = "details") ## plot the detail grid
getBestGridValue(synapterTiny)  ## return best grid values
getBestGridParams(synapterTiny) ## return parameters corresponding to best values
setBestGridParams(synapterTiny, what = "auto") ## sets RtNsd and PpmError according to the grid results
## 'what' could also be "model", "total" or "details"
## setPpmError(synapterTiny, 12) ## to manually set values
## setRtNsd(synapterTiny, 2.5)

## (6) Matching ident peptides and quant EMRTs
findEMRTs(synapterTiny)
plotEMRTtable(synapterTiny)
getEMRTtable(synapterTiny)
performance(synapterTiny)
performance2(synapterTiny)

## (7) Exporting data to csv spreadsheets
writeMergedPeptides(synapterTiny, what = "light") ## or what="full"
writeMergedPeptides(synapterTiny, file = "myresults.csv", what="light")
writeMatchedEMRTs(synapterTiny, what = "light")   ## or what="full"
writeMatchedEMRTs(synapterTiny, file = "myresults2.csv", what="light")
## These will export the filtered peptide data
writeIdentPeptides(synapterTiny, file = "myIdentPeptides.csv")
writeQuantPeptides(synapterTiny, file = "myQuantPeptides.csv")
## If used right after loading, the non-filtered data will be exported