saint_permF: Pre- and Postprocessing for AP-MS data analysis using SAINT

Description

A complete workflow for the identification of true interaction proteins based on AP-MS data, embedding the scoring method SAINT into a pre- and postprocessing framework.

Usage

saint_permF(file_baittable, file_inttable, prottable, 
   norm = c("none", "sumtotal", "upperquartile", "DESeq", 
            "TMM", "quantile"), 
   Filter = TRUE, 
   filter.method = c("IQR", "overallVar", "noVar"), 
   var.cutoff = NA, limit = 0, intern.norm = FALSE, 
   saint.options = "2000 10000 0 1 0")

Arguments

file_baittable

a character string specifying the pathname of the baittable. See Details.

file_inttable

a character string specifying the pathname of the interaction table. See Details.

prottable

a character string specifying the pathname of the protein table. See Details.

norm

method to normalize the data. If norm="none", no normalization of the data is performed.

Filter

logical value, whether filtering of the data is applied (Default TRUE).

filter.method

method to use for filtering; must be one of "IQR", "overallVar" or "noVar". Only used when Filter=TRUE.

var.cutoff

a fraction between 0 and 1, or NA. Cutoff for filtering the data, defined either by a quantile or, if NA (Default), by the shortest interval. Only used when Filter=TRUE.

limit

minimal number of expected true interaction proteins in the data.

intern.norm

logical value. If TRUE, normalization is repeated on the filtered data (Default FALSE).

saint.options

a character string of parameters passed to SAINT, in the order nburn, niter, lowMode, minFold, normalize. See Details.

Value

The overall result is reported in the file WY_Result.csv. It is based on the original SAINT output unique_interactions, with Westfall & Young adjusted p-values additionally assigned to each interaction candidate. These p-values control the family-wise error rate (FWER), which makes it possible to estimate the proportion of false-positive interactions.
Several .txt and .xls files are also generated so that the user can follow the intermediate results:
1. In case of normalization: the normalized count data in the form of the interaction table (txt file), named after the normalization method and the bait protein (e.g. quantile_bait_IntSaint.txt).
2. In case of filtering: the filtered (and normalized) interaction table (Inttable_filtered.txt).
3. The SAINT output unique_interactions, reporting the interaction candidates with SAINT scores, calculated on the normalized data (file name ending in _orig) and on the filtered data (file name ending in _orgF).
4. Permutation data: scores calculated for each permutation data set (permutation matrices saved as perm.avgp.Rdata and perm.maxp.Rdata).

Details

The input files correspond to the input formats used by SAINT: the bait table, the prey table and the interaction table, each as a tab-delimited file. The baittable consists of three columns: IP name, bait or control name, and an indicator for bait or control experiment (T=bait purification, C=control). The interaction table consists of four columns: IP name, bait or control name, protein name, and spectral count (note: a protein which was not detected in one of the samples receives a zero count). The protein table corresponds to the SAINT prey file; it consists of three columns: protein name, protein length, and protein name or associated gene name (if available). A more detailed description of how to generate these files is given in Choi et al. (Current Protocols in Bioinformatics 2012).
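The column layout described above can be illustrated with a minimal sketch in base R. All IP, bait and protein names and all counts are made up for illustration, and writing the files without a header row is an assumption based on the usual SAINT input format, not something this page states.

```r
# Toy data only: all names and counts below are hypothetical.
bait <- data.frame(
  ip   = c("IP1", "IP2", "CTRL1", "CTRL2"),
  bait = c("BaitA", "BaitA", "Control", "Control"),
  type = c("T", "T", "C", "C"),          # T = bait purification, C = control
  stringsAsFactors = FALSE
)

prot <- data.frame(
  protein = c("P1", "P2", "P3"),
  length  = c(350, 1200, 87),
  gene    = c("GENE1", "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)

# One row per IP/protein pair; undetected proteins receive a zero count.
int <- data.frame(
  ip      = rep(bait$ip,   each  = nrow(prot)),
  bait    = rep(bait$bait, each  = nrow(prot)),
  protein = rep(prot$protein, times = nrow(bait)),
  count   = c(12, 0, 3,  15, 1, 2,  0, 0, 4,  1, 0, 3),
  stringsAsFactors = FALSE
)

# Helper (hypothetical name): write a tab-delimited file without header
# or row names. The missing header is an assumption about the SAINT format.
write_tab <- function(x, file) {
  write.table(x, file = file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

out_dir  <- tempdir()
baitfile <- file.path(out_dir, "baittab.txt")
intfile  <- file.path(out_dir, "inttable.txt")
protfile <- file.path(out_dir, "prottable.txt")

write_tab(bait, baitfile)
write_tab(int,  intfile)
write_tab(prot, protfile)
```

The resulting paths have the same role as the baitfile, intfile and protfile arguments passed to saint_permF.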

Pre-processing comprises normalization and filtering of the data. Five normalization methods are available, adapted from microarray and RNA-seq analysis to AP-MS data; for further details see norm.inttable. The filter consists of a biological filter and a statistical variance filter and aims to remove obvious contaminants from further analysis. If filter.method="noVar", only the biological filter is applied. Both filters are applied if filter.method="IQR", where the variance is measured by the interquartile range, or if filter.method="overallVar", where the variance is calculated across all samples. var.cutoff defines the fraction of proteins with the lowest overall variance that are considered contaminants and removed. var.cutoff=NA (default) refers to a cutoff defined by the mean of the shortest interval containing 50% of the data. Alternatively, a quantile can be set as the cutoff; for example, a cutoff of 0.5 removes the 50% of the data showing the smallest overall variance or IQR. See also varFilter. The parameter limit ensures that filtering leaves a number of proteins above the number of expected true interaction proteins.
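To make the quantile-based variance cutoff concrete, the following is a minimal sketch in base R on a made-up count matrix. It is not the package's implementation: the biological filter, the shortest-interval cutoff (var.cutoff=NA) and the limit check are left out, and the strict comparison against the threshold is an assumption.

```r
# Toy spectral-count matrix: rows = proteins, columns = samples (made-up values).
counts <- matrix(
  c(10, 12, 11,  9,   # P1: nearly constant, contaminant-like
     0, 25,  1, 30,   # P2: strongly varying
     5,  5,  6,  5,   # P3: nearly constant
     0,  0, 14, 18),  # P4: strongly varying
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("P", 1:4), paste0("S", 1:4))
)

var_cutoff <- 0.5  # remove the 50% of proteins with the smallest spread

# Per-protein spread, measured by the interquartile range
# (analogous in spirit to filter.method = "IQR").
spread <- apply(counts, 1, IQR)

# Quantile of the spread values that serves as the cutoff.
threshold <- quantile(spread, probs = var_cutoff)

# Keep only proteins whose spread exceeds the cutoff (assumed strict ">").
kept <- counts[spread > threshold, , drop = FALSE]
rownames(kept)
```

In this toy matrix the two nearly constant proteins fall below the cutoff and are dropped, while the two strongly varying proteins are kept.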

The corresponding SAINT parameters, [nburn] [niter] [lowMode] [minFold] [normalize], are set by default as recommended by SAINT. Further details on the parameter settings can be found in Choi et al. (Current Protocols in Bioinformatics 2012).
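As a small illustration of how the saint.options string maps onto these parameters, the sketch below splits the default value from Usage and labels each field in the order given above. The variable names are chosen for illustration only, and the modified value at the end is arbitrary rather than a recommended setting.

```r
# Default value of saint.options, as given in Usage.
saint.options <- "2000 10000 0 1 0"

# Split the string on single spaces and label the fields in the
# documented order: nburn, niter, lowMode, minFold, normalize.
values <- as.numeric(strsplit(saint.options, " ", fixed = TRUE)[[1]])
names(values) <- c("nburn", "niter", "lowMode", "minFold", "normalize")
values

# Build a modified option string, e.g. with more sampling iterations
# (the value 20000 is arbitrary and only for illustration).
custom <- values
custom["niter"] <- 20000
custom.options <- paste(format(custom, scientific = FALSE, trim = TRUE),
                        collapse = " ")
custom.options
```

A string built this way could then be passed as the saint.options argument.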

References

Choi H, Larsen B, Lin Z-Y, et al. SAINT: probabilistic scoring of affinity purification-mass spectrometry data. Nature Methods 2011.

Choi H, Liu G, Mellacheruvu D, et al. Analyzing Protein-Protein Interactions from Affinity Purification-Mass Spectrometry Data with SAINT. Current Protocols in Bioinformatics 2012.

Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biology 2010.

Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 2010.

Bolstad BM, Irizarry RA, Astrand M, et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003.

Westfall PH, Young SS. Resampling-based multiple testing: examples and methods for p-value adjustment. 1993.

Bourgon R, Gentleman R, Huber W. Independent filtering increases detection power for high-throughput experiments. Proceedings of the National Academy of Sciences 2010.

Examples

# input data
baitfile <- system.file("extdata", "baittab.txt", package="apmsWAPP")
intfile <- system.file("extdata", "inttable.txt", package="apmsWAPP")
protfile <- system.file("extdata", "prottable.txt", package="apmsWAPP")
 
# To run this example, a Linux environment is required and SAINT needs
# to be installed!
# Important: define a working directory for storing the resulting files.
# Pre-processing: quantile normalization and filtering
# Workflow call:
# saint_permF(baitfile, intfile, protfile, norm="quantile", Filter=TRUE,
#       filter.method="overallVar", var.cutoff=0.3, intern.norm=FALSE)
