readAlphaPeptFile: Read (Normalized) Quantitation Data Files Produced By AlphaPept

Description

This function reads protein quantification results from AlphaPept. Input files compressed as .gz can be read as well. The protein abundance values (XIC) get extracted. Since this data format contains only limited protein annotation, the function can also read the initial fasta files (from the directory above the quantitation results) to extract further protein annotation (like species). Sample annotation (if available) can be extracted from sdrf files, too. The protein abundance values may be normalized using one of several methods (see argument normalizeMeth), and the determination of normalization factors can be restricted to specific proteins (normalization to bait protein(s), or to the invariable matrix of spike-in experiments). The protein annotation data gets parsed to extract specific fields (ID, name, description, species ...). In addition, a graphical display of the distribution of protein abundance values may be generated before and after normalization.
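
The idea of restricting the determination of normalization factors to specific proteins can be illustrated with a minimal base-R sketch. It is not the code used by this function (which relies on normalizeThis); the toy matrix and the names abund, refRows and normFact are assumptions made for illustration only.

```r
# Toy abundance matrix: 5 proteins (rows) x 3 samples (columns), linear scale
abund <- matrix(c(100, 200, 400,
                  110, 220, 440,
                   90, 180, 360,
                   50, 300, 100,
                   70,  20, 900),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("prot", 1:5), paste0("sample", 1:3)))

# Work on log2 scale
log2Abund <- log2(abund)

# Hypothetical choice: only the first 3 proteins determine the normalization factors
refRows <- 1:3

# Per-sample median of the reference proteins
refMed <- apply(log2Abund[refRows, , drop = FALSE], 2, median, na.rm = TRUE)

# Shift each sample so that its reference median equals the mean of the reference medians
normFact <- refMed - mean(refMed)
log2Norm <- sweep(log2Abund, 2, normFact, FUN = "-")

# The reference medians are now the same in all samples
apply(log2Norm[refRows, , drop = FALSE], 2, median)
```

All proteins are shifted by the same per-sample factor, but only the reference rows influence that factor, so proteins that truly differ between samples (rows 4 and 5 here) do not distort it.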

Usage

readAlphaPeptFile(
  fileName = "results_proteins.csv",
  path = NULL,
  fasta = NULL,
  isLog2 = FALSE,
  normalizeMeth = "none",
  quantCol = "_LFQ$",
  contamCol = NULL,
  read0asNA = TRUE,
  refLi = NULL,
  sampleNames = NULL,
  specPref = NULL,
  extrColNames = NULL,
  remRev = TRUE,
  remConta = FALSE,
  separateAnnot = TRUE,
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  titGraph = NULL,
  wex = 1.6,
  plotGraph = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Value

This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (annotation columns), $counts an array with 'PSM' and 'NoOfRazorPeptides', $quantNotes, $notes and optional setup for meta-data from sdrf; or a data.frame with quantitation and annotation if separateAnnot=FALSE

Arguments

fileName: (character) name of file to be read (default 'results_proteins.csv'). Gz-compressed files can be read, too.
path: (character) path of file to be read
fasta: (logical or character) if TRUE the (first) fasta from one directory higher than fileName will be read as fasta-file to extract further protein annotation; if character a fasta-file at this location will be read and used
isLog2: (logical) set to TRUE if the data read are already log2-transformed; data from AlphaPept are typically not log2-transformed (default FALSE)
normalizeMeth: (character) normalization method (the Usage signature shows "none" as default); for more details see normalizeThis
quantCol: (character or integer) exact col-names, or if length=1 content of quantCol will be used as pattern to search among column-names for $quant using grep
contamCol: (character or integer, length=1) which columns should be used for contaminants
read0asNA: (logical) decide if initial quantifications at 0 should be transformed to NA (thus avoiding -Inf in log2 results)
refLi: (character or integer) custom specification of which lines of data should be used for normalization, ie which lines represent the main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given
sampleNames: (character) custom column-names for quantification data; this argument has priority over suplAnnotFile
specPref: (character) prefix to identifiers allowing to recognize i) the contamination database, ii) the species of main identifications and iii) spike-in species
extrColNames: (character or NULL) custom definition of col-names to extract
remRev: (logical) option to remove all protein-identifications based on reverse-peptides
remConta: (logical) option to remove all proteins identified as contaminants
separateAnnot: (logical) if TRUE output will be organized as list with $annot, $raw for initial/raw abundance values and $quant with final normalized quantitations
gr: (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from sdrf and/or suplAnnotFile (if provided)
sdrf: (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange, the second & third elements may give further indications for automatic organization of groups of replicates. Besides, the output from readSdrf or a list from defineSamples may be provided; if gr is provided, gr gets priority for grouping of replicates; if sdrfOrder=TRUE the output will be put in order of sdrf
suplAnnotFile: (logical or character) optional reading of supplemental files produced by Compomics; if gr is provided, it gets priority for grouping of replicates; if TRUE, defaults to files 'summary.txt' (needed to match information of sdrf) and 'parameters.txt', which can be found in the same folder as the main quantitation results; if character, the respective file-names (relative or absolute path), where the 1st is expected to correspond to 'summary.txt' (tabulated text, the samples as given to Compomics) and the 2nd to 'parameters.txt' (tabulated text, all parameters given to Compomics)
groupPref: (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to readSampleMetaData. May contain lowNumberOfGroups=FALSE for automatically choosing a rather elevated number of groups if possible (defaults to low number of groups, ie higher number of samples per group). May contain chUnit (logical or character) to be passed to readSampleMetaData() for (optional) adjusting of unit-prefixes in meta-data group labels, in case multiple different unit-prefixes are used (eg '100pMol' and '1nMol').
titGraph: (character) custom title to plot of distribution of quantitation values
wex: (numeric) relative expansion factor of the violin in plot
plotGraph: (logical) optional plot vioplot of initial and normalized data (using normalizeMeth); alternatively the argument may contain numeric details that will be passed to layout when plotting
silent: (logical) suppress messages
debug: (logical) additional messages for debugging
callFrom: (character) allow easier tracking of messages produced
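
The effect of a single search pattern in quantCol and of read0asNA can be sketched in base R. This is only an illustration under assumptions: the column names and values below are invented and the snippet does not call readAlphaPeptFile itself.

```r
# Hypothetical column names resembling a protein-level results table
colNa <- c("protein", "A_1", "A_1_LFQ", "A_2", "A_2_LFQ", "B_1", "B_1_LFQ")

# A single pattern is searched among the column names using grep;
# "_LFQ$" keeps only names ending in "_LFQ"
quantCols <- grep("_LFQ$", colNa)
colNa[quantCols]

# Toy quantitation values containing a zero
quant <- c(1024, 0, 256)

# Without replacement, log2(0) gives -Inf
log2(quant)

# Replacing 0 by NA first keeps -Inf out of the log2 results
quant[which(quant == 0)] <- NA
log2(quant)
```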

Details

Meta-data describing the samples and experimental setup may be available from an sdrf-file (from the directory above the analysis/quantification results). If available, the meta-data will be examined for determining groups of replicates and the results thereof can be found in $sampleSetup$levels. Alternatively, a data.frame formatted like sdrf-files (ie for each sample a separate line, see also function readSdrf) may be given, too.
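
As a rough illustration of a data.frame with one line per sample and of deriving groups of replicates from it, consider the base-R sketch below. The column names source.name and condition are hypothetical and do not claim to reproduce the columns of a real sdrf-file, nor the logic of readSampleMetaData.

```r
# Hypothetical sdrf-like table: one line per sample (column names are illustrative only)
sdrfLike <- data.frame(
  source.name = paste0("sample", 1:6),
  condition   = rep(c("ctrl", "treatA", "treatB"), each = 2),
  stringsAsFactors = FALSE
)

# Groups of replicates: samples sharing the same condition
grp <- factor(sdrfLike$condition, levels = unique(sdrfLike$condition))
table(grp)

# A character or factor vector of this kind is what the argument 'gr' expects
as.character(grp)
```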

This import-function has been developed using AlphaPept version x.x. The final output is a list containing these elements: $raw, $quant, $annot, $counts, $sampleSetup, $quantNotes, $notes, or (if separateAnnot=FALSE) a data.frame with annotation- and main quantification-content. If sdrf information has been found, an additional list-element setup will be added containing the entire meta-data as setup$meta and the suggested organization as setup$lev.

Examples

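library(wrProteo)
# Locate the example data shipped with the package
path1 <- system.file("extdata", package="wrProteo")
# Here we'll load a short/trimmed example file
fiNaAP <- "tinyAlpaPeptide.csv.gz"
# Full argument names are used instead of relying on partial matching
dataAP <- readAlphaPeptFile(fileName=fiNaAP, path=path1, titGraph="tiny AlphaPept")
# Overview of the final quantitation values per sample
summary(dataAP$quant)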