readProlineFile: Read xlsx, csv or tsv files exported from Proline and MS-Angel

Description

Quantification results from Proline and MS-Angel exported in xlsx format can be read directly using this function. Files in tsv, csv (European and US format) or tabulated txt format can be read as well. Relevant information is then extracted, and the data can optionally be normalized and displayed as a boxplot or vioplot. The final output is a list containing 6 elements: $raw, $quant, $annot, $counts, $quantNotes and $notes. Alternatively, a data.frame with annotation and quantitation data may be returned if separateAnnot=FALSE. Note: There is no normalization by default since quite frequently data produced by Proline are already sufficiently normalized. The figure produced using the argument plotGraph=TRUE may help in judging whether the data appear sufficiently normalized (distributions should align).
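To illustrate what "distributions should align" means, here is a minimal base-R sketch of log2 conversion followed by median centering. The toy matrix and the variable names (raw, quantLog, quantNorm) are invented for illustration, and the centering shown here is a generic approach, not necessarily what normalizeThis does internally.

```r
# Toy abundance matrix: 5 proteins (rows) x 3 samples (columns); values are invented
raw <- matrix(c(100, 220, 410,
                 50, 105, 190,
                 80, 170, 330,
                 20,  38,  85,
                200, 400, 790),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("prot", 1:5), c("A1", "A2", "A3")))

# log2 conversion (analogous to logConvert=TRUE)
quantLog <- log2(raw)

# Median centering: shift each column so that all columns share the same median
colMed <- apply(quantLog, 2, median, na.rm = TRUE)
quantNorm <- sweep(quantLog, 2, colMed - mean(colMed), FUN = "-")

# Column medians before and after centering
round(colMed, 3)
round(apply(quantNorm, 2, median, na.rm = TRUE), 3)

# Side-by-side boxplots show whether the distributions align
# boxplot(quantLog, main = "before"); boxplot(quantNorm, main = "after")
```

If the boxplots of the imported data already look like the "after" panel, additional normalization is probably not needed.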

Usage

readProlineFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  logConvert = TRUE,
  sampleNames = NULL,
  quantCol = "^abundance_",
  annotCol = c("accession", "description", "is_validated", "protein_set_score",
    "X.peptides", "X.specific_peptides"),
  remStrainNo = TRUE,
  pepCountCol = c("^psm_count_", "^peptides_count_"),
  trimColnames = FALSE,
  refLi = NULL,
  separateAnnot = TRUE,
  plotGraph = TRUE,
  titGraph = NULL,
  wex = 2,
  specPref = c(conta = "_conta\\|", mainSpecies = "OS=Homo sapiens"),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = TRUE,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)

Value

This function returns a list with $raw (initial/raw abundance values), $quant (final normalized quantitations), $annot (annotation columns), $counts (an array with 'PSM' and 'NoOfPeptides'), $quantNotes and $notes; or a data.frame with quantitation and annotation if separateAnnot=FALSE.

Arguments

fileName: (character) name of file to read; .xlsx, .csv, .txt and .tsv files can be read (csv, txt and tsv may be gz-compressed). Reading xlsx requires package 'readxl'.
path: (character) optional path (note: Windows backslashes should be escaped or written as '/')
normalizeMeth: (character) normalization method (for details and options see normalizeThis)
logConvert: (logical) convert numeric data to log2; the result will be placed in $quant
sampleNames: (character) custom column-names for quantification data; this argument has priority over suplAnnotFile
quantCol: (character or integer) columns with main quantitation data: precise colnames to extract, or, if length=1, the content of quantCol will be used as pattern to search among column-names for $quant using grep
annotCol: (character) precise colnames or if length=1 pattern to search among column-names for $annot
remStrainNo: (logical) if TRUE, the organism annotation will be trimmed to uppercaseWord+space+lowercaseWord (eg Homo sapiens)
pepCountCol: (character) pattern to search among column-names for count data of PSM and NoOfPeptides
trimColnames: (logical) optional trimming of column-names of any redundant characters from beginning and end
refLi: (integer) custom choice of which lines of data belong to the main species; if a single character entry is given, it will be used to choose a group of species (eg 'mainSpe')
separateAnnot: (logical) separate annotation from numeric data (quantCol and annotCol must be defined)
plotGraph: (logical or matrix of integer) optionally plot a vioplot of the initial data; if integer, it will be passed to layout when plotting
titGraph: (character) custom title to plot of distribution of quantitation values
wex: (integer) relative expansion factor of the violin-plot (will be passed to vioplotW)
specPref: (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe', and optional following ones for supplemental tags/species - marked as 'species2','species3',...); if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument annotCol)
gr: (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from sdrf and/or suplAnnotFile (if provided)
sdrf: (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeExchange, the second & third elements may give further indications for automatic organization of groups of replicates. Besides, the output from readSdrf or a list from defineSamples may be provided; if gr is provided, gr gets priority for grouping of replicates; if sdrfOrder=TRUE the output will be put in order of sdrf
suplAnnotFile: (logical or character) optional reading of supplemental files produced by quantification software; however, if gr is provided, gr gets priority for grouping of replicates; if TRUE defaults to file '*InputFiles.txt' (needed to match information of sdrf) which can be exported next to main quantitation results; if character the respective file-name (relative or absolute path)
groupPref: (list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to readSampleMetaData. May contain lowNumberOfGroups=FALSE for automatically choosing a rather elevated number of groups if possible (defaults to low number of groups, ie higher number of samples per group). May contain chUnit (logical or character) to be passed to readSampleMetaData() for (optional) adjusting of unit-prefixes in meta-data group labels, in case multiple different unit-prefixes are used (eg '100pMol' and '1nMol').
silent: (logical) suppress messages
callFrom: (character) allow easier tracking of messages produced
debug: (logical) display additional messages for debugging
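
The pattern-based arguments (quantCol, pepCountCol, specPref, remStrainNo) can be pictured with a few lines of base R. This is only a sketch of the general idea, not the code used inside readProlineFile. The column names, description strings and the regular expression used for organism trimming are invented for illustration, as is the choice of searching specPref in a description text.

```r
# Hypothetical column names resembling a Proline export
colNa <- c("accession", "description", "is_validated",
           "psm_count_S1", "psm_count_S2",
           "peptides_count_S1", "peptides_count_S2",
           "abundance_S1", "abundance_S2")

# quantCol of length 1 is used as a pattern for grep among the column names
quantCol <- "^abundance_"
quantInd <- grep(quantCol, colNa)
colNa[quantInd]

# pepCountCol: one pattern per type of count data (PSM, number of peptides)
pepCountCol <- c("^psm_count_", "^peptides_count_")
countInd <- lapply(pepCountCol, grep, colNa)

# specPref: tag entries by characteristic text (here searched in invented descriptions)
specPref <- c(conta = "_conta\\|", mainSpecies = "OS=Homo sapiens")
descr <- c("sp|P02768|ALBU_HUMAN Albumin OS=Homo sapiens OX=9606",
           "sp|P00761_conta|TRYP_PIG Trypsin OS=Sus scrofa",
           "sp|P00330|ADH1_YEAST Alcohol dehydrogenase OS=Saccharomyces cerevisiae")
specTag <- rep(NA_character_, length(descr))
specTag[grep(specPref["mainSpecies"], descr)] <- "mainSpe"
specTag[grep(specPref["conta"], descr)] <- "conta"
specTag

# remStrainNo-like trimming to uppercaseWord+space+lowercaseWord (assumed regex)
org <- c("Homo sapiens", "Saccharomyces cerevisiae (strain ATCC 204508 / S288c)")
orgTrim <- sub("^([[:upper:]][[:lower:]]+ [[:lower:]]+).*", "\\1", org)
orgTrim
```

Note that the pipe in "_conta\\|" is escaped so that it is matched literally instead of acting as the regex alternation operator.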

Details

This function has been developed using Proline version 1.6.1 coupled with MS-Angel 1.6.1. The classical way of using this function consists of exporting results produced by Proline and MS-Angel as an xlsx file. Other formats may be read as well. This includes csv (eg the main sheet/table of the exported xlsx file saved as csv). WOMBAT represents an effort to automate quantitative proteomics experiments; using this route, data get exported as txt files, which can be read, too.
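
For readers unfamiliar with the difference between European and US csv conventions, the following base-R sketch writes a tiny gz-compressed file in each format and reads it back. readProlineFile is meant to handle these cases itself; the file contents and object names here are invented and serve only to show what distinguishes the two formats.

```r
# European convention: ';' as field separator and ',' as decimal mark
tmpEU <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmpEU, "w")
writeLines(c("accession;abundance_S1;abundance_S2",
             "P12345;1234,5;2345,6",
             "Q67890;345,6;456,7"), con)
close(con)

# US convention: ',' as field separator and '.' as decimal mark
tmpUS <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmpUS, "w")
writeLines(c("accession,abundance_S1,abundance_S2",
             "P12345,1234.5,2345.6",
             "Q67890,345.6,456.7"), con)
close(con)

# read.csv2() expects the European convention, read.csv() the US one
datEU <- read.csv2(gzfile(tmpEU))
datUS <- read.csv(gzfile(tmpUS))

# Both give the same numeric content
all.equal(datEU, datUS)
```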

Examples

path1 <- system.file("extdata", package="wrProteo")
fiNa <- "exampleProlineABC.csv.gz"
dataABC <- readProlineFile(fileName=fiNa, path=path1)
summary(dataABC$quant)
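
The elements of the returned list can then be inspected one by one. The sketch below assumes that package wrProteo is installed and uses the same example file; it sets plotGraph=FALSE to skip the figure and makes no claim about the exact dimensions or content of the result.

```r
# Only run if wrProteo is available
if (requireNamespace("wrProteo", quietly = TRUE)) {
  path1 <- system.file("extdata", package = "wrProteo")
  fiNa <- "exampleProlineABC.csv.gz"
  dataABC <- wrProteo::readProlineFile(fileName = fiNa, path = path1,
                                       plotGraph = FALSE)

  names(dataABC)        # elements of the output list
  dim(dataABC$quant)    # proteins x samples
  head(dataABC$annot)   # first lines of the annotation
  dim(dataABC$counts)   # count data ('PSM' and 'NoOfPeptides')
}
```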