readDiaNNFile: Read Tabulated Files Exported by DIA-NN At Protein Level

Description

This function imports protein identification and quantification results from DIA-NN. For this function to import them, the data must be exported from DIA-NN as tabulated text (tsv) at the protein-group (pg) level. Quantification data and other relevant information are parsed and extracted, as with the other import functions of this package. The final output is a list whose main elements are $annot, $raw and $quant. If the argument separateAnnot=FALSE, the output is instead a data.frame with the quantification data and part of the annotation.

Usage

readDiaNNFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  read0asNA = TRUE,
  quantCol = "\\.raw$",
  annotCol = NULL,
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, chUnit = TRUE),
  plotGraph = TRUE,
  titGraph = "DiaNN",
  wex = 1.6,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Value

This function returns a list with $raw (initial/raw abundance values), $quant (final normalized quantitations), $annot (annotation), $counts (an array with the number of peptides), $quantNotes and $notes. If separateAnnot=FALSE, the function returns a data.frame with annotation and quantitation only.
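The list elements described above are typically used together, for example by subsetting $quant with information from $annot. The sketch below uses a small hypothetical stand-in object with toy values. It is not real readDiaNNFile() output. It assumes that the rows of $annot, $raw and $quant refer to the same protein groups in the same order, and that $annot has a 'SpecType' column with tags such as 'mainSpe' and 'conta' (see the arguments refLi and specPref).

```r
## Hypothetical stand-in for the list returned by readDiaNNFile()
## (toy values only; real output comes from reading a DIA-NN pg file)
set.seed(1)
prot <- c("P00001", "P00002", "P00003")
smp <- c("S1", "S2", "S3", "S4")
raw <- matrix(2^runif(12, 10, 20), nrow = 3, dimnames = list(prot, smp))
res <- list(
  annot = data.frame(Accession = prot,
                     SpecType = c("mainSpe", "mainSpe", "conta"),
                     row.names = prot),
  raw = raw,
  quant = log2(raw)  # placeholder: the real $quant is log2 and normalized
)
## assumption: rows of $annot and $quant refer to the same protein groups
stopifnot(identical(rownames(res$annot), rownames(res$quant)))
## keep only proteins tagged as main species
mainQ <- res$quant[res$annot$SpecType == "mainSpe", , drop = FALSE]
dim(mainQ)  # 2 4
```

Using drop = FALSE keeps the result a matrix even when only one protein remains after filtering.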

Arguments

fileName: (character) name of file to be read
path: (character) path of file to be read
normalizeMeth: (character) normalization method, defaults to "median" (for more details see normalizeThis)
sampleNames: (character) custom column-names for quantification data; this argument has priority over suplAnnotFile
read0asNA: (logical) decide whether initial quantifications at 0 should be transformed to NA (thus avoiding -Inf in log2 results)
quantCol: (character or integer) exact column names, or, if of length 1, a pattern that grep uses to search the column names for the $quant data
annotCol: (character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") )
refLi: (character or integer) custom specification of which lines of data belong to the main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given
separateAnnot: (logical) if TRUE, output will be organized as a list with $annot, $raw for initial/raw abundance values and $quant with final log2 (normalized) quantitations
FDRCol: not used (this argument was kept to maintain the same syntax as the other import functions of this package)
groupPref: (list) additional parameters for interpreting meta-data to identify the structure of groups (replicates), will be passed to readSampleMetaData. May contain lowNumberOfGroups=FALSE for automatically choosing a rather elevated number of groups if possible (defaults to a low number of groups, ie a higher number of samples per group). May contain chUnit (logical or character) to be passed to readSampleMetaData() for (optional) adjusting of unit-prefixes in meta-data group labels, in case multiple different unit-prefixes are used (eg '100pMol' and '1nMol').
plotGraph: (logical or integer) optional plot of type vioplot of initial and normalized data (using normalizeMeth); if integer, it will be passed to layout when plotting
titGraph: (character) custom title for the plot of the distribution of quantitation values
wex: (numeric) relative expansion factor of the violin-plot (will be passed to vioplotW)
specPref: (character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants, marked as 'conta'; 2nd for main species, marked as 'mainSpe'; and optional following ones for supplemental tags/species, marked as 'species2','species3',...); if list and a list-element has multiple values, they will be used for exact matching of accessions (ie 2nd of argument annotCol)
gr: (character or factor) custom defined pattern of replicate association, will override final grouping of replicates from sdrf and/or suplAnnotFile (if provided)
sdrf: (character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange, and the second element may give further indications for automatic organization of groups of replicates. Besides, the output from readSdrf or a list from defineSamples may be provided; if gr is provided, gr gets priority for grouping of replicates
suplAnnotFile: (logical or character) optional reading of supplemental files; if character, the respective file name (relative or absolute path); however, if gr is provided, gr gets priority for grouping of replicates
silent: (logical) suppress messages
debug: (logical) additional messages for debugging
callFrom: (character) allow easier tracking of messages produced
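The arguments quantCol, read0asNA and normalizeMeth act together on the quantification columns. The base-R sketch below shows the idea on a hypothetical toy table whose column names mimic DIA-NN raw-file paths. It is a conceptual illustration, not the package's internal code. In particular, the median normalization shown here (shifting each log2 column so its median equals the overall median) is an assumption, and normalizeThis may implement "median" differently.

```r
## Toy table with DIA-NN-like column names (hypothetical file paths)
tab <- data.frame(
  Protein.Group = c("P1", "P2", "P3", "P4"),
  `C:/data/sampleA_1.raw` = c(1000, 0, 4000, 250),
  `C:/data/sampleB_1.raw` = c(2000, 500, 0, 800),
  check.names = FALSE
)
## quantCol of length 1 is used as a regular expression on the column names
quantCol <- "\\.raw$"
qCols <- grep(quantCol, colnames(tab))
raw <- as.matrix(tab[, qCols])
rownames(raw) <- tab$Protein.Group
## read0asNA = TRUE: zeros become NA so that log2() does not yield -Inf
raw[raw == 0] <- NA
lq <- log2(raw)
## one common form of median normalization (assumption, see text above):
## shift each column so that its median matches the overall median
colMed <- apply(lq, 2, median, na.rm = TRUE)
quant <- sweep(lq, 2, colMed - median(lq, na.rm = TRUE))
round(apply(quant, 2, median, na.rm = TRUE), 3)  # both columns share the overall median
```

Converting zeros to NA before the log2 transform keeps missing values explicit instead of producing -Inf, which would otherwise distort medians and plots.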

Details

This function has been developed using DIA-NN version 1.8.x. Reading gene-group (gg) files is in principle possible, but the resulting files typically lack protein identifiers, which may be less convenient in later steps of analysis. Reading protein-group (pg) files is therefore recommended.

Using the argument suplAnnotFile, it is possible to specify a file (or to search for a default file) from which file names are extracted as sample names, along with other experiment-related information.

Examples

library(wrProteo)
diaNNFi1 <- "tinyDiaNN1.tsv.gz"
## This file contains far fewer identifications than one would usually obtain
path1 <- system.file("extdata", package="wrProteo")
## define the main species and allow tagging of some contaminants
specPref1 <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="HUMAN")
## use full argument names (fileName, path, titGraph) instead of partial matching
dataNN <- readDiaNNFile(fileName=diaNNFi1, path=path1, specPref=specPref1,
  titGraph="Tiny DIA-NN Data")
summary(dataNN$quant)
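The values of specPref1 in the example are regular expressions. The base-R sketch below shows how such patterns could tag protein entries as 'conta' or 'mainSpe'. The accessions and the EntryName column are hypothetical. Several details are assumptions and not taken from the package: which annotation columns readDiaNNFile actually searches, and that contaminant tags take precedence over main-species tags.

```r
## Hypothetical annotation (toy accessions and entry names)
annot <- data.frame(
  Accession = c("CON_P02768", "P12345", "P00698", "Q99999"),
  EntryName = c("ALBU_BOVIN", "PROT1_HUMAN", "LYSC_CHICK", "PROT2_YEAST")
)
specPref1 <- c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "HUMAN")
## search accession and entry name together (assumption)
txt <- paste(annot$Accession, annot$EntryName)
specType <- rep(NA_character_, nrow(annot))
specType[grepl(specPref1[["mainSpecies"]], txt)] <- "mainSpe"
## contaminant tags assigned last so that they take precedence (assumption)
specType[grepl(specPref1[["conta"]], txt)] <- "conta"
specType  # expected: "conta" "mainSpe" "conta" NA
```

Because grepl() is case-sensitive by default, a pattern such as "HUMAN" will not match lower-case text like "Homo sapiens". This is why the default specPref uses "OS=Homo sapiens" for FASTA-style descriptions.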