Protein quantification results from Thermo ProteomeDiscoverer
which were exported as tabulated text can be imported and relevant information extracted.
The final output is a list containing 3 elements: $annot, $raw and optional $quant; if separateAnnot=FALSE, a data.frame with the entire content of the file is returned instead.
readProtDiscovFile(
fileName,
path = NULL,
normalizeMeth = "median",
sampleNames = NULL,
infoFile = TRUE,
read0asNA = TRUE,
quantCol = "^Abundances*",
annotCol = NULL,
contamCol = "Contaminant",
refLi = NULL,
separateAnnot = TRUE,
FDRCol = list(c("^Protein.FDR.Confidence", "High"), c("^Found.in.Sample.", "High")),
plotGraph = TRUE,
tit = "Proteome Discoverer",
graphTit = NULL,
wex = 1.6,
specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
(character) name of file to be read
(character) path of file to be read
(character) normalization method (will be sent to normalizeThis)
(character) new column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); this argument has priority over infoFile
(character or logical) filename containing additional information about MS-samples (produced by ProteomeDiscoverer, default '*.InputFiles.txt'); if TRUE the first file in path containing the default name will be used. If no specific sampleNames are given, the filenames will be trimmed to remove redundant text and used as sampleNames. In addition, the ProteomeDiscoverer version number and full raw-file path will be extracted for $notes in the final output.
(logical) decide if initial quantifications at 0 should be transformed to NA
(character or integer) exact column names, or if length=1 the content of quantCol will be used as pattern to search among column names for $quant using grep
(character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") )
(character or integer, length=1) which column should be used for contaminants marked by ProteomeDiscoverer. If a column named contamCol is found, the data will later be filtered to remove all contaminants; set to NULL for keeping all contaminants
(character or integer) custom specification of which line(s) of data represent the main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given
(logical) if TRUE output will be organized as list with $annot, $abund for initial/raw abundance values and $quant with final normalized quantitations
(list) optional indication to search for protein FDR information
(logical or integer) optional plot of type vioplot of initial and normalized data (using normalizeMeth); if integer, it will be passed to layout when plotting
(character) custom title to plot
(character) deprecated custom title to plot, please use 'tit'
(integer) relative expansion factor of the violin-plot (will be passed to vioplotW)
(character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe', and optional following ones for supplemental tags/species - marked as 'species2','species3',...); if list and a list-element has multiple values, they will be used for exact matching of accessions (ie 2nd of argument annotCol)
(logical) suppress messages
(logical) additional messages for debugging
(character) allow easier tracking of messages produced
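The interplay of quantCol and read0asNA described above can be sketched in base R. This is only an illustration on a made-up toy table (the column names and values are assumptions, not taken from a real export), and it is not the package's internal code:

```r
# Toy table mimicking a much simplified ProteomeDiscoverer export (all values made up)
tab <- data.frame(
  Accession = c("P00001", "P00002", "P00003"),
  Contaminant = c("False", "True", "False"),
  Abundances.F1.Sample = c(10, 0, 30),
  Abundances.F2.Sample = c(12, 5, 0),
  stringsAsFactors = FALSE
)

# A quantCol of length 1 is used as a pattern searched among the column names
quantCol <- "^Abundances*"
quantInd <- grep(quantCol, colnames(tab))

# Collect the matching columns as a numeric matrix, one row per protein
quant <- as.matrix(tab[, quantInd])
rownames(quant) <- tab$Accession

# read0asNA = TRUE: initial quantifications at 0 are treated as missing
quant[which(quant == 0)] <- NA
```

With these toy data the pattern selects columns 3 and 4, and the two zero entries end up as NA.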
list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes and $notes; or if separateAnnot=FALSE the function returns a data.frame with annotation and quantitation only
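The relation between $raw and $quant can be pictured with a small median-normalization sketch. It assumes one common variant (scaling every sample so that its median equals the mean of all sample medians) on made-up linear-scale values; the method actually applied is whatever normalizeThis does for normalizeMeth = "median", which may differ in detail (for example in the scale it works on):

```r
# Made-up raw abundances (rows = proteins, columns = samples)
raw <- matrix(c(10, 20, 30,
                20, 40, 60),
              ncol = 2, dimnames = list(c("P1", "P2", "P3"), c("S1", "S2")))

# Column medians, ignoring missing values
colMed <- apply(raw, 2, median, na.rm = TRUE)

# Assumption: every sample is scaled so that its median equals the mean of all sample medians
target <- mean(colMed)
quant <- sweep(raw, 2, colMed / target, FUN = "/")
```

After this step both toy samples share the same median, which is the property a violin plot of initial versus normalized data would make visible.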
This function has been developed using Thermo ProteomeDiscoverer versions 2.2 to 2.5.
The format of the resulting files at export also depends on which columns are chosen as visible inside ProteomeDiscoverer and are therefore exported.
Using the argument infoFile it is possible to specify a specific file (or search for the default file) to read for extracting file-names as sample-names and other experiment related information.
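How file-names might be trimmed into sample-names can be sketched as follows. The paths are hypothetical and the rule used here (dropping directory, extension and the prefix shared by all names) is an assumption for illustration; the function's own trimming may behave differently:

```r
# Hypothetical raw-file paths as they might be listed in an '*.InputFiles.txt' file
files <- c("D:/data/exp1_sampleA_rep1.raw",
           "D:/data/exp1_sampleA_rep2.raw",
           "D:/data/exp1_sampleB_rep1.raw")

# Keep the file name only and drop the extension
nam <- sub("\\.[^.]*$", "", basename(files))

# Determine the length of the prefix shared by all names
chars <- strsplit(nam, "")
minLen <- min(lengths(chars))
same <- sapply(seq_len(minLen), function(i) length(unique(sapply(chars, `[`, i))) == 1)
nCommon <- if (all(same)) minLen else which(!same)[1] - 1

# Remove the shared prefix to obtain short sample names
sampleNames <- substring(nam, nCommon + 1)
```

For these three paths the shared prefix is "exp1_sample", leaving "A_rep1", "A_rep2" and "B_rep1" as sample names.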
If a column named contamCol is found, the data will later be filtered to remove all contaminants; set to NULL for keeping all contaminants.
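A minimal sketch of the species tagging (specPref) and the contaminant filtering (contamCol), using a made-up annotation table. Which fields the function really searches for each pattern is an assumption here (accessions for contaminants, descriptions for the main species):

```r
# Made-up annotation table
annot <- data.frame(
  Accession = c("P00001", "CON_P00002", "P00003"),
  Description = c("Protein A OS=Homo sapiens", "Keratin OS=Bos taurus", "Protein C OS=Mus musculus"),
  Contaminant = c("False", "True", "False"),
  stringsAsFactors = FALSE
)
specPref <- c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens")

# Tag each protein; entries matching no pattern stay NA
specType <- rep(NA_character_, nrow(annot))
specType[grepl(specPref["mainSpecies"], annot$Description)] <- "mainSpe"
specType[grepl(specPref["conta"], annot$Accession)] <- "conta"
annot$SpecType <- specType

# contamCol: if the column exists, drop rows flagged as contaminants; NULL keeps everything
contamCol <- "Contaminant"
if (!is.null(contamCol) && contamCol %in% colnames(annot)) {
  annot <- annot[tolower(annot[[contamCol]]) != "true", , drop = FALSE]
}
```

Setting contamCol to NULL in this sketch skips the filter, so the row tagged 'conta' would remain in the table.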
This function replaces the deprecated function readPDExport.
read.table, normalizeThis, readMaxQuantFile, readProlineFile
# NOT RUN {
path1 <- system.file("extdata", package="wrProteo")
fiNa <- "tinyPD_allProteins.txt.gz"
dataPD <- readProtDiscovFile(fileName=fiNa, path=path1)
summary(dataPD$quant)
# }