preprocess: Data preprocessing for protein quantification

Description

Prepares a long-format input table for protein quantification, including removal of low-intensity ions and optional median normalization.

Usage

preprocess(quant_table,
           primary_id = "PG.ProteinGroups",
           secondary_id = c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge"),
           sample_id = "R.Condition",
           intensity_col = "F.PeakArea",
           median_normalization = TRUE,
           log2_intensity_cutoff = 0,
           pdf_out = "qc-plots.pdf",
           pdf_width = 12,
           pdf_height = 8,
           intensity_col_sep = NULL,
           intensity_col_id = NULL,
           na_string = "0",
           show_boxplot = TRUE)

Value

A data frame with the following components:

protein_list: A vector of proteins.
sample_list: A vector of samples.
id: A vector of fragment ions to be used for quantification.
quant: A vector of log2 intensities.

Arguments

quant_table: A long-format table with a primary column of protein identification, secondary columns of fragment ions, a column of sample names, and a column for quantitative intensities.
primary_id: Unique values in this column form the list of proteins to be quantified.
secondary_id: A concatenation of these columns determines the fragment ions used for quantification.
sample_id: Unique values in this column form the list of samples.
intensity_col: The column for intensities.
median_normalization: A logical value. If TRUE (the default), median normalization is performed.
log2_intensity_cutoff: Entries whose log2 intensity is below this value are ignored. Plot a histogram of all log2 intensities to choose this value.
pdf_out: A character string specifying the name of the PDF output. A NULL value will suppress the PDF output.
pdf_width: Width of the pdf output in inches.
pdf_height: Height of the pdf output in inches.
intensity_col_sep: The separator character used when entries in the intensity column contain multiple values.
intensity_col_id: The column holding the identities of the multiple quantitative values.
na_string: The value considered as NA.
show_boxplot: A logical value. If TRUE (the default), boxplots of fragment intensities are created for each sample.
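
The advice for log2_intensity_cutoff can be tried out with a few lines of base R. The sketch below uses simulated intensities rather than real data; the vector name intensity and the value of candidate_cutoff are illustrative assumptions, not part of the package. With a real table, the values of the intensity column (for example F.PeakArea) would take the place of the simulated vector.

```r
set.seed(1)

# Simulated raw intensities standing in for an intensity column;
# a few zeros are included to mimic missing measurements.
intensity <- c(2^rnorm(1000, mean = 12, sd = 2), rep(0, 20))

# Move to log2 space and drop non-finite values (log2(0) is -Inf).
log2_intensity <- log2(intensity)
log2_intensity <- log2_intensity[is.finite(log2_intensity)]

# Inspect the distribution to pick a cutoff by eye.
hist(log2_intensity, breaks = 50,
     main = "Distribution of log2 intensities",
     xlab = "log2 intensity")

# Hypothetical cutoff chosen for illustration only.
candidate_cutoff <- 6
abline(v = candidate_cutoff, lty = 2)

# Fraction of entries that such a cutoff would discard.
mean(log2_intensity < candidate_cutoff)
```

A cutoff placed in the sparse left tail of the histogram discards few entries; the final line reports how many would be lost for a given choice.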

Author

Thang V. Pham

Details

When entries in the intensity column contain multiple values, this function replicates the entries in the other columns once per value. The secondary_id is then appended with the corresponding entries in intensity_col_id when that column is provided; otherwise, the integer values 1, 2, 3, etc. are used.
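
The expansion described above can be illustrated in base R. This is a minimal sketch of the idea, not the package's implementation: the toy table, its column names, the ";" separator and the "_" used to join the index to the ion identifier are all assumptions made for the example.

```r
# Toy table: the first row holds three ';'-separated intensities (made-up values).
tab <- data.frame(
  protein = c("P1", "P2"),
  ion = c("ionA", "ionB"),
  intensity = c("100;200;400", "50"),
  stringsAsFactors = FALSE
)

# Split each intensity entry on the separator.
parts <- strsplit(tab$intensity, ";", fixed = TRUE)
n <- lengths(parts)

# Replicate the other columns once per value and append an integer
# index (1, 2, 3, ...) to the ion identifier.
expanded <- data.frame(
  protein = rep(tab$protein, n),
  ion = paste(rep(tab$ion, n), sequence(n), sep = "_"),
  intensity = as.numeric(unlist(parts)),
  stringsAsFactors = FALSE
)

print(expanded)
```

Each value ends up in its own row with a distinct identifier, so downstream steps can treat the values as separate fragment ions.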

References

Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020 Apr 15;36(8):2611-2613.

Examples

# \donttest{
data("spikeins")
head(spikeins)
# This example set of spike-in proteins has already been median-normalized,
# so median normalization is turned off here.
norm_data <- iq::preprocess(spikeins, median_normalization = FALSE, pdf_out = NULL)
# }
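
For readers unfamiliar with the term, the following base R sketch shows one common form of median normalization on log2 intensities: each sample is shifted so that all sample medians coincide. It runs on made-up numbers, and the column names sample, quant and quant_norm are illustrative. Whether preprocess uses exactly this scheme is not stated on this page, so treat it as an assumption.

```r
# Toy long-format data on the log2 scale (made-up values).
d <- data.frame(
  sample = rep(c("S1", "S2"), each = 3),
  quant = c(10, 12, 14, 11, 13, 15),
  stringsAsFactors = FALSE
)

# Median log2 intensity per sample.
sample_median <- tapply(d$quant, d$sample, median)

# Shift that moves every sample median to the mean of the medians.
shift <- mean(sample_median) - sample_median

# Apply the per-sample shift to each row.
d$quant_norm <- d$quant + unname(shift[as.character(d$sample)])

# After normalization, all samples share the same median.
tapply(d$quant_norm, d$sample, median)
```

Because the shift is additive in log2 space, it corresponds to a per-sample scaling factor on the raw intensity scale.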
