A convenience function that combines multiple steps to process a long-format table using the MaxLFQ algorithm.
process_long_format(input_data,
output_filename,
sample_id = "File.Name",
primary_id = "Protein.Group",
secondary_id = "Precursor.Id",
intensity_col = "Fragment.Quant.Corrected",
annotation_col = NULL,
filter_string_equal = NULL,
filter_string_not_equal = NULL,
filter_double_less = c("Q.Value" = "0.01", "PG.Q.Value" = "0.01"),
filter_double_greater = NULL,
intensity_col_sep = ";",
intensity_col_id = NULL,
na_string = "0",
normalization = "median",
log2_intensity_cutoff = 0,
pdf_out = "qc-plots.pdf",
pdf_width = 12,
pdf_height = 8,
show_boxplot = TRUE,
peptide_extractor = NULL,
rfasta = NULL)
Either an input data frame is processed with fast_MaxLFQ or an input file is processed with fast_read, fast_preprocess, and fast_MaxLFQ.
Subsequently, the result is written to output_filename. The quantification values are in log2 space.
A NULL value is returned. If peptide_extractor is not NULL, fragment statistics
for each protein will be calculated based on the result of the extractor function. Counting the number of peptides contributing to a protein is possible using an appropriate extractor function. An example value for peptide_extractor is function(x) gsub("[0-9].*$", "", x), which removes the charge state and fragment descriptors in an ion descriptor to obtain unique peptide sequences. One can examine the ion component returned by the fast_read function to derive a regular expression to be used in the gsub function above.
Another example is function(x) gsub("\\(UniMod:\\d+\\)|\\d|_", "", x), which removes modification annotations, digits, and underscores from the peptide identifiers.
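The two extractor functions described above can be tried on a couple of strings before passing them to process_long_format. This is a minimal sketch in base R: the ion descriptors below are hypothetical and only stand in for whatever the ion component of fast_read contains for a given input table. The second pattern is written with doubled backslashes, as R string literals require.

```r
# Hypothetical ion descriptors; real values depend on the input table.
ions <- c("PEPTIDEK2_y5", "AC(UniMod:4)DEK2")

# Extractor from the text: drop everything from the first digit onward.
strip_from_first_digit <- function(x) gsub("[0-9].*$", "", x)

# Extractor from the text: drop UniMod annotations, digits, and underscores.
strip_modifications <- function(x) gsub("\\(UniMod:\\d+\\)|\\d|_", "", x)

res_first_digit <- strip_from_first_digit(ions)
res_modifications <- strip_modifications(ions)

print(res_first_digit)   # "PEPTIDEK"  "AC(UniMod:"
print(res_modifications) # "PEPTIDEKy" "ACDEK"
```

Neither pattern suits both strings: the first one cuts a modified peptide at the digit inside the UniMod annotation, and the second one leaves the fragment letter behind. This is why it helps to inspect the actual ion strings before settling on a regular expression.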
input_data: A data frame or a filename. See filename in fast_read.
output_filename: Output filename.
sample_id: See sample_id in fast_read.
primary_id: See primary_id in fast_read.
secondary_id: See secondary_id in fast_read.
intensity_col: See intensity_col in fast_read.
annotation_col: See annotation_col in fast_read.
filter_string_equal: See filter_string_equal in fast_read.
filter_string_not_equal: See filter_string_not_equal in fast_read.
filter_double_less: See filter_double_less in fast_read.
filter_double_greater: See filter_double_greater in fast_read.
intensity_col_sep: See intensity_col_sep in fast_read.
intensity_col_id: See intensity_col_id in fast_read.
na_string: See na_string in fast_read.
normalization: Normalization type. Possible values are median and none. The default value median is for median normalization in fast_preprocess.
log2_intensity_cutoff: See log2_intensity_cutoff in fast_preprocess.
pdf_out: See pdf_out in fast_preprocess.
pdf_width: See pdf_width in fast_preprocess.
pdf_height: See pdf_height in fast_preprocess.
show_boxplot: See show_boxplot in fast_preprocess.
peptide_extractor: A function to parse peptides.
rfasta: If available, calculates the sequence coverage.
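To illustrate what the median setting of the normalization argument refers to, the following base R sketch shifts each sample in log2 space so that all sample medians coincide. It is an assumption about the general technique, not a copy of what fast_preprocess does internally; the matrix and its values are hypothetical.

```r
# Hypothetical log2 intensities: rows are ions, columns are samples.
log2_int <- matrix(
  c(10, 12, 14,   # sample s1
    11, 13, 15,   # sample s2
     9, NA, 13),  # sample s3, with one missing value
  nrow = 3,
  dimnames = list(paste0("ion", 1:3), c("s1", "s2", "s3"))
)

# Median of each sample, ignoring missing values.
sample_medians <- apply(log2_int, 2, median, na.rm = TRUE)

# Assumed target: the mean of the sample medians.
target <- mean(sample_medians)

# Shift every sample so that its median equals the target.
normalized <- sweep(log2_int, 2, sample_medians - target, "-")

normalized_medians <- apply(normalized, 2, median, na.rm = TRUE)
print(normalized_medians)
```

Because the values are in log2 space, a constant shift per sample corresponds to a constant scaling factor on the original intensity scale.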
Thang V. Pham
Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020 Apr 15;36(8):2611-2613.
fast_read, fast_preprocess, fast_MaxLFQ
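A typical call might look like the sketch below. The file names report.tsv and protein-maxlfq.tsv are hypothetical, and the call is guarded so that it only runs when the iq package is installed and the input file exists. All other arguments keep the defaults listed in the usage above.

```r
# Hypothetical paths; replace with a real long-format report and output name.
input_file <- "report.tsv"
output_file <- "protein-maxlfq.tsv"

# Only run when the package and the input file are both available.
can_run <- requireNamespace("iq", quietly = TRUE) && file.exists(input_file)

if (can_run) {
  iq::process_long_format(
    input_file,
    output_filename = output_file,
    # Extractor from the text: strip charge state and fragment descriptors.
    peptide_extractor = function(x) gsub("[0-9].*$", "", x)
  )
  # The function returns NULL; the log2 quantification is in output_file.
} else {
  message("Skipping: iq is not installed or the input file is missing.")
}
```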