process_long_format: Convert a long-format table to a wide-format table using the MaxLFQ algorithm

Description

A convenience function that combines multiple steps to process a long-format table with the MaxLFQ algorithm.
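The core idea behind MaxLFQ can be sketched in base R for a single protein. For every pair of samples, take the median log2 ratio over the ions observed in both. Then find sample-level log2 abundances whose pairwise differences best fit those medians in the least-squares sense. This is a simplified illustration, not the fast_MaxLFQ implementation. Several details are assumptions of this sketch: the function name maxlfq_sketch, the requirement that all samples are linked through shared ions, and the final shift that makes the mean of the estimates equal the mean of the observed values.

```r
# Simplified MaxLFQ sketch for ONE protein (hypothetical; not iq's fast_MaxLFQ).
# x: numeric matrix of log2 intensities, rows = ions, columns = samples, NA = missing.
# Assumes every sample is linked to the others through shared ions.
maxlfq_sketch <- function(x) {
  n <- ncol(x)
  AtA <- matrix(0, n, n)  # normal-equation matrix (a graph Laplacian)
  Atb <- numeric(n)       # normal-equation right-hand side
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d <- x[, j] - x[, i]        # per-ion log2 ratio of sample j over sample i
      d <- d[!is.na(d)]           # keep ions observed in both samples
      if (length(d) == 0) next    # no shared ions: this pair adds no equation
      r <- median(d)
      # The equation y[j] - y[i] = r contributes to A^T A and A^T b
      AtA[i, i] <- AtA[i, i] + 1
      AtA[j, j] <- AtA[j, j] + 1
      AtA[i, j] <- AtA[i, j] - 1
      AtA[j, i] <- AtA[j, i] - 1
      Atb[i] <- Atb[i] - r
      Atb[j] <- Atb[j] + r
    }
  }
  # Ratios fix only differences, so add the constraint sum(y) = 0
  # through a Lagrange multiplier (bordered linear system).
  M <- rbind(cbind(AtA, 1), c(rep(1, n), 0))
  y <- solve(M, c(Atb, 0))[seq_len(n)]
  # Shift to the level of the observed data (an assumption of this sketch)
  y + mean(x, na.rm = TRUE)
}

a <- c(10, 12, 11)         # ion-specific offsets
b <- c(0, 1, -0.5, 2)      # true sample effects
x <- outer(a, b, "+")      # perfectly additive log2 intensities
x[1, 2] <- NA              # one missing value
y <- maxlfq_sketch(x)
diff(y)                    # matches diff(b) up to floating-point error
```

Because each pairwise ratio is a median over shared ions, a single missing value does not prevent the sample effects from being recovered in this additive toy example.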

Usage

process_long_format(input_data,
                    output_filename,
                    sample_id = "File.Name",
                    primary_id = "Protein.Group",
                    secondary_id = "Precursor.Id",
                    intensity_col = "Fragment.Quant.Corrected",
                    annotation_col = NULL,
                    filter_string_equal = NULL,
                    filter_string_not_equal = NULL,
                    filter_double_less = c("Q.Value" = "0.01", "PG.Q.Value" = "0.01"),
                    filter_double_greater = NULL,
                    intensity_col_sep = ";",
                    intensity_col_id = NULL,
                    na_string = "0",
                    normalization = "median",
                    log2_intensity_cutoff = 0,
                    pdf_out = "qc-plots.pdf",
                    pdf_width = 12,
                    pdf_height = 8,
                    show_boxplot = TRUE,
                    peptide_extractor = NULL,
                    rfasta = NULL)

Value

If input_data is a data frame, it is processed with fast_MaxLFQ. If it is a filename, the file is processed with fast_read, fast_preprocess, and fast_MaxLFQ. The result is then written to output_filename, with quantification values in log2 space. The function itself returns NULL.

If peptide_extractor is not NULL, fragment statistics for each protein are calculated from the output of the extractor function. With an appropriate extractor function, this can be used to count the number of peptides contributing to each protein. One example value for peptide_extractor is function(x) gsub("[0-9].*$", "", x), which removes the charge state and fragment descriptors from an ion descriptor to obtain unique peptide sequences. To derive a suitable regular expression for gsub, examine the ion component returned by fast_read. Another example is function(x) gsub("\\(UniMod:\\d+\\)|\\d|_", "", x), which removes modification annotations from the peptides.
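Before running the full pipeline, the two example extractors can be tried on a few ion descriptors. The descriptors below are made-up examples in a DIA-NN-like format; this format is an assumption, since real descriptors depend on the software and on secondary_id. The modification-stripping regular expression is written with escaped parentheses and R string escapes. The per-protein peptide count is computed here with plain base R as an illustration, not with process_long_format itself.

```r
# Made-up ion descriptors (DIA-NN-like format; illustrative only)
ions <- data.frame(
  protein = c("P1", "P1", "P1", "P2"),
  ion     = c("AAC(UniMod:4)DEFK2", "AACDEFK3", "GHIK2", "LMNPQR2")
)

# First example extractor: drop everything from the first digit onward
strip_charge <- function(x) gsub("[0-9].*$", "", x)
# Second example extractor: drop UniMod annotations, digits and underscores
strip_mods <- function(x) gsub("\\(UniMod:\\d+\\)|\\d|_", "", x)

strip_charge(c("PEPTIDEK2", "PEPTIDEK3"))  # "PEPTIDEK" "PEPTIDEK"
strip_charge("AAC(UniMod:4)DEFK2")         # "AAC(UniMod:" (wrong extractor for this format)
strip_mods(ions$ion)                       # "AACDEFK" "AACDEFK" "GHIK" "LMNPQR"

# Number of distinct peptides per protein after stripping modifications
n_pep <- tapply(strip_mods(ions$ion), ions$protein,
                function(p) length(unique(p)))
n_pep
```

The second call shows why the extractor must match the descriptor format: cutting at the first digit truncates a modified sequence inside its UniMod annotation.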

Arguments

input_data: A data frame or a filename. See filename in fast_read.
output_filename: Output filename.
sample_id: See sample_id in fast_read.
primary_id: See primary_id in fast_read.
secondary_id: See secondary_id in fast_read.
intensity_col: See intensity_col in fast_read.
annotation_col: See annotation_col in fast_read.
filter_string_equal: See filter_string_equal in fast_read.
filter_string_not_equal: See filter_string_not_equal in fast_read.
filter_double_less: See filter_double_less in fast_read.
filter_double_greater: See filter_double_greater in fast_read.
intensity_col_sep: See intensity_col_sep in fast_read.
intensity_col_id: See intensity_col_id in fast_read.
na_string: See na_string in fast_read.
normalization: Normalization method. Possible values are "median" and "none". The default, "median", applies median normalization in fast_preprocess.
log2_intensity_cutoff: See log2_intensity_cutoff in fast_preprocess.
pdf_out: See pdf_out in fast_preprocess.
pdf_width: See pdf_width in fast_preprocess.
pdf_height: See pdf_height in fast_preprocess.
show_boxplot: See show_boxplot in fast_preprocess.
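peptide_extractor: A function that extracts peptide sequences from ion descriptors, used to calculate fragment statistics for each protein (see Value). If NULL, these statistics are not calculated.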
rfasta: Optional protein sequence data. If provided, sequence coverage is calculated.
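The normalization and log2_intensity_cutoff arguments are passed on to fast_preprocess. The base-R sketch below illustrates one common form of this preprocessing. Intensities are log2-transformed, and values that are missing, zero, or not above the cutoff are dropped. Each sample is then shifted so that every sample median equals the mean of the original medians. Whether fast_preprocess follows exactly this rule, including whether the cutoff comparison is strict, is an assumption here. The function preprocess_sketch and the column names sample, ion, and intensity are hypothetical.

```r
# Hypothetical sketch of log2 transform + cutoff + median normalization.
# Not the fast_preprocess implementation; column names are assumptions.
preprocess_sketch <- function(df, log2_intensity_cutoff = 0,
                              normalization = "median") {
  df$quant <- log2(df$intensity)
  # Drop missing/zero intensities and values at or below the cutoff
  df <- df[is.finite(df$quant) & df$quant > log2_intensity_cutoff, ]
  if (normalization == "median") {
    s <- as.character(df$sample)
    m <- tapply(df$quant, s, median)  # per-sample median of log2 values
    shift <- mean(m) - m              # move every median to the mean median
    df$quant <- df$quant + as.vector(shift[s])
  }
  df
}

df <- data.frame(
  sample    = rep(c("A", "B"), each = 4),
  ion       = rep(c("i1", "i2", "i3", "i4"), 2),
  intensity = c(2, 4, 8, 0, 4, 16, 64, 1)
)
out <- preprocess_sketch(df)
tapply(out$quant, out$sample, median)  # both sample medians become 3
```

With a cutoff of 0, the intensity 0 (log2 gives -Inf) and the intensity 1 (log2 gives 0) are both removed before the medians are computed.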

Author

Thang V. Pham

References