process_wide_format: Merging rows with identical values in a particular column in a table

Description

Collapses rows with identical values in a particular column in a table. When the values in each row are proportional, such as the intensities of multiple fragments of a protein, the MaxLFQ algorithm is recommended.

Usage

process_wide_format(input_filename,
                    output_filename,
                    id_column,
                    quant_columns,
                    data_in_log_space = FALSE,
                    annotation_columns = NULL,
                    method = "maxLFQ")

Value

The result table is written to output_filename. A NULL value is returned.

Arguments

input_filename: Input filename of a tab-separated value text file.
output_filename: Output filename.
id_column: The column where unique values will be kept. Rows with identical values in this column are merged. Rows with empty values here are removed.
quant_columns: Columns containing numerical data to be merged.
data_in_log_space: A logical value. If FALSE, the numerical data will be log2-transformed.
annotation_columns: Columns in the input file apart from id_column and quant_columns that will be kept in the output.
method: Method for merging. Default value is "maxLFQ". Possible values are "maxLFQ", "maxLFQ_R", "median_polish", "top3", "top5", "meanInt", "maxInt", "sum", "least_na", or any function that collapses a numerical matrix to a row vector.
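
The arguments above can be put together in a call such as the following. This is a minimal sketch, not an example from the package: the toy table, its column names (protein, S1, S2) and the temporary file names are hypothetical, and it assumes that id_column accepts a column name and that quant_columns accepts column positions. The call is guarded so the snippet still runs when the iq package is not installed.

```r
# Hypothetical toy input: two fragment rows per protein, two samples.
toy <- data.frame(
  protein = c("P1", "P1", "P2", "P2"),
  S1      = c(100, 200, 50, 400),
  S2      = c(200, 400, 25, 200)
)

# Write the table as a tab-separated text file, the input format named above.
input_file  <- tempfile(fileext = ".tsv")
output_file <- tempfile(fileext = ".tsv")
write.table(toy, input_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Run only when the iq package is available.
if (requireNamespace("iq", quietly = TRUE)) {
  iq::process_wide_format(input_file,
                          output_file,
                          id_column = "protein",     # assumption: a column name is accepted
                          quant_columns = 2:3,       # assumption: column positions are accepted
                          data_in_log_space = FALSE, # raw intensities, so log2 is applied
                          method = "maxLFQ")
  # The function returns NULL; the merged table is read back from the output file.
  result <- read.delim(output_file)
}
```

With data_in_log_space = FALSE the input should hold raw intensities; zeros would become -Inf after the log2 transform, so the toy data avoids them.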

Author

Thang V. Pham

Details

Method "maxLFQ_R" implements the MaxLFQ algorithm in pure R. It is slower than "maxLFQ".

Method "maxInt" selects the row with the maximum intensity (top 1).

Method "sum" sums all intensities.

Method "least_na" selects the row with the fewest missing values.
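
The two row-selection methods can be illustrated in base R on a single group of rows. This is only a sketch of the idea, not the package's implementation: the matrix x and its row and column names are hypothetical, and ranking rows by their summed intensity in the original space is an assumption about how "maximum intensity" is defined.

```r
# Toy log2 intensity matrix: rows are fragments of one protein, columns are samples.
x <- log2(rbind(frag1 = c(S1 = 32, S2 = NA, S3 = 2),
                frag2 = c(S1 = 4,  S2 = 4,  S3 = 16)))

# "least_na"-style choice: keep the row with the fewest missing values.
least_na_row <- x[which.min(rowSums(is.na(x))), ]

# "maxInt"-style choice: keep the row with the largest total intensity
# (assumption: rows are ranked by summed intensity in the original space).
max_int_row <- x[which.max(rowSums(2^x, na.rm = TRUE)), ]
```

Here frag2 has no missing values and is kept by the first rule, while frag1 has the larger total (34 versus 24) and is kept by the second, so the two rules can disagree on the same data.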

The value of method can also be a function, such as function(x) log2(colSums(2^x, na.rm = TRUE)), which sums all intensities in the original space.
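
A custom function of this kind can be tried on a small matrix before it is passed as method. The sketch below assumes the function receives a log2 matrix with one row per merged input row and one column per sample, and returns one value per column. The names sum_original_space, median_log_space and x are hypothetical; the median variant is only an illustrative alternative, not a method listed above.

```r
# The function from the text: sum in the original space, report in log2.
sum_original_space <- function(x) log2(colSums(2^x, na.rm = TRUE))

# A hypothetical alternative with the same contract: per-column median in log2 space.
median_log_space <- function(x) apply(x, 2, median, na.rm = TRUE)

# Toy log2 matrix: two rows to merge, two samples, one missing value.
x <- log2(rbind(c(1, NA),
                c(3, 6)))

summed  <- sum_original_space(x)   # column sums in the original space are 4 and 6
medians <- median_log_space(x)     # one value per column, missing values ignored
```

Either function could then be supplied as method = sum_original_space or method = median_log_space in place of a method name.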

References

Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020 Apr 15;36(8):2611-2613.