
iq (version 1.10.1)

fast_read: Reading data from an input file

Description

Efficiently reads a tab-separated text file for processing with iq.

Usage

fast_read(filename,
          sample_id = "R.Condition",
          primary_id = "PG.ProteinGroups",
          secondary_id = c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge"),
          intensity_col = "F.PeakArea",
          annotation_col = c("PG.Genes", "PG.ProteinNames"),
          filter_string_equal = c("F.ExcludedFromQuantification" = "False"),
          filter_string_not_equal = NULL,
          filter_double_less = c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01"),
          filter_double_greater = NULL,
          intensity_col_sep = NULL,
          intensity_col_id = NULL,
          na_string = "0")

Value

A list with the following components:

protein

A table with the protein identifiers in the first column, followed by the annotation columns.

sample

A vector of samples.

ion

A vector of fragment ions to be used for quantification.

quant_table

A list of four components: protein_list (indices pointing to protein), sample_list (indices pointing to sample), id (indices pointing to ion), and quant (intensities).
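Because quant_table stores integer indices rather than labels, it can be resolved back into a readable long table. The sketch below uses small hand-made stand-ins for the returned components (all values are hypothetical) and assumes the indices are 1-based positions into protein, sample, and ion.

```r
# Hypothetical stand-ins for the components returned by fast_read.
protein <- data.frame(PG.ProteinGroups = c("P1", "P2"),
                      PG.Genes = c("GENE1", "GENE2"),
                      stringsAsFactors = FALSE)
sample <- c("S1", "S2")
ion <- c("PEPTIDEA_2", "PEPTIDEB_3")

# Each position k describes one measurement (assumed 1-based indices).
quant_table <- list(protein_list = c(1L, 1L, 2L),
                    sample_list  = c(1L, 2L, 1L),
                    id           = c(1L, 1L, 2L),
                    quant        = c(1500, 1720, 830))

# Map the indices back to labels to obtain a long-format table.
long <- data.frame(protein   = protein[[1]][quant_table$protein_list],
                   sample    = sample[quant_table$sample_list],
                   ion       = ion[quant_table$id],
                   intensity = quant_table$quant,
                   stringsAsFactors = FALSE)
print(long)
```

Storing indices keeps the table compact when the same protein, sample, and ion labels repeat across many rows.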

Arguments

filename

A long-format tab-separated text file with a primary column of protein identifiers, secondary columns identifying fragment ions, a column of sample names, a column of quantitative intensities, and extra columns for annotation.

primary_id

Unique values in this column form the list of proteins to be quantified.

secondary_id

A concatenation of these columns determines the fragment ions used for quantification.
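As an illustration of how several secondary_id columns can combine into one ion key, the sketch below pastes the default columns row-wise. The rows are hypothetical, and the "_" separator is an assumption for illustration only; it is not necessarily what fast_read uses internally.

```r
# Hypothetical rows containing the default secondary_id columns.
rows <- data.frame(EG.ModifiedSequence = c("PEPTIDEK", "PEPTIDEK"),
                   FG.Charge = c(2, 2),
                   F.FrgIon = c("y4", "y5"),
                   F.Charge = c(1, 1),
                   stringsAsFactors = FALSE)
secondary_id <- c("EG.ModifiedSequence", "FG.Charge", "F.FrgIon", "F.Charge")

# Paste the selected columns row-wise into one key per fragment ion.
# The "_" separator is an assumption made for this sketch.
ion_key <- do.call(paste, c(rows[secondary_id], sep = "_"))
print(ion_key)
```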

sample_id

Unique values in this column form the list of samples.
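The protein and sample lists built from primary_id and sample_id can be sketched with unique() and match(): the unique values form the lists, and each row is encoded as an index into them. The data are hypothetical, and keeping the lists in order of first appearance is an assumption; fast_read may order them differently.

```r
# Hypothetical long-format rows (after filtering).
dat <- data.frame(PG.ProteinGroups = c("P2", "P1", "P2", "P1"),
                  R.Condition = c("S1", "S1", "S2", "S2"),
                  stringsAsFactors = FALSE)

# Unique values define the protein and sample lists (first-appearance order).
proteins <- unique(dat$PG.ProteinGroups)
samples  <- unique(dat$R.Condition)

# Each row is then encoded as an integer index into those lists.
protein_index <- match(dat$PG.ProteinGroups, proteins)
sample_index  <- match(dat$R.Condition, samples)
```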

intensity_col

The column containing the quantitative intensities.

annotation_col

Annotation columns to be reported alongside the proteins in the protein table.

filter_string_equal

A named vector of strings. Only rows whose value in the named column equals the given string are kept. Default: F.ExcludedFromQuantification equal to "False".

filter_string_not_equal

A named vector of strings. Only rows whose value in the named column differs from the given string are kept.

filter_double_less

A named vector of strings giving numeric thresholds. Only rows whose value in the named column is less than the threshold are kept. Default: PG.Qvalue < 0.01 and EG.Qvalue < 0.01.

filter_double_greater

A named vector of strings giving numeric thresholds. Only rows whose value in the named column is greater than the threshold are kept.
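The four filter arguments can be pictured as row-wise conditions combined with a logical AND. The helper apply_filters below is hypothetical (it is not part of iq) and only mirrors the described semantics on an in-memory data frame: string comparisons for the equal/not-equal filters, and numeric comparisons (after converting the string thresholds) for the less/greater filters. Strict inequalities are assumed.

```r
# Hypothetical helper mirroring the filter semantics; not part of iq.
apply_filters <- function(dat, equal = NULL, not_equal = NULL,
                          less = NULL, greater = NULL) {
  keep <- rep(TRUE, nrow(dat))
  for (col in names(equal))     keep <- keep & dat[[col]] == equal[[col]]
  for (col in names(not_equal)) keep <- keep & dat[[col]] != not_equal[[col]]
  # Thresholds are given as strings, so compare them numerically.
  for (col in names(less))
    keep <- keep & as.numeric(dat[[col]]) < as.numeric(less[[col]])
  for (col in names(greater))
    keep <- keep & as.numeric(dat[[col]]) > as.numeric(greater[[col]])
  dat[keep, , drop = FALSE]
}

# Hypothetical long-format rows read as strings.
dat <- data.frame(F.ExcludedFromQuantification = c("False", "True", "False", "False"),
                  PG.Qvalue = c("0.001", "0.002", "0.05", "0.004"),
                  EG.Qvalue = c("0.003", "0.001", "0.001", "0.02"),
                  stringsAsFactors = FALSE)

# The default filters of fast_read, applied with the helper.
filtered <- apply_filters(dat,
                          equal = c("F.ExcludedFromQuantification" = "False"),
                          less  = c("PG.Qvalue" = "0.01", "EG.Qvalue" = "0.01"))
print(filtered)
```

Only the first row survives here: the second is excluded from quantification, and the third and fourth fail one of the q-value thresholds.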

intensity_col_sep

A separator character used when entries in the intensity column contain multiple values.

intensity_col_id

The column holding the identifiers of the multiple quantitative values.

na_string

The string value to be treated as NA (missing). Default: "0".
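With the default na_string = "0", a zero intensity would be treated as a missing value rather than a measured zero. The sketch below shows that effect on a hypothetical vector of intensity strings; applying na_string to the intensity column only is an assumption made for illustration.

```r
na_string <- "0"
# Hypothetical raw intensity entries, read as strings.
raw_intensity <- c("1520.3", "0", "887.1")

# Convert to numbers, then mark entries equal to na_string as missing.
intensity <- as.numeric(raw_intensity)
intensity[raw_intensity == na_string] <- NA
print(intensity)
```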

Author

Thang V. Pham

Details

When entries in the intensity column contain multiple values separated by intensity_col_sep, this function replicates the entries in the other columns once per value. The secondary_id of each replicated row is appended with the corresponding entry in intensity_col_id when that column is provided; otherwise, the integers 1, 2, 3, etc. are used.
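The expansion described above can be sketched in base R. The function expand_multi and the column names ion and frag_id are hypothetical; the sketch assumes the identifier column uses the same separator as the intensity column, and that suffixes are appended with "_". It is an illustration of the behaviour, not iq's implementation.

```r
# Hypothetical sketch of splitting multi-valued intensity entries.
expand_multi <- function(dat, intensity_col, sep, ion_col, id_col = NULL) {
  if (nrow(dat) == 0) return(dat)
  pieces <- lapply(seq_len(nrow(dat)), function(i) {
    values <- strsplit(dat[[intensity_col]][i], sep, fixed = TRUE)[[1]]
    # Suffixes come from id_col if given (assumed same separator), else 1, 2, 3, ...
    ids <- if (is.null(id_col)) {
      as.character(seq_along(values))
    } else {
      strsplit(dat[[id_col]][i], sep, fixed = TRUE)[[1]]
    }
    # Replicate the row once per value, then fill in the split values.
    rows <- dat[rep(i, length(values)), , drop = FALSE]
    rows[[intensity_col]] <- as.numeric(values)
    rows[[ion_col]] <- paste(rows[[ion_col]], ids, sep = "_")
    rows
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Hypothetical row whose intensity cell holds three values.
dat <- data.frame(PG.ProteinGroups = "P1", R.Condition = "S1",
                  ion = "PEPTIDEK_2", intensity = "100;200;300",
                  frag_id = "y3;y4;y5", stringsAsFactors = FALSE)

with_ids    <- expand_multi(dat, "intensity", ";", "ion", id_col = "frag_id")
without_ids <- expand_multi(dat, "intensity", ";", "ion")
print(with_ids)
print(without_ids)
```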

References

Pham TV, Henneman AA, Jimenez CR. iq: an R package to estimate relative protein abundances from ion quantification in DIA-MS-based proteomics. Bioinformatics 2020 Apr 15;36(8):2611-2613.