
ume (version 1.5.2)

remove_blanks: Remove molecular formulas detected in blanks

Description

Remove all molecular formulas that were detected in one or more blank analyses (identified via blank_file_ids). Matching is always on mf. If a retention-time column is present (or provided using ret_time_col), removal is restricted to the corresponding LC segment.

Usage

remove_blanks(
  mfd,
  blank_file_ids = NULL,
  blank_prevalence = 0.5,
  ret_time_col = NULL,
  verbose = FALSE,
  ...
)

Value

data.table; subset of the original molecular formula table (mfd) with blank formulas removed (globally or LC-segment-wise).

Arguments

mfd

data.table with molecular formula data as derived from ume::assign_formulas. Column names of elements/isotopes must match names in the isotope column of ume::masses; values are integers representing counts per formula.

blank_file_ids

Integer vector of file_id values that represent blank analyses.

blank_prevalence

Numeric between 0 and 1. Threshold for blank filtering: the minimum proportion of blanks in which a molecular formula must occur before it is excluded from the sample data. For example, blank_prevalence = 0 removes any formula detected in at least one blank, while blank_prevalence = 0.5 (default) removes only formulas detected in 50% or more of the blanks.
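The threshold logic can be illustrated with a minimal data.table sketch. This is not the code used by ume; the toy table, blank_ids, and the helper flagged_at() are made up for illustration, and the comparison uses >= on the assumption that "50% or more" is inclusive.

```r
library(data.table)

# Toy data (made up): file_id 1-3 are blanks, file_id 4 is a sample.
mfd_toy <- data.table(
  file_id = c(1L, 2L, 3L, 1L, 1L, 2L, 4L, 4L, 4L, 4L),
  mf      = c("C6H12O6", "C6H12O6", "C6H12O6",   # in 3 of 3 blanks
              "C2H4O2",                          # in 1 of 3 blanks
              "C3H6O3", "C3H6O3",                # in 2 of 3 blanks
              "C6H12O6", "C2H4O2", "C3H6O3", "C7H8O2")
)
blank_ids <- c(1L, 2L, 3L)

# Proportion of blanks in which each formula occurs.
prev <- mfd_toy[file_id %in% blank_ids,
                .(prevalence = uniqueN(file_id) / length(blank_ids)),
                by = mf]

# Hypothetical helper: formulas flagged for removal at a given threshold.
flagged_at <- function(threshold) prev[prevalence >= threshold, mf]

flagged_0   <- flagged_at(0)    # every formula seen in any blank
flagged_0.5 <- flagged_at(0.5)  # formulas seen in at least half of the blanks
flagged_1   <- flagged_at(1)    # formulas seen in every blank

# Sample rows that survive the 0.5 threshold.
kept <- mfd_toy[!(file_id %in% blank_ids) & !(mf %in% flagged_0.5)]
```

A lower threshold is stricter: more blank formulas qualify for removal, so fewer sample rows remain.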

ret_time_col

Character scalar. Name of the retention-time column that contains the beginning of the retention time segment that corresponds to the mass spectrum. If NULL (default), the function will auto-detect the first column in c("ret_time_min","retention_time","rt","RT") that exists in mfd. If none is found, blanks are removed ignoring retention time.
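The lookup order described above could be sketched as follows. detect_rt_col() is a hypothetical helper written for illustration, not a function exported by ume, and the actual implementation may differ.

```r
library(data.table)

# Hypothetical helper mirroring the documented behaviour:
# an explicit ret_time_col wins; otherwise take the first candidate name
# that exists in the table; otherwise return NULL (no retention time).
detect_rt_col <- function(dt, ret_time_col = NULL) {
  if (!is.null(ret_time_col)) return(ret_time_col)
  candidates <- c("ret_time_min", "retention_time", "rt", "RT")
  hit <- candidates[candidates %in% names(dt)]
  if (length(hit) > 0) hit[1] else NULL
}

with_rt    <- detect_rt_col(data.table(mf = "CH4", RT = 2, rt = 1.5))
without_rt <- detect_rt_col(data.table(mf = "CH4", file_id = 1L))
explicit   <- detect_rt_col(data.table(mf = "CH4", seg_start = 3),
                            ret_time_col = "seg_start")
```

Note that the candidate order, not the column order in the table, decides which column is picked when several candidates are present.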

verbose

logical; if TRUE, show progress messages.

...

Additional arguments passed to methods.

Backward compatibility

The argument LCMS is deprecated and no longer used. Retention-time-aware removal is now enabled automatically when a retention-time column is present or explicitly provided via ret_time_col.

Author

Boris P. Koch

Details

  • Requires a unique integer file_id per analysis in mfd.

  • Minimal required columns in mfd: mf, file_id.

  • Optional column: a retention-time column (e.g. "ret_time_min").

  • If a retention-time column is used, formulas present in blanks are only removed from rows whose mf and retention time both match a blank entry.

  • The input mfd is not modified by reference; a subset is returned.
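The segment-wise behaviour listed above can be approximated with a data.table anti-join. This is a sketch under stated assumptions, not the implementation used by ume: the toy data are made up, blank_prevalence is ignored (any blank occurrence counts), and the blank rows themselves are dropped from the result, which this page does not specify.

```r
library(data.table)

# Toy data (made up): file_id 1 is a blank, file_id 2 is a sample.
mfd_toy <- data.table(
  file_id      = c(1L, 1L, 2L, 2L, 2L),
  mf           = c("C6H12O6", "C2H4O2", "C6H12O6", "C6H12O6", "C2H4O2"),
  ret_time_min = c(5, 10, 5, 10, 10)
)
blank_ids <- 1L

# Unique (formula, segment) pairs observed in the blanks.
blank_keys <- unique(mfd_toy[file_id %in% blank_ids, .(mf, ret_time_min)])

# Subsetting creates a new table, so mfd_toy itself is left untouched.
samples <- mfd_toy[!(file_id %in% blank_ids)]

# Anti-join: keep only sample rows whose (mf, ret_time_min) pair
# does not occur in any blank.
cleaned <- samples[!blank_keys, on = c("mf", "ret_time_min")]
```

Here C6H12O6 is removed from the sample only in the segment starting at 5 min, where the blank also contains it; the same formula in the segment starting at 10 min is kept.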

See Also

Other Formula subsetting: filter_int(), filter_mass_accuracy(), filter_mf_data(), subset_known_mf(), ume_assign_formulas(), ume_filter_formulas()

Examples

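# Presence/absence removal, no retention time.
# remove_blank_list is not a named argument in the Usage section, so it is
# passed via `...`; blank_file_ids is the documented way to identify blanks.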
remove_blanks(mfd = mf_data_demo,
              remove_blank_list = "Blank",
              verbose = TRUE)
