
nontarget (version 1.9)

pattern.search2: Detecting and grouping isotope m/z relations among LC-HRMS centroid peaks, based on quantized reference data

Description

Algorithm for grouping isotope pattern centroids of chemical components by querying quantized simulation data

Usage

pattern.search2(peaklist,quantiz,mztol=2,ppm=TRUE,inttol=0.5,rttol=0.3, use_isotopes=c("13C","37Cl","15N","81Br","34S","18O"),use_charges=c(1,2), use_marker=TRUE,quick=FALSE,isotopes)

Arguments

peaklist
Data frame of HRMS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as the peaklist example data set.
quantiz
Quantized simulation data of feasible centroid-centroid relations as provided by package nontargetData.
mztol
m/z tolerance: the amount by which the m/z of a peak may deviate from its expected value. Given in ppm if ppm=TRUE (see below), otherwise, if ppm=FALSE, in absolute m/z [u].
ppm
Should mztol be set in ppm (TRUE) or in absolute m/z (FALSE).
inttol
Intensity tolerance: the fraction by which peak intensities may vary; e.g., if set to 0.2, a peak with an expected intensity of 10000 may range between 8000 and 12000.
rttol
+/- retention time tolerance. Units as given in column 3 of peaklist argument, e.g. [min].
use_isotopes
Restrict query to certain isotopes dominating centroid relations; set to FALSE to use all available isotopes.
use_charges
Vector of signed integers. Restrict query to certain charges z; set to FALSE to use all charge states.
use_marker
Include marker peaks in the query (TRUE) or not (FALSE)?
quick
Move on as soon as the query finds a first hit (TRUE) or not (FALSE)? Setting TRUE speeds up execution, but leaves the resulting information on the underlying isotopes incomplete.
isotopes
Dataframe of relevant isotopes as provided by package enviPat; used for checking user inputs.
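The tolerance arguments above translate into simple acceptance windows. The sketch below only reproduces the arithmetic described for mztol/ppm, inttol and rttol, including the 10000 to 8000..12000 example for inttol. The helper names mz_window, int_window and rt_window are hypothetical and not part of nontarget, and how pattern.search2 combines these windows internally with the quantization distortion is not shown here.

```r
# Hypothetical helpers (not part of nontarget) illustrating the documented tolerances.

# m/z window: mztol is in ppm when ppm = TRUE, otherwise in absolute m/z [u].
mz_window <- function(mz_expected, mztol = 2, ppm = TRUE) {
  delta <- if (ppm) mz_expected * mztol * 1e-6 else mztol
  c(lower = mz_expected - delta, upper = mz_expected + delta)
}

# Intensity window: inttol is a fraction of the expected intensity.
int_window <- function(int_expected, inttol = 0.5) {
  c(lower = int_expected * (1 - inttol), upper = int_expected * (1 + inttol))
}

# Retention time window: +/- rttol, in the units of peaklist column 3.
rt_window <- function(rt, rttol = 0.3) {
  c(lower = rt - rttol, upper = rt + rttol)
}

mz_window(400, mztol = 2, ppm = TRUE)      # 2 ppm at m/z 400 is +/- 0.0008
mz_window(400, mztol = 0.001, ppm = FALSE) # absolute +/- 0.001
int_window(10000, inttol = 0.2)            # 8000 to 12000
rt_window(5, rttol = 0.3)
```

Note that a ppm tolerance scales with m/z, so the same mztol=2 gives a wider absolute window at higher masses.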

Value

List of type pattern with 12 entries

Warning

Acceptable outcomes strongly depend on appropriate parametrization of the algorithm and using the correct quantiz data set from package nontargetData. Using overly large values for rttol and/or mztol may lead to slow execution.

Details

As an alternative to the rule-based pattern.search, differences among the measured centroids (peaklist) are queried against those of compressed (=quantized) simulation data, within the bounds of the measurement tolerances and the quantization distortion. Hence, in comparison to pattern.search, this approach accounts for centroid mass shifts induced by peak profile interferences, which are prevalent even at high m/z resolution.
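To make "differences among measured centroids" concrete, the sketch below enumerates co-eluting centroid pairs from a peaklist-like data frame and computes, per pair, the m/z, m/z difference and intensity ratio that the Details list as query dimensions. It is an illustration under assumptions, not the pattern.search2 internals: the function candidate_pairs, the max_dmz cap and the choice of the lower-m/z peak as reference for the ratio are hypothetical, and a sorted nested loop stands in for the kd-trees the package uses. The toy centroids are made up.

```r
# Hypothetical helper (not part of nontarget): list centroid pairs that co-elute
# within rttol and compute per-pair m/z, m/z difference and intensity ratio.
# Assumes a data frame with columns (1) m/z, (2) intensity, (3) retention time.
candidate_pairs <- function(peaks, rttol = 0.3, max_dmz = 5) {
  peaks <- peaks[order(peaks[[1]]), ]   # sort by m/z; i and j index this order
  mz <- peaks[[1]]; int <- peaks[[2]]; rt <- peaks[[3]]
  n <- nrow(peaks)
  out <- list()
  for (i in seq_len(max(n - 1, 0))) {
    for (j in (i + 1):n) {
      dmz <- mz[j] - mz[i]
      if (dmz > max_dmz) break          # sorted by m/z: all later j are farther away
      if (abs(rt[j] - rt[i]) <= rttol) {
        out[[length(out) + 1]] <- data.frame(
          i = i, j = j, mz = mz[i], dmz = dmz, int_ratio = int[j] / int[i]
        )
      }
    }
  }
  if (length(out) == 0) {
    return(data.frame(i = integer(0), j = integer(0), mz = numeric(0),
                      dmz = numeric(0), int_ratio = numeric(0)))
  }
  do.call(rbind, out)
}

# Made-up toy centroids, for illustration only
peaks <- data.frame(
  mz  = c(300.1000, 301.1034, 302.0971, 450.2000),
  int = c(10000, 1100, 3200, 5000),
  rt  = c(10.00, 10.05, 10.02, 15.00)
)
candidate_pairs(peaks, rttol = 0.3)
```

In pattern.search2, each such pair would then be matched against the quantized simulation data rather than against fixed isotope mass differences as in pattern.search. The sketch also shows why large rttol or mztol values slow execution: wider windows admit more candidate pairs to query.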

To derive the quantized data, isotope pattern centroids of several million organic molecular formulas from the PubChem database were calculated for various classes of adducts. Molecular formulas were filtered to be unique and to contain only C, H, O, N, Cl, Br, K, Na, S, Si, F, P and/or I. The resulting >250 million centroid pairs from individual patterns were then categorized by their dominant isotopologues, their charge and the possible presence of another centroid with a higher intensity than that of the pair (=marker peak). Within these categories, data on the centroid pair (a) m/z, (b) m/z difference, (c) intensity ratio and (d) marker m/z were quantized by a recursive partitioning procedure. The resulting compressed data representation was extended by nearest neighbour estimates in the above dimensions (a) to (d) to account for queries with molecular formulas possibly not present in the PubChem set. Internally, the quantized simulation data are queried through a tree-like space-partitioning structure for hyperrectangles, while the centroids from peaklist are restructured into kd-trees.
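The core operation behind querying quantized data is a hyperrectangle overlap test: a measured pair, widened by its tolerances, hits a quantization cell if the two boxes overlap in every dimension. The sketch below shows only that test, using a linear scan over a few made-up boxes instead of the tree-like space-partitioning structure described above. The column names, box values, the reuse of the ppm-derived tolerance for the m/z difference and the fractional widening of the intensity ratio are all assumptions, not the layout or logic of the nontargetData objects; dimension (d), the marker m/z, is omitted.

```r
# Made-up quantization cells (hyperrectangles): lower/upper bounds per dimension.
boxes <- data.frame(
  mz_lo    = c(100, 300, 300),       mz_hi    = c(300, 500, 500),
  dmz_lo   = c(1.0030, 1.0030, 1.9960), dmz_hi = c(1.0040, 1.0040, 1.9980),
  ratio_lo = c(0.01, 0.05, 0.20),    ratio_hi = c(0.20, 0.40, 0.40)
)

# Hypothetical query (not the nontarget API): return indices of boxes that
# overlap the query point widened by tolerances in every dimension.
query_boxes <- function(boxes, mz, dmz, ratio, mztol = 2, inttol = 0.5) {
  tol <- mz * mztol * 1e-6   # ppm -> absolute m/z; also applied to dmz (simplification)
  hit <- boxes$mz_lo <= mz + tol & boxes$mz_hi >= mz - tol &
    boxes$dmz_lo <= dmz + tol & boxes$dmz_hi >= dmz - tol &
    boxes$ratio_lo <= ratio * (1 + inttol) & boxes$ratio_hi >= ratio * (1 - inttol)
  which(hit)
}

query_boxes(boxes, mz = 300.1, dmz = 1.0034, ratio = 0.11)  # overlaps box 2
query_boxes(boxes, mz = 300.1, dmz = 1.9971, ratio = 0.32)  # overlaps box 3
query_boxes(boxes, mz = 300.1, dmz = 0.5,    ratio = 0.11)  # no overlap
```

A linear scan costs one comparison per box per query; with millions of cells, a space-partitioning tree prunes most boxes before this test is applied, which is why the package uses one.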

See Also

rm.sat peaklist plotisotopes plotdefect combine plotgroup pattern.search

Examples


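######################################################
# attach packages providing functions and data: ######
library(nontarget)      # pattern.search2, peaklist
library(nontargetData)  # quantized simulation data
library(enviPat)        # isotopes
# load HRMS centroid list: ###########################
data(peaklist)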
# load isotope data ##################################
data(isotopes)
# load quantized simulation data #####################
data(OrbitrapXL_VelosPro_R60000at400_q)
######################################################
# run isotope pattern grouping #######################
# save the list returned as "pattern" ################
pattern<-pattern.search2(
	peaklist,
	OrbitrapXL_VelosPro_R60000at400_q,
	mztol=2, 
	ppm=TRUE,
	inttol=0.5,
	rttol=0.3,
	use_isotopes=FALSE,
	use_charges=FALSE,
	use_marker=TRUE,
	quick=FALSE,
	isotopes
)
names(pattern)
######################################################
