binMS attempts to recover these underlying compounds through a binning procedure, described in more detail in Details.
binMS(mass_spec, mtoz, charge, mass = NULL, time_peak_reten, ms_inten = NULL, time_range, mass_range, charge_range, mtoz_diff, time_diff)
mass_spec: A matrix or data.frame. This object must contain mass spectrometry abundances, and may optionally contain mass-to-charge values, charge state information, or additional extraneous variables. The mass spectrometry data is expected to be in a form with each column corresponding to a variable and each row corresponding to a mass-to-charge level. For example, suppose that a collection of mass spectrometry intensity observations has provided data for 50 fractions across 20,000 mass-to-charge values. Then the input for mass_spec should be a matrix or data.frame with 20,000 rows and 50 or more columns. The additional columns beyond the 50 containing the mass spectrometry intensities can be the mass-to-charge data, the charge data, or other extraneous variables (the extraneous variables will be discarded when constructing the msDat object).
mtoz: One way to provide the mass-to-charge information is to provide a numeric vector where each entry provides the mass-to-charge value for a corresponding row of mass spectrometry data. Then the k-th entry of the vector would provide the mass-to-charge value for the k-th row of the mass spectrometry data.
A second way is to provide a single number which specifies the column index in the matrix or data.frame provided as the argument for the mass_spec parameter, such that this column contains the mass-to-charge information.
A third way is to provide a single character string which provides the column name in the matrix or data.frame provided as the argument for the mass_spec parameter, such that this column contains the mass-to-charge information. Partial matching is supported.
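The three ways of identifying a column can be illustrated with a small sketch. The helper `resolve_column` and the toy data below are hypothetical and are not part of the package; they only mimic the behavior described above, under the assumption that a full-length vector is used as-is while a single number or string selects a column.

```r
# Hypothetical helper (not part of the package): resolve a specification given
# as a full-length vector, a single column index, or a single column name.
resolve_column <- function(dat, spec) {
  if (length(spec) == nrow(dat) && length(spec) != 1L) {
    # Way 1: one value per row of the mass spectrometry data
    return(spec)
  }
  if (is.numeric(spec) && length(spec) == 1L) {
    # Way 2: a single number giving the column index
    return(dat[, spec])
  }
  if (is.character(spec) && length(spec) == 1L) {
    # Way 3: a single string giving the column name (partial matching allowed)
    idx <- pmatch(spec, colnames(dat))
    if (is.na(idx)) stop("column name not found or ambiguous")
    return(dat[, idx])
  }
  stop("unsupported specification")
}

# Toy data; the column names are illustrative only
toy <- data.frame(mtoz   = c(500.1, 500.2, 730.4),
                  charge = c(2, 2, 3),
                  frac1  = c(10, 0, 5))

by_vector <- resolve_column(toy, c(500.1, 500.2, 730.4))
by_index  <- resolve_column(toy, 1)
by_name   <- resolve_column(toy, "mt")   # partial match for "mtoz"
```

All three calls identify the same mass-to-charge values for the toy data.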
charge: The information for the charge parameter can be provided in the same manner as for the mass-to-charge values.
mass: May be left at its default value of NULL, in which case the mass is calculated by the function. If however the information for mass is already included in the dataset in hand, then providing it to the function will be slightly more efficient than re-performing the calculations. The information for the mass parameter can be provided in the same manner as for the mass-to-charge values.
time_peak_reten: The information for the time_peak_reten parameter can be provided in the same manner as for the mass-to-charge and other information; this parameter specifies the time at which the peak retention level of the compound was achieved.
ms_inten: Either NULL or a vector either of mode character or mode numeric specifying which of the variables in the argument to mass_spec are to be retained as the mass spectrometry intensity data. If NULL, then it is taken to mean that the entirety of the data in mass_spec, after removing variables in the data that are specified as arguments, is the mass spectrometry intensity data. If it is a numeric vector, then the entries should provide the indices for the region of interest in the mass spectrometry data in the argument for mass_spec. If it is a character vector, then the entries should uniquely specify the region of interest through partial string matching.
An object of class binMS which inherits from msDat. This object is a list with elements described below. The class is equipped with a print, summary, and extractMS function.
The first step is as follows. All observations must satisfy each of the following criteria for inclusion in the binning process.
time_range
mass_range
charge_range
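The inclusion step can be sketched as a simple row filter. This is an illustration only: the column names, the toy values, and the treatment of each range as a length-two vector with inclusive bounds are assumptions, not a statement of how binMS implements the check.

```r
# Toy observations; column names and values are illustrative only
obs <- data.frame(mtoz   = c(1000.5, 1000.52, 400.2, 2500.0),
                  charge = c(3, 3, 1, 4),
                  mass   = c(2998.5, 2998.6, 399.2, 9996.0),
                  time   = c(20.0, 20.5, 30.0, 50.0))

# Ranges in the style of the example call at the end of this page
time_range   <- c(14, 45)
mass_range   <- c(2000, 15000)
charge_range <- c(2, 10)

# Assumption: bounds are inclusive on both ends
in_range <- function(x, rng) x >= rng[1] & x <= rng[2]

# An observation is kept only if it satisfies all three criteria
keep <- in_range(obs$time, time_range) &
        in_range(obs$mass, mass_range) &
        in_range(obs$charge, charge_range)

working <- obs[keep, ]
```

Here the third observation fails the mass and charge criteria and the fourth fails the time criterion, so only the first two remain in the working set.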
Once a set of observations satisfying the above criteria is obtained, a second step attempts to combine observations believed to belong to the same underlying compound. The algorithm considers two observations that satisfy each of the following criteria to belong to the same compound.
mtoz_diff
time_diff
Then the binning algorithm is defined as follows. Consider an observation that satisfies the inclusion criteria; this observation is compared pairwise with every other observation that satisfies the inclusion criteria. If a pair of observations satisfies the criteria determining them to belong to the same underlying compound, then the two observations are merged into a single observation. The two previous observations are removed from the working set, and the process starts over with the newly created observation. The process repeats until no other observation in the working set meets the criteria determining it to belong to the same underlying compound as that of the current observation; at this point it is considered that all observations belonging to the compound have been found, and the process starts over with a new observation.
The merging process has not yet been defined; it is performed by averaging the mass-to-charge values and peak elution times, and summing the mass spectrometry intensities at each fraction. Although observations are merged pairwise, when multiple observations are combined in a sequence of pairings, the averages give equal weight to all of the original observations. In other words, if a pair of observations is merged, and then a third observation is merged with the new observation created by combining the original two, then the mass-to-charge value and peak elution time of the new observation are obtained by summing the values for each of the three original observations and dividing by three. The merging process for more than three observations is conducted similarly.
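A minimal sketch of the equal-weight merge follows. The function `merge_obs`, the list layout, and the count field `n` are hypothetical; tracking a count is one simple way to obtain the equal weighting described above and is not necessarily how the package does it.

```r
# Hypothetical representation: each observation is a list holding its m/z,
# peak elution time, per-fraction intensities, and the number `n` of original
# observations that have been merged into it.
merge_obs <- function(a, b) {
  n <- a$n + b$n
  list(mtoz  = (a$mtoz * a$n + b$mtoz * b$n) / n,  # equal weight per original
       time  = (a$time * a$n + b$time * b$n) / n,
       inten = a$inten + b$inten,                  # intensities are summed
       n     = n)
}

o1 <- list(mtoz = 500.00, time = 20, inten = c(1, 0, 2), n = 1)
o2 <- list(mtoz = 500.02, time = 22, inten = c(0, 3, 1), n = 1)
o3 <- list(mtoz = 500.04, time = 24, inten = c(2, 2, 2), n = 1)

# Merge the first pair, then merge the third observation into the result
m <- merge_obs(merge_obs(o1, o2), o3)
```

The merged mass-to-charge value equals (500.00 + 500.02 + 500.04) / 3 and the merged time equals (20 + 22 + 24) / 3, which is the same result as averaging the three original observations directly.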
Having described the binning algorithm, it is apparent that there are scenarios in which the order in which observations are merged affects the outcome of the algorithm. Since it seems that a minimum requirement of any binning algorithm is that the algorithm is invariant to the ordering of the observations in the data, this algorithm abides by the following rules. The observations in the data are sorted in increasing order by mass-to-charge value, peak elution time, and electrical charge state, respectively. Then when choosing an observation to compare to the rest of the set, we start with the observation at the top of the sort ordering, and compare it one-at-a-time to the other elements in the set according to the same ordering. When a consolidated observation is complete in that no other observation left in the working set satisfies the merging criteria, then this consolidated observation can be removed from consideration for all future merges.
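The sort-then-merge procedure can be sketched as follows. This is a simplified illustration and not the package's implementation: the function `bin_sketch`, the column names, the single intensity column, and the use of strict inequalities for the two difference thresholds are all assumptions.

```r
# Simplified sketch of the binning loop (hypothetical, single intensity column)
bin_sketch <- function(obs, mtoz_diff, time_diff) {
  # Sort so that the result does not depend on the input row order
  obs <- obs[order(obs$mtoz, obs$time, obs$charge), ]
  obs$n <- 1                      # number of original observations merged
  out <- obs[0, ]
  while (nrow(obs) > 0) {
    cur  <- obs[1, ]              # observation at the top of the ordering
    rest <- obs[-1, ]
    repeat {
      # Assumption: strict inequalities for both thresholds
      close <- abs(rest$mtoz - cur$mtoz) < mtoz_diff &
               abs(rest$time - cur$time) < time_diff
      if (!any(close)) break      # consolidated observation is complete
      j <- which(close)[1]        # first match according to the sort order
      n <- cur$n + 1
      cur$mtoz  <- (cur$mtoz * cur$n + rest$mtoz[j]) / n
      cur$time  <- (cur$time * cur$n + rest$time[j]) / n
      cur$inten <- cur$inten + rest$inten[j]
      cur$n     <- n
      rest <- rest[-j, ]          # restart comparison with the merged result
    }
    out <- rbind(out, cur)        # removed from all future merges
    obs <- rest
  }
  out
}

obs <- data.frame(mtoz   = c(500.04, 500.00, 700.00, 500.02),
                  time   = c(24, 20, 20, 22),
                  charge = c(2, 2, 3, 2),
                  inten  = c(2, 1, 5, 3))

res          <- bin_sketch(obs, mtoz_diff = 0.05, time_diff = 5)
res_shuffled <- bin_sketch(obs[c(3, 1, 4, 2), ], mtoz_diff = 0.05, time_diff = 5)
```

With the toy data, the three observations near 500 are consolidated into one and the observation at 700 is left alone, and shuffling the input rows gives the same consolidated values because of the initial sort.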
# Load mass spectrometry data
data(mass_spec)
# Perform consolidation via binMS
bin_out <- binMS(mass_spec = mass_spec,
                 mtoz = "m/z",
                 charge = "Charge",
                 mass = "Mass",
                 time_peak_reten = "Reten",
                 ms_inten = NULL,
                 time_range = c(14, 45),
                 mass_range = c(2000, 15000),
                 charge_range = c(2, 10),
                 mtoz_diff = 0.05,
                 time_diff = 60)
# print, summary function
bin_out
summary(bin_out)
# Extract consolidated mass spectrometry data as a matrix or msDat object
bin_matr <- extractMS(msObj = bin_out, type = "matrix")
bin_msDat <- extractMS(msObj = bin_out, type = "msDat")