
MSclassifR (version 0.5.0)

build_X_from_peaks_fast: Build a sample-by-m/z intensity matrix from a list of peaks (fast, C++-backed)

Description

Converts a list of MALDIquant MassPeaks objects into a numeric matrix X (rows = samples, columns = target m/z) by matching each target m/z to the nearest peak within a tolerance. If requested, per-row max normalization is applied. Spectra that initially produce no matches can be retried with an increased tolerance. Matching is implemented in C++ via Rcpp, using a binary search for each target m/z.

Usage

build_X_from_peaks_fast(
  peaks,
  moz,
  tolerance = 6,
  normalize = TRUE,
  noMatch = 0,
  bump_if_empty = TRUE,
  toleranceStep = 2,
  max_bumps = 5L
)

Value

A numeric matrix X of dimension n x p:

  • n = length(peaks)

  • p = length(unique(sort(moz)))

Column names are the sorted, unique moz values coerced to character. Values are intensities (possibly normalized), or noMatch where no peak was matched.
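As a quick base-R illustration of this layout (not package code; `target`, `col_names` and `X_template` are hypothetical names, and the sketch assumes the column names come from a plain `as.character()` on the sorted unique values, as stated above):

```r
# Columns follow the sorted, de-duplicated target m/z (illustration only).
moz <- c(2000, 1500, 1000, 1500)        # unsorted, with a duplicate
target <- sort(unique(moz))             # 1000 1500 2000
col_names <- as.character(target)       # "1000" "1500" "2000"
n_samples <- 2L                         # would be length(peaks)
X_template <- matrix(0, nrow = n_samples, ncol = length(target),
                     dimnames = list(NULL, col_names))
print(dim(X_template))                  # rows = samples, cols = unique m/z
```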

Arguments

peaks

List of MALDIquant::MassPeaks objects (one per sample). Each element must provide pk@mass (numeric vector of m/z) and pk@intensity (numeric vector of intensities) of the same length.

moz

Numeric vector of target m/z values (in Da). Duplicates are removed and the values are sorted; the columns of the output matrix follow this sorted order.

tolerance

Numeric scalar (Da). A target m/z is matched to the nearest peak only if the absolute difference is <= tolerance. Default 6.
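The matching rule can be sketched in pure R. This is not the package's C++ implementation: `match_nearest()` is a hypothetical helper that uses base R's `findInterval()` (a binary search over sorted masses) to find the two peaks bracketing each target m/z, then keeps the closer one if it lies within `tolerance`. How the package breaks exact ties is not documented here, so this sketch's choice (the lower mass wins) is an assumption.

```r
# Hypothetical sketch of nearest-peak matching within a tolerance.
match_nearest <- function(mass, intensity, moz, tolerance, noMatch = 0) {
  ord <- order(mass)                     # findInterval() needs sorted masses
  mass <- mass[ord]
  intensity <- intensity[ord]
  out <- rep(noMatch, length(moz))
  if (length(mass) == 0L) return(out)
  # i = index of the largest mass <= target (0 if target is below all masses)
  i <- findInterval(moz, mass)
  left  <- pmax(i, 1L)                   # candidate at or below the target
  right <- pmin(i + 1L, length(mass))    # candidate above the target
  d_left  <- abs(mass[left]  - moz)
  d_right <- abs(mass[right] - moz)
  best <- ifelse(d_right < d_left, right, left)  # assumption: ties -> lower mass
  hit <- pmin(d_left, d_right) <= tolerance
  out[hit] <- intensity[best[hit]]
  out
}

mass <- c(999.7, 1500.2, 2000.1)
intensity <- c(10, 50, 30)
res <- match_nearest(mass, intensity, moz = c(1000, 1500, 2000, 2500), tolerance = 1)
print(res)
```

In this example no peak lies within 1 Da of 2500, so that position receives noMatch (0).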

normalize

Logical; if TRUE, per-row max normalization is applied after matching (i.e., each sample is divided by its maximum non-NA intensity). Default TRUE.
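A minimal base-R sketch of per-row max normalization, for illustration only. `normalize_rows_max()` is a hypothetical helper, and the guard used here (rows whose maximum is not positive, such as all-noMatch rows of zeros, are left unchanged) is an assumption about how division by zero is avoided.

```r
# Hypothetical sketch: divide each row by its maximum non-NA value.
normalize_rows_max <- function(X) {
  row_max <- apply(X, 1, function(r) {
    if (all(is.na(r))) NA_real_ else max(r, na.rm = TRUE)
  })
  ok <- !is.na(row_max) & row_max > 0    # guard against division by zero
  # Dividing a k-row matrix by a length-k vector recycles down columns,
  # so element [i, j] is divided by row_max[ok][i].
  X[ok, ] <- X[ok, , drop = FALSE] / row_max[ok]
  X
}

X <- rbind(c(10, 50, 30),   # ordinary row
           c(0, 0, 0),      # all noMatch: left as is
           c(12, NA, 60))   # NA ignored when computing the maximum
Xn <- normalize_rows_max(X)
print(Xn)
```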

noMatch

Numeric scalar; intensity value to insert when no peak is matched for a given target m/z. Default 0.

bump_if_empty

Logical; if TRUE, any spectrum resulting in an all-noMatch (or all-zero after normalization) row will be retried by increasing the tolerance in steps of toleranceStep, up to max_bumps attempts. Default TRUE.

toleranceStep

Numeric scalar (Da); the increment used when bumping the tolerance for empty rows. Default 2.

max_bumps

Integer; maximum number of bumps when retrying empty rows. Default 5.
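The interplay of bump_if_empty, toleranceStep and max_bumps can be sketched as a retry loop for a single spectrum. This is an illustration under stated assumptions, not the package's code: `simple_match()` and `match_with_bumps()` are hypothetical helpers, the matcher is a brute-force nearest search (not binary search) for brevity, normalization is omitted, and the tolerance is assumed to grow cumulatively (tolerance + k * toleranceStep on the k-th bump).

```r
# Hypothetical brute-force matcher: nearest peak within tolerance, else noMatch.
simple_match <- function(mass, intensity, moz, tolerance, noMatch = 0) {
  vapply(moz, function(m) {
    d <- abs(mass - m)
    if (length(d) == 0L || min(d) > tolerance) noMatch else intensity[which.min(d)]
  }, numeric(1))
}

# Hypothetical retry loop: widen the tolerance while the row is all noMatch.
match_with_bumps <- function(mass, intensity, moz, tolerance = 6, noMatch = 0,
                             bump_if_empty = TRUE, toleranceStep = 2,
                             max_bumps = 5L) {
  tol <- tolerance
  row <- simple_match(mass, intensity, moz, tol, noMatch)
  bumps <- 0L
  # %in% also handles noMatch = NA, unlike ==
  while (bump_if_empty && all(row %in% noMatch) && bumps < max_bumps) {
    bumps <- bumps + 1L
    tol <- tol + toleranceStep
    row <- simple_match(mass, intensity, moz, tol, noMatch)
  }
  list(row = row, tolerance = tol, bumps = bumps)
}

mass <- c(1008.5, 1507.9)   # both peaks more than 6 Da from the targets
intensity <- c(5, 9)
moz <- c(1000, 1500)
res <- match_with_bumps(mass, intensity, moz, tolerance = 6)
str(res)
```

With the default tolerance of 6 Da the row is empty; one bump to 8 Da matches the peak at 1507.9, so the loop stops after a single retry.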

Details

  • Matching: for each target m/z, the nearest peak is located by binary search over the peak masses and is used if its distance is <= tolerance; otherwise noMatch is used.

  • Normalization: when normalize = TRUE, each row is divided by its maximum non-NA intensity (guarded to avoid division by zero).

  • Empty rows: when bump_if_empty = TRUE, rows with all noMatch (or all zeros after normalization) are retried with increased tolerance (by toleranceStep) up to max_bumps times.

  • Performance: implemented with C++ helpers (map_spectrum_to_moz_cpp and build_X_from_peaks_cpp) for speed on large datasets.

See Also

MALDIquant::createMassPeaks; internal C++ helpers map_spectrum_to_moz_cpp and build_X_from_peaks_cpp (not user-facing).

Examples

# Minimal example with synthetic MassPeaks
if (requireNamespace("MALDIquant", quietly = TRUE)) {
  set.seed(1)
  # Two spectra with slightly jittered peaks around 1000, 1500, 2000 Da
  mz1 <- c(999.7, 1500.2, 2000.1); int1 <- c(10, 50, 30)
  mz2 <- c(1000.3, 1499.8, 2000.4); int2 <- c(12, 60, 28)
  p1 <- MALDIquant::createMassPeaks(mass = mz1, intensity = int1)
  p2 <- MALDIquant::createMassPeaks(mass = mz2, intensity = int2)
  peaks <- list(p1, p2)

  # Target m/z grid (unsorted, will be sorted internally)
  moz <- c(2000, 1500, 1000)

  X <- build_X_from_peaks_fast(peaks, moz, tolerance = 1, normalize = TRUE)
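  print(dim(X))       # n samples x p unique target m/z
  print(colnames(X))  # sorted target m/z as character
  print(X)            # matched (normalized) intensities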
}

# Typical usage in a pipeline:
# spectra <- SignalProcessing(yourSpectra)
# peaks   <- MSclassifR::PeakDetection(x = spectra, averageMassSpec = FALSE)
# moz     <- c(1000, 1500, 2000)  # from selection or prior knowledge
# X       <- build_X_from_peaks_fast(peaks, moz, tolerance = 6, normalize = TRUE)
# Then pass X to SelectionVar/SelectionVarStat_fast/LogReg, etc.
