MSclassifR (version 0.5.0)

build_XY_from_peaks: Build design matrix X and response Y from peak intensities

Description

Constructs a sample-by-peak design matrix (X) and an outcome vector/factor (Y) from peak-intensity input. Accepts either a numeric matrix/data.frame (rows = samples, columns = peaks) or a list of aligned per-sample peak vectors. Optionally applies per-sample max normalization and can return X as a sparse dgCMatrix for memory efficiency.

Usage

build_XY_from_peaks(
  peaks,
  labels,
  normalize = c("max", "none"),
  sparse = FALSE,
  name_cols = FALSE,
  name_digits = 4
)

Value

A list with:

  • X: numeric matrix or Matrix::dgCMatrix of dimension n_samples x n_peaks

  • Y: response vector/factor aligned to rows of X (returned as supplied/coerced by the function)

Arguments

peaks

Peak data from which to build the design matrix X. Either:

  • a numeric matrix/data.frame of intensities with rows = samples and columns = peaks, or

  • a list of per-sample numeric vectors (or small tables) aligned to a common set of peaks (e.g., same names or column order). Values are assumed non-negative; NAs are allowed and are ignored when computing per-sample maxima.

labels

Outcome/response labels used to build Y. A vector or factor with one entry per sample (length must equal nrow(peaks) for matrix/data.frame input, or length(peaks) for list input).
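The list form of peaks and the length requirement on labels can be illustrated with a minimal base R sketch. The names peaks_list, labels, and X below are made up for illustration, and the row-binding step is an assumption about how aligned per-sample vectors could be stacked, not a description of the function's internals.

```r
# Illustrative data: three samples, already aligned to the same three peaks
peaks_list <- list(
  s1 = c(p1 = 10, p2 = 0, p3 = 5),
  s2 = c(p1 = 0,  p2 = 8, p3 = NA),
  s3 = c(p1 = 3,  p2 = 3, p3 = 9)
)
labels <- factor(c("A", "B", "A"))

# Stack the per-sample vectors into a samples-by-peaks matrix
X <- do.call(rbind, peaks_list)

# One label per sample: length(labels) must match the number of rows
stopifnot(length(labels) == nrow(X))
```

For matrix or data.frame input the same check applies with nrow(peaks) in place of length(peaks).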

normalize

Normalization to apply to each sample’s peak intensities before constructing X. One of "max" or "none" (matched via match.arg). "max" scales each sample by its maximum non-NA intensity; "none" applies no scaling.
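As a rough illustration of what "max" normalization means, the sketch below scales each row of a small matrix by its largest non-NA value. The helper normalize_rows_by_max is hypothetical and not part of MSclassifR; in particular, leaving all-zero and all-NA rows unscaled is an assumption made here to avoid division by zero, and the package may treat those cases differently.

```r
# Hypothetical helper (not part of MSclassifR): divide each sample (row)
# by its maximum non-NA intensity.
normalize_rows_by_max <- function(X) {
  X <- as.matrix(X)
  row_max <- apply(X, 1, function(v) {
    if (all(is.na(v))) NA_real_ else max(v, na.rm = TRUE)
  })
  # Assumption: rows with no usable maximum (all NA or all zero) stay unscaled
  row_max[is.na(row_max) | row_max == 0] <- 1
  # A vector of length nrow(X) recycles down columns, so row i is divided
  # by row_max[i]
  X / row_max
}

X <- matrix(c(2, 4, NA,
              0, 5, 10),
            nrow = 2, byrow = TRUE)
X_norm <- normalize_rows_by_max(X)
```

After scaling, the largest non-NA entry in each non-empty row equals 1, and NA entries stay NA.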

sparse

Logical; if TRUE, return X as a sparse Matrix::dgCMatrix. If FALSE, return a base R dense matrix. Default is FALSE.
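To show the difference between the two return types, here is a small sketch using the Matrix package directly. The objects X_dense and X_sparse are illustrative; how build_XY_from_peaks performs the conversion internally is not stated on this page.

```r
library(Matrix)

# A mostly-zero dense matrix, as is typical for peak-intensity data
X_dense <- matrix(0, nrow = 3, ncol = 5)
X_dense[1, 2] <- 1
X_dense[2, 4] <- 0.4
X_dense[3, 1] <- 0.7

# Convert to compressed sparse column storage
X_sparse <- Matrix::Matrix(X_dense, sparse = TRUE)

class(X_sparse)            # sparse matrix class from the Matrix package
Matrix::nnzero(X_sparse)   # number of stored non-zero entries
```

Sparse storage tends to pay off when most intensities are zero; for small or dense matrices a base R matrix is usually simpler to work with.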

name_cols

Logical; if TRUE, set column names of X from the m/z values returned by MALDIquant::intensityMatrix (formatted as "mz_<mz>"). Default is FALSE. This only applies when peaks is a list of MALDIquant::MassPeaks (or when attr(X, "mass") is available); otherwise it is ignored. Enabling this can add noticeable overhead for very wide matrices.

name_digits

Integer scalar; number of decimal digits to use when formatting m/z values into column names if name_cols = TRUE. Default is 4. Must be a non-negative integer. Ignored when name_cols = FALSE or when no m/z vector is available (i.e., for plain matrix/data.frame or generic list inputs).
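The effect of name_digits on column names can be sketched as follows. The m/z values are made up, and the use of fixed-point formatting with trailing zeros is an assumption; the function may round or format the values differently.

```r
# Illustrative m/z values (not taken from any real spectrum)
mz <- c(2034.12, 4567.8, 10234.56789)
name_digits <- 4

# Fixed-point formatting with name_digits decimals, prefixed with "mz_"
col_names <- paste0("mz_", formatC(mz, format = "f", digits = name_digits))
col_names
# "mz_2034.1200" "mz_4567.8000" "mz_10234.5679"
```

Fewer digits give shorter names but can make distinct peaks collide on the same name, so it is worth checking anyDuplicated(col_names) when lowering name_digits.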

Examples

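library("MSclassifR")

# Load the example spectra and their metadata shipped with the package
data("CitrobacterRKIspectra", "CitrobacterRKImetadata", package = "MSclassifR")

# Pre-process the raw spectra, then detect peaks per spectrum
spectra <- MSclassifR::SignalProcessing(CitrobacterRKIspectra)
peaks <- MSclassifR::PeakDetection(x = spectra, averageMassSpec = FALSE)

labels <- CitrobacterRKImetadata$Species   # adjust to your label column

# Build the max-normalized sparse design matrix X and the response Y
xy <- build_XY_from_peaks(peaks, labels, normalize = "max", sparse = TRUE)

# Inspect the result: X is n_samples x n_peaks, Y is aligned to its rows
dim(xy$X)
table(xy$Y)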