R-based access to Mass-Spec data (RaMS)

Table of contents: Overview - Installation - Usage - File types - Contact

Overview

RaMS is a lightweight package that provides rapid and tidy access to mass-spectrometry data. The package is lightweight because it’s built from the ground up rather than relying on an extensive network of external libraries: no Rcpp, no Bioconductor, no long load times or strange startup warnings, just XML parsing provided by xml2 and data handling provided by data.table. Access is rapid because an absolute minimum of data processing occurs. Unlike other packages, RaMS makes no assumptions about what you’d like to do with the data and simply provides access to the encoded information in an intuitive and R-friendly way. Finally, the access is tidy in the sense of the tidy data philosophy. Tidy data neatly resolves the ragged arrays that mass spectrometers produce and plays nicely with other tidy data packages.
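
To make the "ragged array" point concrete, here is a minimal base-R sketch. It is not RaMS code: the `scans` object and all of its values are invented for illustration. It shows how scans with differing numbers of peaks can collapse into one long table with one row per measured ion.

```r
# Hypothetical raw scans: each scan has one retention time but a different
# number of (m/z, intensity) pairs, which makes the structure "ragged".
scans <- list(
  list(rt = 4.01, mz = c(118.09, 136.06, 139.05), int = c(500, 70, 1800)),
  list(rt = 4.02, mz = c(118.09, 139.05),         int = c(450, 1750))
)

# Tidy form: one row per ion, with rt repeated for every ion in its scan.
tidy_scans <- do.call(rbind, lapply(scans, function(scan) {
  data.frame(rt = scan$rt, mz = scan$mz, int = scan$int)
}))

print(tidy_scans)
```

Once the data are in this long form, filtering by retention time or m/z becomes an ordinary row subset rather than a loop over scans.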

Installation

To install the stable version on CRAN:

install.packages('RaMS')

To install the current development version:

devtools::install_github("wkumler/RaMS", build_vignettes = TRUE)

Finally, load RaMS like every other package:

library(RaMS)

Usage

There’s only one main function in RaMS: the aptly named grabMSdata. This function accepts the names of mass-spectrometry files as well as the data you’d like to extract (e.g. MS1, MS2, BPC, etc.) and produces a list of data tables. Each table is intuitively named within the list and formatted tidily:

msdata_dir <- system.file("extdata", package = "RaMS")
msdata_files <- list.files(msdata_dir, pattern = "mzML", full.names=TRUE)

msdata <- grabMSdata(files = msdata_files[2:4], grab_what = c("BPC", "MS1"))

Some additional examples are shown below; a more thorough introduction is available in the vignette.

BPC/TIC data:

Base peak chromatograms (BPCs) and total ion chromatograms (TICs) have three columns, making them super-simple to plot with either base R or the popular ggplot2 library:

knitr::kable(head(msdata$BPC, 3))

| rt | int | filename |
|---|---|---|
| 4.009000 | 11141859 | LB12HL_AB.mzML.gz |
| 4.024533 | 9982309 | LB12HL_AB.mzML.gz |
| 4.040133 | 10653922 | LB12HL_AB.mzML.gz |

plot(msdata$BPC$rt, msdata$BPC$int, type = "l", ylab="Intensity")

library(ggplot2)
ggplot(msdata$BPC) + geom_line(aes(x = rt, y=int, color=filename)) +
  facet_wrap(~filename, scales = "free_y", ncol = 1) +
  labs(x="Retention time (min)", y="Intensity", color="File name: ") +
  theme(legend.position="top")

MS1 data:

MS1 data includes an additional dimension, the m/z of each ion measured, and has multiple entries per retention time:

knitr::kable(head(msdata$MS1, 3))

| rt | mz | int | filename |
|---|---|---|---|
| 4.009 | 139.0503 | 1800550.12 | LB12HL_AB.mzML.gz |
| 4.009 | 148.0967 | 206310.81 | LB12HL_AB.mzML.gz |
| 4.009 | 136.0618 | 71907.15 | LB12HL_AB.mzML.gz |

This tidy format means that it plays nicely with other tidy data packages. Here, we use data.table and a few tidyverse packages to compare a molecule’s 13C and 15N peak areas to that of the base peak, giving us some clue as to its molecular formula. Note also the use of the trapz function (available in v1.3.2+) to calculate the area of the peak given the retention time and intensity values.

library(data.table)
library(tidyverse)

M <- 118.0865
M_13C <- M + 1.003355
M_15N <- M + 0.997035

iso_data <- imap_dfr(lst(M, M_13C, M_15N), function(mass, isotope){
  peak_data <- msdata$MS1[mz%between%pmppm(mass) & rt%between%c(7.6, 8.2)]
  cbind(peak_data, isotope)
})

iso_data %>%
  group_by(filename, isotope) %>%
  summarise(area=trapz(rt, int)) %>%
  pivot_wider(names_from = isotope, values_from = area) %>%
  mutate(ratio_13C_12C = M_13C/M) %>%
  mutate(ratio_15N_14N = M_15N/M) %>%
  select(filename, contains("ratio")) %>%
  pivot_longer(cols = contains("ratio"), names_to = "isotope") %>%
  group_by(isotope) %>%
  summarize(avg_ratio = mean(value), sd_ratio = sd(value), .groups="drop") %>%
  mutate(isotope=str_extract(isotope, "(?<=_).*(?=_)")) %>%
  knitr::kable()

| isotope | avg_ratio | sd_ratio |
|---|---|---|
| 13C | 0.0544072 | 0.0005925 |
| 15N | 0.0033611 | 0.0001578 |

With natural abundances for 13C and 15N of 1.11% and 0.36%, respectively, we can conclude that this molecule likely has five carbons and a single nitrogen.
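
The arithmetic behind that conclusion can be sketched as follows. This assumes the common first-order approximation that the isotope peak ratio is roughly the number of atoms times the natural abundance; the variable names are illustrative only.

```r
# Average isotope-to-base-peak ratios from the table above
ratio_13C <- 0.0544072
ratio_15N <- 0.0033611

# Natural abundances quoted in the text, as fractions
abund_13C <- 0.0111
abund_15N <- 0.0036

# First-order estimate (an assumption): ratio ~ n_atoms * abundance
n_carbon   <- ratio_13C / abund_13C
n_nitrogen <- ratio_15N / abund_15N

round(c(carbon = n_carbon, nitrogen = n_nitrogen), 2)
```

The estimates come out near 4.9 and 0.93, which round to five carbons and one nitrogen.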

Of course, it’s always a good idea to plot the peaks and perform a manual check of data quality:

ggplot(iso_data) +
  geom_line(aes(x=rt, y=int, color=filename)) +
  facet_wrap(~isotope, scales = "free_y", ncol = 1)
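
For readers unfamiliar with trapezoidal integration, the idea behind the peak areas computed earlier can be sketched in base R. The `trapz_sketch` helper below is hypothetical and written only for illustration; it is not the RaMS `trapz` implementation, and the toy peak values are invented.

```r
# Illustrative trapezoidal integration (hypothetical helper, not RaMS::trapz)
trapz_sketch <- function(rt, int) {
  stopifnot(length(rt) == length(int))
  ord <- order(rt)  # integrate in order of increasing retention time
  rt <- rt[ord]
  int <- int[ord]
  # Sum the areas of the trapezoids between consecutive points
  sum(diff(rt) * (head(int, -1) + tail(int, -1)) / 2)
}

# Toy triangular peak: base of 0.4 min and apex of 1000, so the area is 200
toy_rt  <- c(7.6, 7.7, 7.8, 7.9, 8.0)
toy_int <- c(0, 500, 1000, 500, 0)
trapz_sketch(toy_rt, toy_int)
```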

MS1 data typically consists of many individual chromatograms, so RaMS provides a small function that can bin it into chromatograms based on m/z windows.

msdata$MS1 %>%
  arrange(desc(int)) %>%
  mutate(mz_group=mz_group(mz, ppm=10, max_groups = 3)) %>%
  qplotMS1data(facet_col = "mz_group")

We also use the qplotMS1data function above, which wraps the typical ggplot call to avoid needing to type out ggplot() + geom_line(aes(x=rt, y=int, group=filename)) every time. Both the mz_group and qplotMS1data functions were added in RaMS version 1.3.2.
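
One way to picture binning by m/z windows is a greedy pass over the intensity-sorted values. The sketch below is a hypothetical base-R helper, not the RaMS `mz_group` source; it assumes the input is already sorted by decreasing intensity (as the `arrange(desc(int))` step above does), and the toy m/z values are invented.

```r
# Illustrative greedy m/z binning (hypothetical helper, not RaMS::mz_group).
# Assumes mz_vals is sorted by decreasing intensity, so each new group is
# seeded by the most intense ion that has not yet been assigned.
mz_group_sketch <- function(mz_vals, ppm = 10) {
  stopifnot(!anyNA(mz_vals))
  groups <- rep(NA_integer_, length(mz_vals))
  group_id <- 0L
  while (anyNA(groups)) {
    group_id <- group_id + 1L
    seed <- mz_vals[which(is.na(groups))[1]]
    tol <- seed * ppm / 1e6  # ppm tolerance converted to absolute m/z units
    in_window <- is.na(groups) & abs(mz_vals - seed) <= tol
    groups[in_window] <- group_id
  }
  groups
}

toy_mz <- c(118.0865, 139.0503, 118.0866, 139.0502, 148.0967)
mz_group_sketch(toy_mz, ppm = 10)
```

In this toy input the two values near 118.0865 share a group, the two near 139.0503 share another, and 148.0967 gets its own.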

MS2 data:

DDA (fragmentation) data can also be extracted, allowing rapid and intuitive searches for fragments or neutral losses:

msdata <- grabMSdata(files = msdata_files[1], grab_what = "MS2")

For example, we may be interested in the major fragments of a specific molecule:

msdata$MS2[premz%between%pmppm(118.0865) & int>mean(int)] %>%
  plot(int~fragmz, type="h", data=., ylab="Intensity", xlab="Fragment m/z")

Or want to search for precursors with a specific neutral loss:

msdata$MS2[, neutral_loss:=premz-fragmz] %>%
  filter(neutral_loss%between%pmppm(60.02064, 5)) %>%
  head(3) %>% knitr::kable()
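
| rt | premz | fragmz | int | voltage | filename | neutral_loss |
|---|---|---|---|---|---|---|
| 4.182333 | 118.0864 | 58.06590 | 390179.500 | 35 | DDApos_2.mzML.gz | 60.02055 |
| 4.276100 | 116.0709 | 56.05036 | 1093.988 | 35 | DDApos_2.mzML.gz | 60.02050 |
| 4.521367 | 118.0864 | 58.06589 | 343084.000 | 35 | DDApos_2.mzML.gz | 60.02056 |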
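
The plus/minus ppm window used in these filters can be illustrated with a small base-R sketch. `pmppm_sketch` is a hypothetical stand-in written for this example, not the RaMS `pmppm` source, and it checks the three neutral losses shown above against a 5 ppm window.

```r
# Illustrative plus/minus ppm window (hypothetical helper, not RaMS::pmppm)
pmppm_sketch <- function(mass, ppm) {
  c(lower = mass * (1 - ppm / 1e6), upper = mass * (1 + ppm / 1e6))
}

# A 5 ppm window around the 60.02064 neutral loss searched for above
nl_window <- pmppm_sketch(60.02064, ppm = 5)

# Neutral losses from the three rows shown above
observed <- c(60.02055, 60.02050, 60.02056)
in_window <- observed >= nl_window[["lower"]] & observed <= nl_window[["upper"]]
in_window
```

At this mass, 5 ppm corresponds to roughly 0.0003 m/z units on either side, so all three observed losses fall inside the window.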

Minifying MS files

As of version 1.1.0, RaMS also has functions that allow irrelevant data to be removed from the file to reduce file sizes. See the vignette for more details.

tmzML documents

Version 1.2.0 of RaMS introduced a new file type, the “transposed mzML” or “tmzML” file to resolve the large memory requirement when working with many files. See the vignette for more details.

File types

RaMS is currently limited to the modern mzML data format and the slightly older mzXML format, as well as the custom tmzML format as of version 1.2.0. Data in other formats can be converted with ProteoWizard’s msconvert tool. Files can also be gzip compressed (file ending .gz); this compression significantly speeds up data retrieval as well as reducing file sizes.
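
As a rough illustration of the gzip option, the base-R sketch below compresses a file into a copy ending in .gz. The file here is a temporary placeholder with invented content standing in for a real mzML file; only base R functions are used.

```r
# Stand-in for a real mzML file (hypothetical content written to a temp file)
mzml_path <- tempfile(fileext = ".mzML")
writeLines(rep("<spectrum>placeholder</spectrum>", 1000), mzml_path)

# Copy the raw bytes into a gzip-compressed file ending in .gz
gz_path <- paste0(mzml_path, ".gz")
raw_bytes <- readBin(mzml_path, what = "raw", n = file.size(mzml_path))
gz_out <- gzfile(gz_path, open = "wb")
writeBin(raw_bytes, gz_out)
close(gz_out)

# Read the compressed copy back to confirm nothing was lost
gz_in <- gzfile(gz_path, open = "rt")
roundtrip <- readLines(gz_in)
close(gz_in)

file.size(gz_path) < file.size(mzml_path)
identical(roundtrip, readLines(mzml_path))
```

The compression ratio on this repetitive placeholder says nothing about real instrument data; the point is only that the .gz copy holds the same content.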

Currently, RaMS handles MS1, MS2, and MS3 data. This should be easy enough to expand in the future, but right now I haven’t observed a demonstrated need for higher fragmentation level data collection.

For an analysis of how RaMS compares to other methods of MS data access and alternative file types, consider browsing the speed & size comparison vignette.

Contact

Feel free to submit questions, bugs, or feature requests on the GitHub Issues page.


README last built on 2023-11-29

Monthly Downloads: 627

Version: 1.4.0

License: MIT + file LICENSE

Maintainer: William Kumler

Last Published: March 2nd, 2024

Functions in RaMS (1.4.0)

grabMzmlMS3: Extract the MS3 data from an mzML nodeset
grabMzmlMetadata: Helper function to extract mzML file metadata
grabMzmlEncodingData: Helper function to extract mzML file encoding data
grabMzmlData: Get mass-spectrometry data from an mzML file
grabMzxmlMS1: Extract the MS1 data from an mzXML nodeset
grabMzmlMS2: Extract the MS2 data from an mzML nodeset
grabMzxmlBPC: Grab the BPC or TIC from a file
grabMzxmlEncodingData: Helper function to extract mzXML file metadata
grabMzxmlData: Get mass-spectrometry data from an mzXML file
grabMzxmlSpectraPremz: Extract the precursor mass from the spectra of an mzXML nodeset
grabMzxmlSpectraVoltage: Extract the collision energies from the spectra of an mzXML nodeset
grabSpectraInt: Extract the intensity information from the spectra of an mzML nodeset
grabSpectraPremz: Extract the precursor mass from the spectra of an mzML nodeset
grabSpectraMz: Extract the mass-to-charge data from the spectra of an mzML nodeset
grabMzxmlMS2: Extract the MS2 data from an mzXML nodeset
node2dt: Convert node to data.table
mz_group: Group m/z values into bins of a specified ppm width
grabMzxmlMS3: Extract the MS3 data from an mzXML nodeset
minifyMzxml: Shrink mzXML files by including only data points near masses of interest
grabMzxmlSpectraRt: Extract the retention time from the spectra of an mzXML nodeset
msdata_connection: S3 constructor for msdata_connection
grabSpectraVoltage: Extract the collision energies from the spectra of an mzML nodeset
grabMzxmlSpectraMzInt: Extract the mass-to-charge data from the spectra of an mzXML nodeset
trapz: Trapezoidal integration of mass-spec retention time / intensity values
grabSpectraRt: Extract the retention time from the spectra of an mzML nodeset
grabMzxmlMetadata: Helper function to extract mzXML file metadata
reexports: Objects exported from other packages
qplotMS1data: Quick plot for MS data
minifyMSdata: Shrink MS data by including only data points near masses of interest
minifyMzml: Shrink mzML files by including only data points near masses of interest
[.msdata_connection: S3 indexing for msdata_connection objects
tmzmlMaker: Maker of tmzML documents
pmppm: Plus/minus parts per million
print.msdata_connection: S3 print option for msdata_connection objects
grabAccessionData: Get arbitrary metadata from an mzML file by accession number
grabMSdata: Grab mass-spectrometry data from file(s)
getEncoded: Convert from compressed binary to R numeric vector
editMSfileRTs: Edit mzML/mzXML file retention times
grabMzmlBPC: Grab the BPC or TIC from a file
RaMS-package: RaMS: R Access to Mass-Spec Data
grabMzmlDAD: Extract the DAD data from an mzML nodeset
$.msdata_connection: S3 dollar sign notation for msdata_connection objects
giveEncoding: Convert from R numeric vector to compressed binary
checkOutputQuality: Check that the output data is properly formatted.
grabMzmlMS1: Extract the MS1 data from an mzML nodeset