topolow (version 1.0.0)

process_antigenic_data: Process Raw Antigenic Assay Data

Description

Processes raw antigenic assay data from CSV files into standardized long and matrix formats. Handles both titer data (which needs conversion to distances) and direct distance measurements like IC50. Preserves threshold indicators (<, >) and handles repeated measurements by averaging.

Usage

process_antigenic_data(
  file_path,
  antigen_col,
  serum_col,
  value_col,
  is_titer = TRUE,
  metadata_cols = NULL,
  id_prefix = FALSE,
  base = NULL,
  scale_factor = 10
)

Value

A list containing two elements:

long

A data.frame in long format with standardized columns, including the original identifiers, processed values, and calculated distances. Any specified metadata is also included.

matrix

A symmetric numeric distance matrix built from the processed data, in which the antigens and sera together form both the rows and the columns.
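To make the relationship between the two return elements concrete, the following base R sketch builds a symmetric matrix from a small long-format table. The data, the column names (antigen, serum, distance), the NA fill for unmeasured pairs, and the zero diagonal are all assumptions made for illustration; the package's internal construction may differ.

```r
# Hypothetical averaged distances in long format (not package output).
long <- data.frame(
  antigen  = c("V/A", "V/B", "V/A"),
  serum    = c("S/X", "S/X", "S/Y"),
  distance = c(1, 3, 2),
  stringsAsFactors = FALSE
)

# Antigens and sera together define both the rows and the columns.
points <- c(unique(long$antigen), unique(long$serum))
m <- matrix(NA_real_, nrow = length(points), ncol = length(points),
            dimnames = list(points, points))

# Fill both triangles so that m[i, j] equals m[j, i].
idx <- cbind(long$antigen, long$serum)
m[idx] <- long$distance
m[idx[, 2:1]] <- long$distance

# Assumption: each point is at distance 0 from itself.
diag(m) <- 0

print(m)
```

Pairs that were never measured against each other (for example two antigens) stay NA in this sketch.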

Arguments

file_path

Character. Path to CSV file containing raw data.

antigen_col

Character. Name of column containing virus/antigen identifiers.

serum_col

Character. Name of column containing serum/antibody identifiers.

value_col

Character. Name of column containing measurements (titers or distances).

is_titer

Logical. Whether values are titers (TRUE) or distances like IC50 (FALSE).

metadata_cols

Character vector. Names of additional columns to preserve.

id_prefix

Logical. Whether to prefix IDs with V/ and S/ (default: FALSE).

base

Numeric. Base for the logarithm transformation. If NULL (the default), base 2 is used for titers and e for IC50.

scale_factor

Numeric. Scale factor for titers (default: 10).

Details

The function handles these key steps:

  1. Reads and validates input data

  2. Transforms values to log scale

  3. Converts titers to distances if needed

  4. Averages repeated measurements

  5. Creates standardized long format

  6. Creates distance matrix

  7. Preserves metadata and threshold indicators

  8. Preserves virusYear and serumYear columns if present
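Steps 2 to 4 can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are invented, and the titer-to-distance rule used here (the highest log titer observed for a serum minus the log titer) is one common convention in antigenic cartography that this page does not confirm. The sketch also averages only the numeric part of thresholded values, because the page does not specify how they are combined.

```r
# Hypothetical raw measurements; column names mirror the package example.
raw <- data.frame(
  virusStrain = c("V1", "V1", "V2", "V2", "V2"),
  serumStrain = c("S1", "S1", "S1", "S2", "S2"),
  titer       = c("320", "640", "<10", "1280", ">2560"),
  stringsAsFactors = FALSE
)

# Separate the threshold indicator (<, >, or none) from the numeric part.
raw$threshold <- ifelse(grepl("^[<>]", raw$titer), substr(raw$titer, 1, 1), "")
raw$value     <- as.numeric(sub("^[<>]", "", raw$titer))

# Step 2: log transform using the documented titer defaults.
base         <- 2
scale_factor <- 10
raw$log_value <- log(raw$value / scale_factor, base = base)

# Step 3 (assumed rule): distance = max log titer for that serum - log titer.
raw$distance <- ave(raw$log_value, raw$serumStrain, FUN = max) - raw$log_value

# Step 4: average repeated antigen-serum pairs.
avg <- aggregate(distance ~ virusStrain + serumStrain, data = raw, FUN = mean)

print(raw)
print(avg)
```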

Input requirements and constraints:

  • CSV file must contain required columns

  • Column names in the file must match the values passed to antigen_col, serum_col, and value_col

  • Values can include threshold indicators (< or >)

  • Metadata columns must exist if specified

  • Allowed Year-related column names are "virusYear" and "serumYear"
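The requirements above can be checked before calling the function. The sketch below writes a tiny CSV and validates it with check_columns(), a hypothetical helper that is not part of topolow; the file contents and error messages are likewise invented for illustration.

```r
# Write a tiny hypothetical CSV so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "virusStrain,serumStrain,titer,cluster,virusYear,serumYear",
  "A/H3N2/1,S/1,<10,c1,2001,2000",
  "A/H3N2/2,S/1,320,c2,2003,2000"
), csv_path)

# check_columns() is a hypothetical helper, not a topolow function.
check_columns <- function(path, antigen_col, serum_col, value_col,
                          metadata_cols = NULL) {
  dat <- read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)

  # Every named column, including metadata, must exist in the file.
  required     <- c(antigen_col, serum_col, value_col, metadata_cols)
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }

  # Values may carry a leading < or >; the remainder must be numeric.
  numeric_part <- suppressWarnings(
    as.numeric(sub("^[<>]", "", dat[[value_col]]))
  )
  if (anyNA(numeric_part)) {
    stop("Non-numeric entries found in column '", value_col, "'")
  }
  invisible(dat)
}

dat <- check_columns(csv_path, "virusStrain", "serumStrain", "titer",
                     metadata_cols = "cluster")
```

Failing early with a message that names the missing column is usually easier to debug than an error raised partway through processing.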

Examples

# Locate the example data file included in the package
file_path <- system.file("extdata", "example_titer_data.csv", package = "topolow")

# Check if the file exists before running the example
if (file.exists(file_path)) {
  # Process the example titer data
  results <- process_antigenic_data(
    file_path,
    antigen_col = "virusStrain",
    serum_col = "serumStrain",
    value_col = "titer",
    is_titer = TRUE,
    metadata_cols = c("cluster", "color")
  )

  # View the long format data
  print(results$long)
  # View the distance matrix
  print(results$matrix)
}