
topolow (version 1.0.0)

prune_distance_network: Prune Distance Data for Network Quality

Description

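Iteratively removes viruses and antibodies that have too few connections, leaving a well-connected subset of the network. Removing one point can push its partners below the threshold, so the check is repeated. Pruning stops when every remaining point has at least min_connections connections or when the maximum number of iterations is reached.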
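The iterative procedure described above can be sketched in base R. This is a minimal illustration, not the topolow implementation: the helper name prune_sketch is hypothetical, and it assumes that each row of data counts as one connection for both its virus and its antibody (regardless of NA distances or duplicate pairs).

```r
# Hypothetical helper, NOT the topolow implementation.
# Assumption: every row counts as one connection for its virus and its antibody.
prune_sketch <- function(data, virus_col, antibody_col, min_connections,
                         iterations = 100) {
  n_iter <- 0
  while (n_iter < iterations) {
    # Count rows per virus and per antibody (as.character drops unused factor levels)
    virus_counts <- table(as.character(data[[virus_col]]))
    antibody_counts <- table(as.character(data[[antibody_col]]))
    weak_viruses <- names(virus_counts)[virus_counts < min_connections]
    weak_antibodies <- names(antibody_counts)[antibody_counts < min_connections]
    # Stop once every remaining point meets the threshold
    if (length(weak_viruses) == 0 && length(weak_antibodies) == 0) break
    # Drop every row that touches a weakly connected point
    keep <- !(data[[virus_col]] %in% weak_viruses) &
      !(data[[antibody_col]] %in% weak_antibodies)
    data <- data[keep, , drop = FALSE]
    n_iter <- n_iter + 1
  }
  list(pruned_data = data, iterations = n_iter)
}
```

For example, if antibody A3 is linked only to virus V3, the first pass removes A3. That can leave V3 with a single connection, which the second pass then removes. This cascade is why a single filtering pass is not enough.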

Usage

prune_distance_network(
  data,
  virus_col,
  antibody_col,
  min_connections,
  iterations = 100
)

Value

A list containing two elements:

pruned_data

A data.frame containing only the measurements for the well-connected subset of points.

stats

A list of pruning statistics including:

  • original_points: Number of unique antigens and sera before pruning.

  • remaining_points: Number of unique antigens and sera after pruning.

  • iterations: Number of pruning iterations performed.

  • min_connections: The minimum connection threshold used.

  • is_connected: A logical indicating whether the final network is fully connected.
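The is_connected statistic can be understood by treating viruses and antibodies as nodes and each row as an edge. The sketch below assumes that "fully connected" means the network forms a single connected component, and it tests this with a breadth-first search in base R. The helper name is_network_connected is hypothetical and is not part of topolow. Treating an empty network as connected is also a choice made only for this sketch.

```r
# Hypothetical helper, NOT part of topolow: TRUE when all viruses and
# antibodies in `data` form a single connected component.
is_network_connected <- function(data, virus_col, antibody_col) {
  # Prefix labels so a virus and an antibody with the same name stay distinct
  v <- paste0("virus:", as.character(data[[virus_col]]))
  a <- paste0("antibody:", as.character(data[[antibody_col]]))
  nodes <- unique(c(v, a))
  # Assumption for this sketch: an empty network counts as connected
  if (length(nodes) == 0) return(TRUE)
  visited <- nodes[1]
  frontier <- nodes[1]
  while (length(frontier) > 0) {
    # Neighbours of the frontier are the opposite endpoints of its edges
    neighbours <- unique(c(a[v %in% frontier], v[a %in% frontier]))
    frontier <- setdiff(neighbours, visited)
    visited <- c(visited, frontier)
  }
  length(visited) == length(nodes)
}
```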

Arguments

data

Data frame in long format containing:

  • A column for viruses/antigens.

  • A column for antibodies/antisera.

  • Distance measurements (may contain NAs).

  • Optional metadata columns.
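In long format, each row holds one virus-antibody pair. As an illustration, a small distance matrix can be reshaped into this layout with base R's as.table() and as.data.frame(). The matrix values and the Distance column name are made up for this sketch, and NA entries are kept as rows. This page does not state whether rows with an NA distance count as connections during pruning.

```r
# Made-up 3 x 2 distance matrix: rows are viruses, columns are antibodies
dist_mat <- matrix(c(1.2, NA, 0.8, 2.5, 1.1, NA), nrow = 3,
                   dimnames = list(Virus = c("V1", "V2", "V3"),
                                   Antibody = c("A1", "A2")))

# One row per (Virus, Antibody) pair; the named dimnames become column names
long_data <- as.data.frame(as.table(dist_mat), responseName = "Distance",
                           stringsAsFactors = FALSE)
print(long_data)
```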

virus_col

Character string giving the name of the virus/antigen column.

antibody_col

Character string giving the name of the antibody/antiserum column.

min_connections

Integer; the minimum number of connections each remaining point must have.

iterations

Integer; the maximum number of pruning iterations (default 100).

Examples

library(topolow)

# Create a sparse dataset with 12 viruses and 12 antibodies
viruses <- paste0("V", 1:12)
antibodies <- paste0("A", 1:12)
all_pairs <- expand.grid(Virus = viruses, Antibody = antibodies, stringsAsFactors = FALSE)

# Sample 70 pairs to create a sparse matrix
set.seed(42)
assay_data <- all_pairs[sample(nrow(all_pairs), 70), ]

# Ensure some viruses/antibodies are poorly connected for the example
assay_data <- assay_data[!(assay_data$Virus %in% c("V11", "V12")),]
assay_data <- assay_data[!(assay_data$Antibody %in% c("A11", "A12")),]

# Add back single connections for the poorly connected nodes
poor_connections <- data.frame(
  Virus = c("V11", "V1", "V12", "V2"),
  Antibody = c("A1", "A11", "A2", "A12")
)
assay_data <- rbind(assay_data, poor_connections)

# View connection counts before pruning
# Viruses V11 and V12 and antibodies A11 and A12 each have only 1 connection
table(assay_data$Virus)
table(assay_data$Antibody)

# Prune the network to keep only nodes with at least 2 connections
pruned_result <- prune_distance_network(
  data = assay_data,
  virus_col = "Virus",
  antibody_col = "Antibody",
  min_connections = 2
)
                                
# View connection counts after pruning
# The poorly connected nodes have been removed
table(pruned_result$pruned_data$Virus)
table(pruned_result$pruned_data$Antibody)

# Check the summary statistics
print(pruned_result$stats)
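To double-check a result by hand, you can compute the smallest connection count left in the pruned data. The helper min_degree below is hypothetical and not part of topolow, and it counts each row as one connection. If pruning converged within the iteration limit, min_degree(pruned_result$pruned_data, "Virus", "Antibody") would be expected to be at least 2 for the example above.

```r
# Hypothetical helper, NOT part of topolow: the smallest number of rows
# belonging to any single virus or antibody in `data`.
min_degree <- function(data, virus_col, antibody_col) {
  counts <- c(table(as.character(data[[virus_col]])),
              table(as.character(data[[antibody_col]])))
  # No rows means no points, so there is no meaningful minimum
  if (length(counts) == 0) return(NA_integer_)
  min(counts)
}
```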
