
BIGr (version 0.6.2)

thinSNP: Thin a dataframe of SNPs based on genomic position

Description

This function groups SNPs by chromosome, sorts them by physical position, and then iteratively selects SNPs such that no two selected SNPs within the same chromosome are closer than a specified minimum distance.
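
The selection procedure described above can be illustrated with a minimal base-R sketch. This is not the BIGr source code: `thin_sketch` is a hypothetical helper, and it assumes that the first SNP on each chromosome is always kept, that a SNP exactly `min_distance` away from the last kept SNP is retained, and that positions contain no missing values.

```r
# Hypothetical helper, NOT the BIGr implementation: a greedy thinning sketch.
thin_sketch <- function(df, chrom_col_name, pos_col_name, min_distance) {
  # Order rows by chromosome, then by position within each chromosome
  ord <- order(df[[chrom_col_name]], df[[pos_col_name]])
  df <- df[ord, , drop = FALSE]

  keep <- logical(nrow(df))
  last_chrom <- NULL
  last_pos <- NA_real_

  for (i in seq_len(nrow(df))) {
    chrom <- as.character(df[[chrom_col_name]][i])
    pos <- df[[pos_col_name]][i]
    # Keep the first SNP on each chromosome, then any SNP that lies at least
    # min_distance beyond the most recently kept SNP (assumed boundary rule)
    if (is.null(last_chrom) || chrom != last_chrom ||
        pos - last_pos >= min_distance) {
      keep[i] <- TRUE
      last_chrom <- chrom
      last_pos <- pos
    }
  }
  df[keep, , drop = FALSE]
}

# Toy data for illustration only
toy <- data.frame(
  Chrom = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  Pos   = c(10, 50, 120, 300, 5, 80)
)
thinned <- thin_sketch(toy, "Chrom", "Pos", min_distance = 100)
print(thinned)
```

Because each candidate is compared only with the most recently kept SNP, the result depends on the sort order: starting from the lowest position on each chromosome is an assumption of this sketch.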

Usage

thinSNP(df, chrom_col_name, pos_col_name, min_distance)

Value

A thinned dataframe with the same columns as the input.

Arguments

df

The input dataframe.

chrom_col_name

A string specifying the name of the chromosome column.

pos_col_name

A string specifying the name of the physical position column.

min_distance

A numeric value for the minimum distance between selected SNPs. The unit of this distance should match the unit of the pos_col_name column (e.g., base pairs).

Examples

# Create sample SNP data
set.seed(123)
n_snps <- 20
snp_data <- data.frame(
  MarkerID = paste0("SNP", 1:n_snps),
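  # Chromosome labels are drawn at random, independently of the positions below
  Chrom = sample(c("chr1", "chr2"), n_snps, replace = TRUE),
  ChromPosPhysical = c(
    sort(sample(1:1000, 5)),       # 5 positions in 1 to 1000
    sort(sample(1:1000, 5)) + 500, # 5 positions in 501 to 1500
    sort(sample(1:2000, 10))       # 10 positions in 1 to 2000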
  ),
  Allele = sample(c("A/T", "G/C"), n_snps, replace = TRUE)
)
# Ensure it's sorted by Chrom and ChromPosPhysical for clarity in example
snp_data <- snp_data[order(snp_data$Chrom, snp_data$ChromPosPhysical), ]
rownames(snp_data) <- NULL

print("Original SNP data:")
print(snp_data)

# Thin the SNPs, keeping a minimum distance of 100 units (e.g., bp)
thinned_snps <- thinSNP(
  df = snp_data,
  chrom_col_name = "Chrom",
  pos_col_name = "ChromPosPhysical",
  min_distance = 100
)

print("Thinned SNP data (min_distance = 100):")
print(thinned_snps)

# Thin with a larger distance
thinned_snps_large_dist <- thinSNP(
  df = snp_data,
  chrom_col_name = "Chrom",
  pos_col_name = "ChromPosPhysical",
  min_distance = 500
)
print("Thinned SNP data (min_distance = 500):")
print(thinned_snps_large_dist)
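
One way to sanity-check a thinned result is to compute the smallest gap between adjacent positions on each chromosome and compare it with `min_distance`. The sketch below uses a hypothetical helper, `min_gap_by_chrom`, which is not part of BIGr; it runs on a small hand-made data frame so that it does not depend on the package being installed.

```r
# Hypothetical helper, not part of BIGr: smallest gap between adjacent
# positions on each chromosome (NA when a chromosome has fewer than 2 SNPs).
min_gap_by_chrom <- function(df, chrom_col_name, pos_col_name) {
  pos_by_chrom <- split(df[[pos_col_name]], df[[chrom_col_name]])
  vapply(pos_by_chrom, function(pos) {
    if (length(pos) < 2) return(NA_real_)
    as.numeric(min(diff(sort(pos))))
  }, numeric(1))
}

# Toy stand-in for a thinned result (illustrative values only)
toy_thinned <- data.frame(
  Chrom = c("chr1", "chr1", "chr1", "chr2"),
  Pos   = c(10, 120, 300, 5)
)
gaps <- min_gap_by_chrom(toy_thinned, "Chrom", "Pos")
print(gaps)

# TRUE if every chromosome with at least two SNPs respects the threshold
gaps_ok <- all(gaps >= 100, na.rm = TRUE)
print(gaps_ok)
```

The same call could be applied to `thinned_snps` with `"Chrom"` and `"ChromPosPhysical"` to confirm that no within-chromosome gap falls below the chosen `min_distance`.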
