GenoPop (version 1.0.0)

TwoDimSFS: TwoDimSFS

Description

This function calculates a two-dimensional site frequency spectrum from a VCF file for two populations. It processes the file in batches for efficient memory usage. The user can decide between a folded or unfolded spectrum.
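To make the idea of a two-dimensional spectrum concrete, here is a minimal base-R sketch on a toy genotype matrix. It is not GenoPop's implementation: the helper `toy_2d_sfs` is hypothetical, and it assumes diploid, biallelic sites with no missing data, coded as alternate-allele counts per individual. Cell (i, j) of the result counts the variants whose alternate allele occurs i times in population 1 and j times in population 2.

```r
# Toy genotype matrix: rows are variants, columns are individuals,
# entries are alternate-allele counts (0, 1, or 2) for diploid samples.
geno <- matrix(
  c(0, 1, 2, 0, 0, 1,
    1, 1, 0, 2, 2, 2,
    0, 0, 0, 0, 1, 0,
    0, 1, 2, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("a1", "a2", "a3", "b1", "b2", "b3"))
)

# Hypothetical helper for illustration only; not part of GenoPop.
toy_2d_sfs <- function(geno, pop1, pop2) {
  n1 <- 2 * length(pop1)  # sampled chromosomes in population 1
  n2 <- 2 * length(pop2)  # sampled chromosomes in population 2
  # Alternate-allele count per variant within each population
  c1 <- rowSums(geno[, pop1, drop = FALSE])
  c2 <- rowSums(geno[, pop2, drop = FALSE])
  # Cross-tabulate the counts; fixed factor levels keep empty cells
  tab <- table(factor(c1, levels = 0:n1), factor(c2, levels = 0:n2))
  matrix(as.integer(tab), nrow = n1 + 1, ncol = n2 + 1,
         dimnames = list(as.character(0:n1), as.character(0:n2)))
}

sfs <- toy_2d_sfs(geno, c("a1", "a2", "a3"), c("b1", "b2", "b3"))
```

With three diploid individuals per population, the sketch yields a 7 x 7 matrix (allele counts 0 to 6 in each population) whose entries sum to the number of variants.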

Usage

TwoDimSFS(
  vcf_path,
  pop1_individuals,
  pop2_individuals,
  folded = FALSE,
  batch_size = 10000,
  threads = 1,
  write_log = FALSE,
  logfile = "log.txt",
  exclude_ind = NULL
)

Value

Two-dimensional site frequency spectrum as a matrix.

Arguments

vcf_path

Path to the VCF file.

pop1_individuals

Vector of individual names belonging to the first population.

pop2_individuals

Vector of individual names belonging to the second population.

folded

Logical. If TRUE, the folded SFS is returned; if FALSE (the default), the unfolded SFS is returned.
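The difference between the two options can be sketched in base R. The helper `fold_2d_sfs` below is hypothetical and shows one common folding convention, which is not necessarily the one GenoPop uses: whenever the counted allele is the major allele across both populations combined, the site is re-counted by its complement, and ties are left in place.

```r
# Hypothetical helper for illustration only; not part of GenoPop.
fold_2d_sfs <- function(sfs) {
  n1 <- nrow(sfs) - 1  # chromosomes in population 1
  n2 <- ncol(sfs) - 1  # chromosomes in population 2
  folded <- matrix(0, nrow = n1 + 1, ncol = n2 + 1, dimnames = dimnames(sfs))
  for (i in 0:n1) {
    for (j in 0:n2) {
      if (i + j > (n1 + n2) / 2) {
        # Counted allele is the major allele overall: use the complement
        fi <- n1 - i
        fj <- n2 - j
      } else {
        fi <- i
        fj <- j
      }
      folded[fi + 1, fj + 1] <- folded[fi + 1, fj + 1] + sfs[i + 1, j + 1]
    }
  }
  folded
}

# Unfolded toy spectrum for one diploid individual per population (3 x 3 cells)
unfolded <- matrix(c(0, 2, 1,
                     3, 1, 0,
                     0, 4, 5), nrow = 3, byrow = TRUE)
folded <- fold_2d_sfs(unfolded)
```

Folding only moves counts between cells, so the total number of sites is unchanged, and every cell with a combined count above half of all chromosomes ends up empty.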

batch_size

The number of variants to be processed in each batch (default of 10,000 should be suitable for most use cases).
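Batching does not change the result, because the spectrum is a sum of per-variant counts. The base-R sketch below illustrates this on simulated toy data. It is an assumption-laden illustration rather than GenoPop's code: the helper `count_batch`, the toy batch size, and the in-memory matrix (standing in for chunks read from a VCF) are all hypothetical.

```r
set.seed(1)
# Toy data: 25 variants x 4 diploid individuals (alternate-allele counts 0/1/2)
geno <- matrix(sample(0:2, 100, replace = TRUE), nrow = 25)
pop1 <- 1:2        # columns of population 1
pop2 <- 3:4        # columns of population 2
batch_size <- 10   # toy value, far smaller than the real default

# Hypothetical helper: 2D allele-count table for one block of variants
count_batch <- function(g) {
  table(factor(rowSums(g[, pop1, drop = FALSE]), levels = 0:4),
        factor(rowSums(g[, pop2, drop = FALSE]), levels = 0:4))
}

# Split variant indices into consecutive batches of at most batch_size rows
idx <- seq_len(nrow(geno))
batches <- split(idx, ceiling(idx / batch_size))

# Count each batch separately, then sum the per-batch tables
per_batch <- lapply(batches, function(b) count_batch(geno[b, , drop = FALSE]))
sfs_batched <- Reduce(`+`, per_batch)
```

Only one batch of genotypes needs to be held in memory at a time, while the accumulated table stays the same size regardless of how many variants the file contains.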

threads

Number of threads to use for parallel processing.

write_log

Logical, indicating whether to write progress logs.

logfile

Path to the log file where progress will be logged.

exclude_ind

Optional vector of individual IDs to exclude from the analysis. If provided, these individuals are removed from the genotype matrix before the spectrum is calculated. Default is NULL, meaning no individuals are excluded.

Examples

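# Locate the example VCF and its tabix index (.tbi) shipped with GenoPop
vcf_file <- system.file("tests/testthat/sim.vcf.gz", package = "GenoPop")
index_file <- system.file("tests/testthat/sim.vcf.gz.tbi", package = "GenoPop")
# Names of the individuals belonging to each population
pop1_individuals <- c("tsk_0", "tsk_1", "tsk_2")
pop2_individuals <- c("tsk_3", "tsk_4", "tsk_5")
# Calculate the folded two-dimensional site frequency spectrum
sfs_2d <- TwoDimSFS(vcf_file, pop1_individuals, pop2_individuals, folded = TRUE)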
