SEL.HAP: Haplotype selection and extension along the genome

Description

Perform genome-wide haplotype selection and extension using the CHAP-GWAS framework. The function scans along each chromosome, builds local haplotype segments, and adaptively extends them based on association evidence with the phenotype.

Usage

SEL.HAP(GEN, YFIX, KIN, nHap, p.threshold, PAR)

Value

A list of three matrices (denoted FINAL here) summarizing:

FINAL[[1]]: initial haplotype segments
FINAL[[2]]: extended haplotype segments
FINAL[[3]]: final selected segments after extension

Arguments

GEN: Genotype matrix with one row per marker. The first two columns give chromosome (chr) and physical position (pos); the remaining columns contain the alleles of the individuals (e.g. "A", "C", "G", "T"), one column per haplotype copy, so each individual occupies two adjacent columns in the example below.
YFIX: A matrix or data.frame with phenotype in the first column and fixed-effect covariates (e.g. intercept, PCs) in the remaining columns, one row per individual.
KIN: A list of kinship matrices, each of dimension \(n \times n\), where \(n\) is the number of individuals.
nHap: Initial haplotype window size (number of consecutive markers).
p.threshold: P-value threshold for haplotype extension.
PAR: Optional variance component parameters passed to RANDOM(). If NULL, they are estimated internally.
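
Because the three main inputs must agree on the number of individuals, it can help to check their shapes before running the scan. The following is a minimal sketch, not part of the package: check_sel_hap_inputs() is a hypothetical helper, and it assumes diploid data with two allele columns per individual after the chr and pos columns.

```r
## Hypothetical helper (not part of the package): sanity-check input shapes
## before calling SEL.HAP(). Assumes two allele columns per individual.
check_sel_hap_inputs <- function(GEN, YFIX, KIN) {
  n <- nrow(YFIX)                      # number of individuals
  n_allele_cols <- ncol(GEN) - 2L      # columns after chr and pos

  if (n_allele_cols != 2L * n) {
    stop(sprintf("GEN has %d allele columns, expected %d (2 per individual)",
                 n_allele_cols, 2L * n))
  }
  if (!is.list(KIN)) {
    stop("KIN must be a list of kinship matrices")
  }
  for (k in seq_along(KIN)) {
    if (!identical(dim(KIN[[k]]), c(n, n))) {
      stop(sprintf("KIN[[%d]] is not %d x %d", k, n, n))
    }
  }
  invisible(TRUE)
}

## Toy inputs: 4 individuals, 3 markers (values are placeholders only)
n <- 4
GEN  <- cbind(chr = 1, pos = c(100, 200, 300),
              matrix("A", nrow = 3, ncol = 2 * n))
YFIX <- cbind(y = rnorm(n), intercept = 1)
KIN  <- list(diag(n))

ok <- check_sel_hap_inputs(GEN, YFIX, KIN)
```

The helper only inspects dimensions; it does not validate allele codes or the ordering of individuals across GEN, YFIX, and KIN.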

Examples

## Minimal example with small simulated data (alleles encoded as A/C/G/T)
set.seed(1)

## Number of individuals and markers
n_ind  <- 200
n_mark <- 50

## Construct a simple GEN matrix:
## first two columns: chromosome and position
## each individual is represented by two allele columns (A1/A2)
chr <- rep(1, n_mark)
pos <- seq_len(n_mark) * 100
alleles <- c("A", "C", "G", "T")

geno <- matrix(NA_character_, nrow = n_mark, ncol = 2 * n_ind)
for (m in seq_len(n_mark)) {
  a <- sample(alleles, 2, replace = FALSE)   # biallelic per marker
  geno[m, ] <- sample(a, 2 * n_ind, replace = TRUE)
}

colnames(geno) <- as.vector(rbind(
  paste0("id", seq_len(n_ind), "_A1"),
  paste0("id", seq_len(n_ind), "_A2")
))

## Note: cbind() with a character matrix coerces chr and pos to character
GEN <- cbind(chr, pos, geno)

## Phenotype plus fixed effects (intercept and one covariate)
y <- rnorm(n_ind)
X <- cbind(1, rnorm(n_ind))  # intercept + one covariate
YFIX <- cbind(y, X)

## Simple kinship: identity matrix
KIN <- list(diag(n_ind))

## Run SEL.HAP with a small initial window and mild threshold
res <- SEL.HAP(GEN, YFIX, KIN,
               nHap = 2,
               p.threshold = 0.05,
               PAR = NULL)

## Inspect the structure of the result (three matrices)
str(res)
