hsphase (version 3.0.0)

.prCalus: Calus-style recursive clustering of individuals using an OH matrix

Description

Recursively splits individuals into two groups by Ward hierarchical clustering of an opposing-homozygotes (OH) matrix. A group is split further only while its within-group OH values exceed a threshold derived from allele frequencies estimated from the genotype matrix.

Usage

.prCalus(oh, genotype)

Value

A data.frame with columns:

id

Individual ID (character).

group

An integer-like group code. Codes are generated randomly, so they are not reproducible across runs.

Arguments

oh

A numeric matrix representing the opposing-homozygotes (OH) counts between individuals. Row and column names should be individual IDs. The matrix is expected to be square and symmetric.

genotype

A numeric genotype matrix of dimension \(n \times m\) (individuals \(\times\) SNPs), coded as 0 (AA), 1 (AB), 2 (BB), and 9 for missing values (as used in hsphase).
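
To make the two inputs concrete, the following sketch builds a toy genotype matrix in the 0/1/2/9 coding and derives an OH count matrix from it in base R. The data and the names `is_aa` and `is_bb` are made up for illustration, and the counting shown here is only one plausible way to obtain such a matrix; it is not taken from the hsphase source.

```r
# Toy genotype matrix: 4 individuals x 5 SNPs,
# coded 0 (AA), 1 (AB), 2 (BB), 9 (missing). Values are invented.
genotype <- matrix(
  c(0, 2, 1, 0, 9,
    2, 0, 1, 0, 2,
    0, 2, 2, 1, 0,
    2, 2, 0, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("snp", 1:5))
)

# Indicator matrices for the two homozygous states (1 = yes, 0 = no).
# Heterozygous (1) and missing (9) calls are 0 in both, so they never count.
is_aa <- (genotype == 0) * 1
is_bb <- (genotype == 2) * 1

# A pair (i, j) is opposing at a SNP when one is AA and the other is BB.
# tcrossprod(x, y) is x %*% t(y), so this sums both directions over all SNPs.
oh <- tcrossprod(is_aa, is_bb) + tcrossprod(is_bb, is_aa)

oh
```

The result is square and symmetric with a zero diagonal, and it carries the individual IDs as row and column names, which is the shape the `oh` argument expects.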

Side effects

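This function writes intermediate results to a file named "temp.txt" in the current working directory, reads them back, and then deletes the file. A pre-existing file with that name in the working directory would therefore be lost.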
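
One way to keep this file I/O away from your own files is to run the call inside a throwaway directory. The helper below is a hypothetical sketch (`with_scratch_dir` is not part of hsphase); it relies only on base R and on lazy evaluation of its argument.

```r
# Hypothetical helper (not part of hsphase): evaluate `expr` while a fresh
# scratch directory is the working directory, then restore the previous one.
with_scratch_dir <- function(expr) {
  scratch <- tempfile("scratch_")
  dir.create(scratch)
  old <- setwd(scratch)
  on.exit({
    setwd(old)                         # restore the caller's directory
    unlink(scratch, recursive = TRUE)  # remove the scratch directory
  }, add = TRUE)
  force(expr)  # the argument is evaluated here, after setwd()
}

# Stand-in for a call that writes "temp.txt" into the working directory.
wd_before <- getwd()
wrote_file <- with_scratch_dir({
  writeLines("scratch data", "temp.txt")
  file.exists("temp.txt")
})
```

Assuming the function is reachable in your session, the same wrapper could be placed around the actual call, for example `with_scratch_dir(hsphase:::.prCalus(oh, genotype))`.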

Details

The function returns a two-column data frame containing individual IDs and an assigned group code. Group codes are generated randomly (via rnorm()) and therefore are not stable across runs.
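
Because the codes only identify group membership, they can be mapped to consecutive integers after the fact. The sketch below uses an invented result data frame (the IDs and codes are made up) and numbers the groups by order of first appearance; the column name `group_stable` is an assumption for illustration.

```r
# Invented example of a returned data frame; the group codes are arbitrary.
res <- data.frame(
  id = c("ind1", "ind2", "ind3", "ind4", "ind5"),
  group = c(48213, 90177, 48213, 1356, 90177),
  stringsAsFactors = FALSE
)

# Relabel: the first distinct code becomes 1, the next new one 2, and so on.
res$group_stable <- match(res$group, unique(res$group))

res
```

The new labels depend only on which individuals share a code and on the row order, so sorting by `id` before relabelling gives the same labels from run to run as long as the grouping itself is unchanged.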

The threshold maxsnpnooh is computed from per-SNP minor allele frequencies (.maf) and then reduced by 10%. The recursion proceeds as follows:

  1. Compute pairwise distances from oh using .fastdist and convert to a dist object.

  2. Apply hierarchical clustering (hclust with method = "ward.D").

  3. Cut the dendrogram into k = 2 groups.

  4. For each group, compute the maximum within-group OH value. If it exceeds maxsnpnooh and the group contains more than two individuals, recurse into that group. Otherwise, write the group's assignments to the temporary file and stop recursing on that group.
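
The four steps above can be sketched in base R as follows. This is not the hsphase implementation: `stats::dist()` stands in for the internal `.fastdist`, the threshold is passed in by the caller as `max_oh` instead of being derived from allele frequencies, groups are returned as a list rather than written to a file, and group codes are simple sequence numbers. The function name `split_by_oh` and the toy matrix are assumptions for illustration.

```r
# Sketch of the recursive two-way split (assumptions listed in the text above).
split_by_oh <- function(oh, max_oh) {
  ids <- rownames(oh)
  # Steps 1-3: distances between rows, Ward clustering, cut into two groups.
  halves <- cutree(hclust(dist(oh), method = "ward.D"), k = 2)
  groups <- list()
  for (h in 1:2) {
    members <- ids[halves == h]
    sub_oh <- oh[members, members, drop = FALSE]
    # Step 4: recurse only if the group is still too heterogeneous
    # and has more than two members; otherwise keep it as a final group.
    if (max(sub_oh) > max_oh && length(members) > 2) {
      groups <- c(groups, split_by_oh(sub_oh, max_oh))
    } else {
      groups <- c(groups, list(members))
    }
  }
  groups
}

# Toy OH matrix: two families with low OH within and high OH between them.
ids <- c("a1", "a2", "a3", "b1", "b2", "b3")
oh <- matrix(20, 6, 6, dimnames = list(ids, ids))
oh[1:3, 1:3] <- 1
oh[4:6, 4:6] <- 1
diag(oh) <- 0

groups <- split_by_oh(oh, max_oh = 5)
result <- data.frame(
  id = unlist(groups),
  group = rep(seq_along(groups), lengths(groups)),
  stringsAsFactors = FALSE
)
result
```

Returning the groups from the recursion avoids the temporary file entirely, at the cost of differing from the documented behaviour; the split is also always performed once at the top level, as in the steps described.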

See Also