
hsphase (version 3.0.0)

.prSimple: Simple recursive clustering using an OH matrix with a linear threshold rule

Description

Performs recursive hierarchical clustering on an opposing-homozygotes (OH) matrix using Ward's method. Clusters are split repeatedly until the maximum within-cluster OH value falls below a threshold, which is computed from the number of SNPs (snpNooh) using a linear rule.

Usage

.prSimple(oh, snpNooh, intercept = 26.3415, coefficient = 77.3171)

Value

A data.frame with columns:

id

Individual ID (character).

group

An integer-like group code (generated randomly; not reproducible).

Arguments

oh

A numeric matrix representing opposing-homozygotes (OH) counts between individuals. Row and column names should be individual IDs. The matrix is expected to be square and symmetric.

snpNooh

Numeric scalar. Number of SNPs used in the OH calculation (or a proxy for SNP density). The stopping threshold is derived from this value.

intercept

Numeric scalar. Intercept for the linear threshold rule.

coefficient

Numeric scalar. Slope for the linear threshold rule.
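As an illustration of the shape expected for oh, the following sketch builds a small toy matrix and checks the properties listed above (numeric, square, symmetric, matching row and column names). The IDs and OH counts are made up, and the checks are a suggestion rather than something the function is documented to perform.

```r
# Toy OH matrix for three hypothetical individuals; the counts are invented.
ids <- c("s1", "s2", "s3")
oh <- matrix(
  c(  0,  12, 340,
     12,   0, 355,
    340, 355,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(ids, ids)
)

# Checks matching the documented expectations for the `oh` argument.
oh_ok <- is.numeric(oh) &&
  nrow(oh) == ncol(oh) &&
  isSymmetric(oh) &&
  identical(rownames(oh), colnames(oh))
oh_ok
```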

Side effects

This function writes to and reads from a file named "temp.txt" in the current working directory, and then deletes it.

Details

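The threshold is computed as: $$\mathrm{maxsnpnooh} = (\mathrm{intercept} + \mathrm{coefficient} \times \mathrm{snpNooh}) - 15 \times \mathrm{snpNooh}$$

This is algebraically equivalent to $$\mathrm{maxsnpnooh} = \mathrm{intercept} + (\mathrm{coefficient} - 15) \times \mathrm{snpNooh}$$ so with the default arguments the effective slope is 62.3171.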
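To make the arithmetic concrete, the rule can be written out in R. This is a minimal sketch: the helper names oh_threshold and oh_threshold_simplified are hypothetical and not part of hsphase, and the input value 10 is an arbitrary example.

```r
# Hypothetical helper (not part of hsphase): evaluates the linear threshold
# rule exactly as written in the formula above.
oh_threshold <- function(snpNooh, intercept = 26.3415, coefficient = 77.3171) {
  (intercept + coefficient * snpNooh) - 15 * snpNooh
}

# Algebraically equivalent form: intercept + (coefficient - 15) * snpNooh.
oh_threshold_simplified <- function(snpNooh, intercept = 26.3415,
                                    coefficient = 77.3171) {
  intercept + (coefficient - 15) * snpNooh
}

# 26.3415 + 773.171 - 150 = 649.5125 with the default arguments.
oh_threshold(10)
```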

The function returns a two-column data frame with individual IDs and a group code. Group codes are generated randomly (via rnorm()) and therefore are not stable across runs.
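Because the codes themselves carry no meaning, one possible workaround is to renumber the groups after the call. The sketch below uses a made-up data frame shaped like the documented return value. The column name group_stable is an assumption for illustration, and the resulting labels match across runs only if the partition and the row order are the same.

```r
# Toy result shaped like the documented return value; the IDs and the
# group codes are invented for illustration.
res <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  group = c(73104, -120455, 73104, -120455, 201833),
  stringsAsFactors = FALSE
)

# Renumber groups 1, 2, ... in order of first appearance, so the labels
# depend on the partition rather than on the random codes.
res$group_stable <- match(res$group, unique(res$group))
res
```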

The recursion proceeds as follows:

  1. Compute a distance object from oh using .fastdist and convert it to a dist object.

  2. Apply hierarchical clustering using hclust with method = "ward.D".

  3. Cut the dendrogram into two groups using cutree.

  4. For each group, compute the maximum within-group OH value; if it exceeds maxsnpnooh and the group has more than two individuals, recurse into that subgroup. Otherwise, write group assignments and stop.
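The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: as.dist() stands in for the internal .fastdist step (so distances, and possibly splits, may differ from what .prSimple produces), group membership is returned as a list instead of being written to "temp.txt", and the function name split_oh and the toy data are hypothetical.

```r
# Sketch of the recursion described above (not the hsphase source).
# Assumption: as.dist(oh) replaces the internal .fastdist step.
split_oh <- function(oh, max_oh) {
  hc <- hclust(as.dist(oh), method = "ward.D")
  halves <- cutree(hc, k = 2)  # named vector: individual ID -> 1 or 2
  groups <- list()
  for (g in 1:2) {
    members <- names(halves)[halves == g]
    sub <- oh[members, members, drop = FALSE]
    if (max(sub) > max_oh && length(members) > 2) {
      # Still too heterogeneous and large enough to split: recurse.
      groups <- c(groups, split_oh(sub, max_oh))
    } else {
      # Accept this group as final.
      groups <- c(groups, list(members))
    }
  }
  groups
}

# Toy data: two families with low OH within and high OH between them.
ids <- c("a1", "a2", "a3", "b1", "b2", "b3")
oh <- matrix(500, 6, 6, dimnames = list(ids, ids))
oh[1:3, 1:3] <- 5
oh[4:6, 4:6] <- 5
diag(oh) <- 0

fam <- split_oh(oh, max_oh = 100)
fam
```

In this toy case the first cut already separates the two families, and neither subgroup exceeds the threshold, so no further recursion occurs.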

See Also