Internal helper used by rpoh (pedigree reconstruction for half-sib
families). This function recursively splits individuals into two clusters using
hierarchical clustering on a distance derived from the provided opposing
homozygote (OH) matrix, and then decides whether each cluster should be split
further by checking the maximum number of recombination events inferred within
that cluster.
.rpohhsphase(
genotypeMatrix,
oh,
forwardVectorSize = 30,
excludeFP = TRUE,
nsap = 3,
maxRec = 15
)
A data.frame with two columns:
id: individual IDs
group: an integer-like group label assigned by the recursive
procedure
genotypeMatrix: Numeric genotype matrix (individuals in rows, SNPs in
columns) coded as `0`, `1`, `2` (and typically `9` for missing), as used by
hsphase. This matrix is subset recursively when splitting clusters.
oh: A square opposing-homozygote matrix for the same individuals as
genotypeMatrix (rownames/colnames are individual IDs). Typically
produced by ohg. This matrix is subset recursively along with
genotypeMatrix.
forwardVectorSize: Integer. Passed to bmh when computing
recombination blocks inside each candidate cluster.
excludeFP: Logical. Passed to bmh.
nsap: Integer. Passed to bmh.
maxRec: Integer. Maximum allowed recombination count (within a cluster)
before the cluster is recursively split again.
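To make the relationship between genotypeMatrix and oh concrete, the following base R sketch counts opposing homozygotes (one individual coded `0`, the other `2`, at the same SNP) for a toy genotype matrix. It is only an illustration of the matrix shape and meaning, not the ohg implementation; the object names `G`, `is0`, `is2` and `oh_toy` are hypothetical.

```r
# Toy genotype matrix: 3 individuals (rows) x 4 SNPs (columns),
# coded 0/1/2 with 9 for missing. Illustrative data only.
G <- matrix(c(0, 2, 1, 9,
              2, 0, 1, 0,
              0, 2, 2, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ind1", "ind2", "ind3"), NULL))

# Indicator matrices for the two homozygous classes
is0 <- (G == 0) * 1
is2 <- (G == 2) * 1

# Entry [i, j] counts SNPs where i and j are opposite homozygotes.
# Missing genotypes (9) match neither indicator and are ignored.
oh_toy <- tcrossprod(is0, is2) + tcrossprod(is2, is0)
print(oh_toy)
```

The result is square and symmetric, with individual IDs as row and column names, which is the layout the oh argument is described as having.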
This function uses a fixed temporary filename "temp.txt" in the
current working directory and deletes it at the end. This is not safe under
parallel execution or if the working directory is not writable.
Group labels are generated using rnorm(), so results are not
deterministic unless a seed is set and the recursion order remains identical.
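A minimal sketch of how a caller might work around both caveats, assuming base R only: a per-call path from tempfile() replaces a fixed filename, and set.seed() makes rnorm() labels repeatable. The helper `write_groups` and the IDs are hypothetical and are not part of the package.

```r
# Hypothetical helper, not part of hsphase: append one cluster's members
# and their group label to a results file.
write_groups <- function(ids, label, file) {
  write.table(
    data.frame(id = ids, group = label),
    file = file, append = TRUE, quote = FALSE,
    row.names = FALSE, col.names = FALSE
  )
}

# A unique per-call path instead of a fixed "temp.txt" in the working directory
out_file <- tempfile(pattern = "rpoh_groups_", fileext = ".txt")

# Seeding makes rnorm()-based labels repeatable for a fixed recursion order
set.seed(123)
label_a <- rnorm(1)
set.seed(123)
label_b <- rnorm(1)

write_groups(c("ind1", "ind2"), label_a, out_file)
write_groups("ind3", label_b + 1, out_file)

# Read the accumulated assignments back as a two-column data frame
groups <- read.table(out_file, col.names = c("id", "group"),
                     stringsAsFactors = FALSE)
unlink(out_file)  # clean up the temporary file
```

Because tempfile() returns a path inside the session's temporary directory, concurrent R sessions do not collide on the same file, and the working directory does not need to be writable.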
The recursive splitting stops for a cluster when the maximum recombination
count in that cluster is <= maxRec. Final group assignments are written
to a temporary file and then read back as a two-column data frame.
The algorithm:
1. Converts oh to a distance object via as.dist(.fastdist(oh))
   and performs hierarchical clustering (hclust, Ward method).
2. Splits into k = 2 clusters via cutree.
3. For each cluster with at least 4 individuals, computes recombination
   counts as recombinations(bmh(subGenotype, ...)) and uses the maximum
   recombination count as a stop/split criterion.
4. If max(recombinations) > maxRec, the cluster is split again
   recursively; otherwise, individuals in that cluster are assigned a new group
   label and written to a temporary file.
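The control flow of these steps can be sketched in base R as follows. This is not the package code: `split_recursive` and `toy_max_rec` are hypothetical names, `toy_max_rec` stands in for max(recombinations(bmh(subGenotype, ...))), a plain distance matrix stands in for .fastdist(oh), the "ward.D" variant is an assumption, and groups are returned as a list with sequential labels rather than written to a file with rnorm() labels. Keeping clusters with fewer than 4 individuals as a single group is also an assumption.

```r
# Sketch only: base R stand-ins, not the hsphase implementation.
split_recursive <- function(d, max_rec_fun, maxRec = 15, minSize = 4) {
  ids <- rownames(d)
  # Assumption: a set too small to evaluate is kept as one group
  if (length(ids) < minSize) return(list(ids))

  hc <- hclust(as.dist(d), method = "ward.D")  # Ward variant is an assumption
  cl <- cutree(hc, k = 2)                      # split into two clusters

  groups <- list()
  for (k in 1:2) {
    members <- ids[cl == k]
    if (length(members) >= minSize && max_rec_fun(members) > maxRec) {
      # Too many inferred recombinations: split this cluster again
      sub_d <- d[members, members, drop = FALSE]
      groups <- c(groups, split_recursive(sub_d, max_rec_fun, maxRec, minSize))
    } else {
      # Accept the cluster as a final group
      groups <- c(groups, list(members))
    }
  }
  groups
}

# Toy data: three tight, well-separated sets of five individuals
x <- c(0 + (1:5) / 10, 5 + (1:5) / 10, 10 + (1:5) / 10)
names(x) <- paste0("ind", 1:15)
d <- as.matrix(dist(x))

# Hypothetical criterion: report a high count when a cluster mixes distant sets
toy_max_rec <- function(members) if (max(d[members, members]) > 1) 100 else 0

groups <- split_recursive(d, toy_max_rec)
result <- data.frame(id = unlist(groups),
                     group = rep(seq_along(groups), lengths(groups)))
```

In this toy run the first cut separates one set from the other two, the mixed cluster exceeds the threshold and is split once more, and the recursion stops with three groups.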
rpoh, ohg, bmh,
recombinations, .fastdist