get_centers: Estimate Cluster Centers for Genotype Dosage Classes

Description

Estimates the cluster center of each genotype dosage class from `theta` values (e.g., allelic ratios or normalized signal intensities). Missing clusters can be imputed, and outliers can optionally be removed before the centers are calculated.

Usage

get_centers(
  ratio_geno,
  ploidy,
  n.clusters.thr = NULL,
  type = c("intensities", "counts"),
  rm_outlier = TRUE,
  cluster_median = TRUE
)

Value

A named list with the following elements:
- `rm`: Integer flag: `0` (retained), `1` (no clusters found), or `2` (too few clusters).
- `centers_theta`: A numeric vector of cluster center positions on the theta scale.
- `MarkerName`: Marker identifier.
- `n.clusters`: Number of clusters (including imputed ones if applicable).

Arguments

ratio_geno: A data.frame containing the following columns:
- `MarkerName`: Identifier for each marker.
- `SampleName`: Identifier for each sample.
- `theta`: Numeric variable representing allelic ratio or signal intensity.
- `geno`: Integer dosage (e.g., 0, 1, 2 for diploids).
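
As an illustration, a minimal `ratio_geno` input for a single diploid marker might be assembled as in the sketch below. The marker name, sample names, and `theta` values are made up for the example; only the four column names come from the description above.

```r
# Hypothetical toy data: one diploid marker genotyped in six samples.
ratio_geno <- data.frame(
  MarkerName = rep("marker_1", 6),
  SampleName = paste0("sample_", 1:6),
  theta      = c(0.02, 0.05, 0.48, 0.52, 0.95, 0.98),
  geno       = c(0L, 0L, 1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Check that the required columns are present and typed as expected.
required <- c("MarkerName", "SampleName", "theta", "geno")
stopifnot(
  all(required %in% names(ratio_geno)),
  is.numeric(ratio_geno$theta),
  is.integer(ratio_geno$geno)
)
```
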
ploidy: Integer specifying the organism ploidy (e.g., 2 for diploid).
n.clusters.thr: Integer specifying the minimum number of genotype clusters required for a marker to be retained. If fewer clusters are found, the missing ones can be imputed, depending on `type`. Defaults to `ploidy + 1` if `NULL`.
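
The default threshold can be pictured with the base R sketch below. It only shows how `ploidy + 1` compares with the number of dosage classes observed for one marker; the variable names `n_observed` and `below_thr` are hypothetical, and the function's actual imputation and filtering logic is not reproduced here.

```r
# Illustrative values; not taken from the package.
ploidy <- 4L
n.clusters.thr <- NULL
geno <- c(0L, 0L, 1L, 1L, 2L, NA)  # observed dosages for one marker

# Resolve the documented default: ploidy + 1 possible dosage classes.
if (is.null(n.clusters.thr)) n.clusters.thr <- ploidy + 1L

# Count the dosage classes that actually contain samples.
n_observed <- length(unique(geno[!is.na(geno)]))

# TRUE when the marker has fewer clusters than the threshold.
below_thr <- n_observed < n.clusters.thr
```
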
type: Character string indicating the data source type:
- `"intensities"`: For array-based allele intensities.
- `"counts"`: For sequencing read counts.
Default is `"intensities"`.
rm_outlier: Logical; if `TRUE`, outlier samples within genotype clusters will be identified and removed prior to center calculation (default: `TRUE`).
cluster_median: Logical; if `TRUE`, cluster centers are calculated using the median of `theta` values. If `FALSE`, the mean is used (default: `TRUE`).
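
The difference between the two settings of `cluster_median` can be seen in the base R sketch below, which computes one center per dosage class with `tapply`. The data are hypothetical, and this is an assumption about the general idea rather than the function's actual implementation.

```r
# Hypothetical theta values for one diploid marker; the last sample is an
# extreme value inside the dosage-2 cluster.
theta <- c(0.02, 0.04, 0.06, 0.48, 0.50, 0.52, 0.94, 0.96, 0.70)
geno  <- c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L)

# One center per dosage class, using the median or the mean.
centers_median <- tapply(theta, geno, median)
centers_mean   <- tapply(theta, geno, mean)
```

In this toy example the extreme sample pulls the mean of the dosage-2 cluster well below its median, which illustrates why a median-based center can be the more robust choice when outliers remain in the data.
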