Qploidy (version 1.0.1)

get_centers: Estimate Cluster Centers for Genotype Dosage Classes

Description

This function estimates the cluster centers for each genotype dosage class based on the `theta` values (e.g., allelic ratios or normalized signal intensities). It supports imputing missing clusters and optionally removing outliers.

Usage

get_centers(
  ratio_geno,
  ploidy,
  n.clusters.thr = NULL,
  type = c("intensities", "counts"),
  rm_outlier = TRUE,
  cluster_median = TRUE
)

Value

A named list with the following elements:
- `rm`: Integer flag: `0` (retained), `1` (no clusters found), or `2` (too few clusters).
- `centers_theta`: A numeric vector of cluster center positions on the theta scale.
- `MarkerName`: Marker identifier.
- `n.clusters`: Number of clusters (including imputed ones if applicable).

Arguments

ratio_geno

A data.frame containing the following columns:
- `MarkerName`: Identifier for each marker.
- `SampleName`: Identifier for each sample.
- `theta`: Numeric variable representing allelic ratio or signal intensity.
- `geno`: Integer dosage (e.g., 0, 1, 2 for diploids).
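As a rough illustration of the expected shape, the sketch below builds a toy input in base R. The marker name, sample names, and `theta` values are invented for the example and do not come from any real dataset.

```r
# Hypothetical toy input: one diploid marker scored in six samples.
ratio_geno <- data.frame(
  MarkerName = rep("marker_1", 6),
  SampleName = paste0("sample_", 1:6),
  theta      = c(0.02, 0.05, 0.48, 0.52, 0.95, 0.97),
  geno       = c(0L, 0L, 1L, 1L, 2L, 2L)
)

# Check that the four documented columns are present.
required <- c("MarkerName", "SampleName", "theta", "geno")
stopifnot(all(required %in% names(ratio_geno)))

str(ratio_geno)
```

A data frame in this shape, typically with many markers stacked or split per marker, is what would be passed as `ratio_geno`.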

ploidy

Integer specifying the organism ploidy (e.g., 2 for diploid).

n.clusters.thr

Integer specifying the minimum number of genotype clusters required for a marker to be retained. If fewer clusters are found, missing ones can be imputed depending on the `type`. Defaults to `ploidy + 1` if `NULL`.
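The default can be read as "one cluster per possible dosage". The helper below, `resolve_threshold`, is a hypothetical name used only to illustrate that rule; it is not part of Qploidy and makes no claim about how the package implements it internally.

```r
# Hypothetical helper mirroring the documented default (not a Qploidy function).
resolve_threshold <- function(ploidy, n.clusters.thr = NULL) {
  if (is.null(n.clusters.thr)) ploidy + 1 else n.clusters.thr
}

thr_diploid    <- resolve_threshold(2)     # dosages 0, 1, 2
thr_tetraploid <- resolve_threshold(4)     # dosages 0 through 4
thr_explicit   <- resolve_threshold(4, 3)  # a user-supplied value overrides the default

# Number of dosage classes actually observed for a marker.
geno <- c(0L, 0L, 1L, 1L)
n_observed <- length(unique(geno))
n_observed < thr_diploid  # TRUE: this marker falls below the diploid threshold
```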

type

Character string indicating the data source type:
- `"intensities"`: For array-based allele intensities.
- `"counts"`: For sequencing read counts.

Default is `"intensities"`.

rm_outlier

Logical; if `TRUE`, outlier samples within genotype clusters will be identified and removed prior to center calculation (default: `TRUE`).
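The documentation does not state which outlier rule is applied. Purely as an illustration of per-cluster outlier removal, the sketch below assumes the common 1.5 × IQR fence, applied separately within each dosage class; the actual criterion used by `get_centers` may differ. The helper `is_outlier` and the data are invented for the example.

```r
# Toy theta values for two dosage classes; 0.40 sits far from the dosage-0 cluster.
theta <- c(0.01, 0.02, 0.03, 0.04, 0.05, 0.40, 0.48, 0.49, 0.50, 0.51, 0.52)
geno  <- c(rep(0L, 6), rep(1L, 5))

# Assumed rule (not confirmed by the Qploidy docs): flag points beyond 1.5 * IQR.
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# ave() applies the rule within each dosage class and keeps the original order.
flag <- as.logical(ave(theta, geno, FUN = is_outlier))

theta_kept <- theta[!flag]
geno_kept  <- geno[!flag]
```

Centers computed from `theta_kept` are then no longer pulled toward the stray observation.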

cluster_median

Logical; if `TRUE`, cluster centers are calculated using the median of `theta` values. If `FALSE`, the mean is used (default: `TRUE`).
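The difference between the two settings is easiest to see on a cluster containing a stray value. The sketch below uses base R `tapply` on invented data; it illustrates median versus mean centers in general and is not the package's internal code.

```r
# Toy theta values; the third dosage-0 sample (0.30) is shifted away from its cluster.
theta <- c(0.02, 0.05, 0.30, 0.48, 0.52, 0.95, 0.97)
geno  <- c(0L, 0L, 0L, 1L, 1L, 2L, 2L)

# One center per dosage class, computed both ways.
centers_median <- tapply(theta, geno, median)
centers_mean   <- tapply(theta, geno, mean)

# The median keeps the dosage-0 center at 0.05, while the mean is pulled to about 0.123.
rbind(median = centers_median, mean = centers_mean)
```

The median is the more robust choice when clusters contain a few atypical samples, which is consistent with it being the default.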