cta20: Two-dimensional data on the original and log scales
Description
Two-dimensional data on the original and log scales, together with their
hierarchical modal clusterings. The dataset illustrates that modal
clustering can be applied to untransformed data, because it does not rely
on parametric assumptions. The clustering results before and after the log
transformation both separate the three clusters well.
cta20 and logcta20 are matrices of two-dimensional data points.
cta20.hmac and logcta20.hmac are objects of class hmac, obtained by
applying phmac to cta20 and logcta20, respectively.
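Modal clustering assigns each point to the mode of a kernel density estimate that the point climbs to, so it needs no parametric model of cluster shape. The Python sketch below illustrates that idea with a Gaussian-kernel mean-shift (modal EM) ascent at a single, fixed bandwidth. It is not the phmac implementation: hierarchical modal clustering repeats this kind of ascent over a sequence of bandwidths to build its hierarchy. The function names, the bandwidth h = 0.1, the mode-merging tolerance and the three synthetic blobs are all assumptions chosen for illustration, not values taken from cta20.

```python
import math
import random


def kernel_weights(x, data, h):
    """Unnormalized Gaussian kernel weights of every data point relative to x."""
    return [math.exp(-math.dist(x, p) ** 2 / (2 * h * h)) for p in data]


def ascend_to_mode(x, data, h, tol=1e-8, max_iter=1000):
    """Fixed-point (mean-shift / modal EM) ascent of a Gaussian KDE from x."""
    for _ in range(max_iter):
        w = kernel_weights(x, data, h)
        total = sum(w)
        # New position = kernel-weighted average of the data points.
        new_x = tuple(
            sum(wi * p[d] for wi, p in zip(w, data)) / total for d in range(2)
        )
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def modal_clusters(data, h, merge_tol=None):
    """Label each point by the KDE mode it converges to; nearby modes are merged."""
    merge_tol = h / 2 if merge_tol is None else merge_tol
    modes, labels = [], []
    for p in data:
        m = ascend_to_mode(p, data, h)
        for k, mode in enumerate(modes):
            if math.dist(m, mode) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels, modes


# Hypothetical synthetic data: three blobs near the X axis, the Y axis and the
# 45-degree line, loosely mimicking the genotype clusters described in Details.
rng = random.Random(0)
centers = [(1.0, 0.1), (0.1, 1.0), (0.7, 0.7)]
data = [(cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05))
        for cx, cy in centers for _ in range(30)]

labels, modes = modal_clusters(data, h=0.1)
print(len(modes), "modes found")
```

No cluster shapes or counts are specified in advance; the number of clusters falls out of how many distinct modes the points reach at the chosen bandwidth.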
Details
The dataset was generated with Illumina's GoldenGate high-throughput
genotyping technology
(http://www.illumina.com/technology/goldengate_genotyping_assay.ilmn).
The values are intensity measurements made by the instrument, after
normalization (background subtraction, etc.).
Illumina uses such data to make genotype calls. The clusters near the X and
Y axes correspond to the two homozygous genotypes (e.g., AA and TT), while
the cluster along the 45-degree line corresponds to the heterozygous
genotype (e.g., AT). Because reads are noisy, data points often lie between
the axes, so cluster detection is used to make genotype calls automatically.
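Once the clusters are found, each one can be labelled by where its centre lies relative to the axes. The Python sketch below is a hypothetical illustration of that final step, not Illumina's calling algorithm: it measures the angle of a cluster centroid from the X axis and picks the nearest of 0, 45 and 90 degrees. The function name call_genotype, the class labels and the nearest-angle rule are all assumptions; which allele lies on which axis depends on the assay.

```python
import math

# Hypothetical class labels with their reference angles (degrees from the X axis).
REFERENCE_ANGLES = {
    "homozygous (X axis)": 0.0,
    "heterozygous": 45.0,
    "homozygous (Y axis)": 90.0,
}


def call_genotype(centroid):
    """Assign a cluster centroid (x, y) to the class with the nearest reference angle."""
    x, y = centroid
    angle = math.degrees(math.atan2(y, x))  # 0 = X axis, 90 = Y axis
    return min(REFERENCE_ANGLES, key=lambda g: abs(angle - REFERENCE_ANGLES[g]))


for c in [(1.0, 0.1), (0.7, 0.7), (0.1, 1.0)]:
    print(c, "->", call_genotype(c))
```

Calling at the cluster level rather than point by point is what makes the noisy, in-between points tractable: each point inherits the call of the cluster it was assigned to.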