cluster.snp: Hierarchical Clustering of SNP Data

Description

Clusters SNPs hierarchically.

Usage

cluster.snp(x = NULL, d = NULL, method = "average", SNP_index = NULL)

Arguments

x

The SNP data matrix of size nobs x nvar. Default value is NULL.

d

NULL or a dissimilarity matrix. See the 'Details' section.

method

The agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). See hclust for details.

SNP_index

NULL or the index vector of SNPs to be clustered. See the 'Details' section.

Value

An object of class dendrogram which describes the tree produced by the clustering algorithm hclust.
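
As a rough illustration of what can be done with a returned dendrogram, the sketch below builds a stand-in object with base R (hclust followed by as.dendrogram) instead of calling cluster.snp, so it runs without the package. The simulated matrix x and the choice of k = 2 groups are assumptions made for the example only.

```r
# Stand-in for the value returned by cluster.snp: a "dendrogram" object
# built here with base R on simulated data (6 columns play the role of SNPs).
set.seed(3)
x <- matrix(rnorm(60 * 6), nrow = 60)
dend <- as.dendrogram(hclust(as.dist(1 - cor(x)^2), method = "average"))

# Convert back to an hclust tree and cut it into k groups of SNPs
hc <- as.hclust(dend)
groups <- cutree(hc, k = 2)

# plot(dend) would draw the tree
```

Here groups is an integer vector with one entry per SNP, giving the group each SNP falls into when the tree is cut.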

Details

The SNPs are clustered using hclust, which performs a hierarchical cluster analysis using a set of dissimilarities for the nvar objects being clustered. There are 3 possible scenarios.

If d = NULL, x is used to compute the dissimilarity matrix. The dissimilarity measure between two SNPs is 1 - LD (Linkage Disequilibrium), where LD is defined as the square of the Pearson correlation coefficient. If SNP_index = NULL, all nvar SNPs will be clustered; otherwise only the SNPs with indices specified by SNP_index will be considered.
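
The default dissimilarity described above can be sketched in base R as follows. This is an illustration of the 1 - LD definition, not the package's internal code; the simulated matrix x and the variable names n_obs, n_var, ld, d_sub and dend_sub are assumptions introduced for the example.

```r
# Simulated stand-in for a SNP matrix of size n_obs x n_var
set.seed(1)
n_obs <- 60
n_var <- 10
x <- matrix(rnorm(n_obs * n_var), nrow = n_obs, ncol = n_var)

# LD taken as the squared Pearson correlation between columns (SNPs)
ld <- cor(x)^2

# Dissimilarity between SNPs is 1 - LD
d <- as.dist(1 - ld)

# Hierarchical clustering of all SNPs, returned as a dendrogram
dend <- as.dendrogram(hclust(d, method = "average"))

# Restricting to a subset of SNPs, analogous to supplying SNP_index
SNP_index <- 1:5
d_sub <- as.dist(1 - cor(x[, SNP_index])^2)
dend_sub <- as.dendrogram(hclust(d_sub, method = "average"))
```

Because the squared correlation lies between 0 and 1, so does the dissimilarity: SNPs in perfect LD are at distance 0 and uncorrelated SNPs are at distance 1.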

If the user wishes to use a different dissimilarity measure, d needs to be provided. d must be either a square matrix of size nvar x nvar, or an object of class dist. If d is provided, x and SNP_index will be ignored.
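
The two accepted forms of d might be constructed as in the sketch below. The Spearman-based measure and the Euclidean distance are arbitrary illustrative choices, and the simulated matrix x is an assumption. Note that dist operates on rows, so the data are transposed to obtain dissimilarities between SNPs (columns).

```r
# Simulated stand-in for a SNP matrix with n_var = 8 SNPs
set.seed(2)
n_obs <- 60
n_var <- 8
x <- matrix(rnorm(n_obs * n_var), nrow = n_obs)

# Form 1: a square n_var x n_var matrix (illustrative Spearman-based measure)
d_mat <- 1 - abs(cor(x, method = "spearman"))

# Form 2: an object of class "dist"; transpose so that distances are
# computed between columns (SNPs) rather than rows (observations)
d_dist <- dist(t(x))

# Both have one row/entry per SNP
stopifnot(all(dim(d_mat) == n_var), attr(d_dist, "Size") == n_var)
```

Either object would then be supplied through the d argument, in which case x and SNP_index are ignored.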

Examples

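library(MASS)
# Simulate 60 observations of 60 independent variables as stand-in SNP data
x <- mvrnorm(60, mu = rep(0, 60), Sigma = diag(60))
# Cluster all SNPs using the default 1 - LD dissimilarity
clust.1 <- cluster.snp(x = x, method = "average")
# Cluster only the first 10 SNPs
SNP_index <- seq(1, 10)
clust.2 <- cluster.snp(x = x, method = "average", SNP_index = SNP_index)
# Supply a user-defined dissimilarity between SNPs (the columns of x)
d <- dist(t(x))
clust.3 <- cluster.snp(d = d, method = "single")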