findPolyploidClusters: K-means clustering

Description

Wrapper for kmeans that allows samples of low precision to be left out of the clustering and subsequently assigned to clusters.

Usage

findPolyploidClusters(X, indSE = rep(TRUE, nrow(X)), centers,
    plot = FALSE, wss.update = TRUE, ...)

Arguments

X

Matrix with data for a single marker to be clustered, with three columns holding theta, intensity, and SE vectors (in that order) as from the assayData slot of an "

indSE

Logical vector indicating the samples on which to base the clustering

centers

Numeric vector with theta starting values for the clustering

plot

If TRUE, histogram with bins encompassing the initial centre points is plotted

wss.update

The within-cluster sums of squares are returned from kmeans but not actually used in the genotype calling. If FALSE, time is saved by not recalculating the sums of squares after incl

...

Additional arguments to hist

Value

Object of class "kmeans"

Details

Usually called from within callGenotypes or related functions. There, the intensity column is divided by twice its median value times a scaling factor rPenalty (see setGenoOptions), which by default gives relatively higher weight to the theta dimension during clustering. Samples left out of the clustering are subsequently incorporated into the clusters. Leaving out samples of low precision may make the resulting clusters more accurate.
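The idea of clustering on a subset and then assigning the remaining samples can be illustrated with base R alone. The following is a minimal sketch on simulated data, not the implementation used by findPolyploidClusters: the data, the SE cut-off of 0.08, and all variable names are assumptions made for illustration, and the left-out samples are here simply assigned to the nearest cluster centre.

```r
# Sketch only: "cluster on precise samples, then assign the rest".
# Not the package's code; data and names are hypothetical.
set.seed(1)

# Simulated marker: theta around three centres, an already-scaled
# intensity, and a made-up standard error per sample
theta <- c(rnorm(30, 0.0, 0.02), rnorm(30, 0.5, 0.02), rnorm(30, 1.0, 0.02))
r     <- rnorm(90, 0.5, 0.05)
se    <- runif(90, 0, 0.1)
X     <- cbind(theta, r, se)

# Samples considered precise enough to drive the clustering
# (the 0.08 cut-off is arbitrary)
indSE <- X[, "se"] < 0.08

# Starting centres: theta start values paired with the median intensity
thetaStart <- c(0, 0.5, 1)
centers <- cbind(thetaStart, median(X[indSE, "r"]))

# K-means on the retained samples only, using theta and intensity
km <- kmeans(X[indSE, 1:2], centers = centers)

# Squared distance from every sample (retained or not) to each centre
d2 <- sapply(seq_len(nrow(km$centers)), function(k)
    (X[, 1] - km$centers[k, 1])^2 + (X[, 2] - km$centers[k, 2])^2)

# Nearest-centre assignment for all samples
cluster <- max.col(-d2, ties.method = "first")
table(cluster)
```

Because the intensity values are on a much narrower scale than the spacing between theta centres, the assignment in this sketch is driven mainly by theta, which mirrors the purpose of the intensity scaling described above.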

Examples

#Read pre-processed data directly into AlleleSetIllumina object
rPath <- system.file("extdata", package="beadarrayMSV")
normOpts <- setNormOptions()
dataFiles <- makeFilenames('testdata',normOpts,rPath)
beadFile <- paste(rPath,'beadData_testdata.txt',sep='/')
beadInfo <- read.table(beadFile,sep='\t',header=TRUE,as.is=TRUE)
BSRed <- createAlleleSetFromFiles(dataFiles[1:4],markers=1:10,beadInfo=beadInfo)

#Generate list of marker categories
gO <- setGenoOptions()
polyCent <- generatePolyCenters(ploidy=gO$ploidy)
print(polyCent)

#Estimate list of likely center points for an MSV-5 marker
ind <- 2
dev.new(); par(mfrow=c(3,1),mai=c(.5,.5,.5,.1))
polyCl <- findClusters(assayData(BSRed)$theta[ind,],
    breaks=seq(-0.25,1.25,0.04),plot=TRUE)
print(polyCl)

#Clustering using all samples
sclR <- median(assayData(BSRed)$intensity[ind,],na.rm=TRUE)*2*gO$rPenalty
X <- matrix(cbind(assayData(BSRed)$theta[ind,],
                  assayData(BSRed)$intensity[ind,]/sclR,
                  assayData(BSRed)$SE[ind,]),ncol=3)
clObj <- findPolyploidClusters(X,centers=polyCl$clPeaks,plot=TRUE)
plot(X[,1],X[,2],col=clObj$cluster)
print(clObj)
