setGenoOptions(largeSample = FALSE, snpPerArrayLim = 0.8,
arrayPerSnpLim = 0.8, ploidy = "tetra",
filterLim = 0, detectLim = 0.8,
wSpreadLim = suggestGeno(largeSample)$wSpreadLim,
devCentLim = 0.35,
hwAlpha = suggestGeno(largeSample)$hwAlpha,
probsIndSE = suggestGeno(largeSample)$probsIndSE,
afList = seq(0, 0.5, 0.05),
clAlpha = suggestGeno(largeSample)$clAlpha,
rPenalty = 2,
rotationLim = suggestGeno(largeSample)$rotationLim,
minClLim = 5, nSdOverlap = 2,
minBin = suggestGeno(largeSample)$minBin,
binWidth = suggestGeno(largeSample)$binWidth) callGenotypes(BSRed,
gO = setGenoOptions(largeSample = ncol(BSRed) > 250))
callGenotypes.interactive(BSRed,
gO = setGenoOptions(largeSample = ncol(BSRed) > 250))
callGenotypes.verboseTest(BSRed, singleMarker = 1,
gO = setGenoOptions(largeSample = ncol(BSRed) > 250))
generatePolyCenters
)assayData
assayData
assayData
assayData
"AlleleSetIllumina "
object, holding a required
phenoData
column preprocessBeadSet
)setGenoOptions
)setGenoOptions
is a list with all defined or
suggested options needed for genotype calling
Output from callGenotypes
is an
"AlleleSetIllumina "
object with an extra
assayData
entry:"AlleleSetIllumina "
output from
callGenotypes.interactive
in addition holds the
assayData
entries:FALSE
indicating
pedigree violations among offspring. Parental values are all
NA
(see validateCallsPedigree
)validateSingleCall
)featureData
:callGenotypes.verboseTest
is a list containinggO$snpPerArrayLim
, and markers with a fraction of non-NA
arrays less than gO$arrayPerSnpLim
, are discarded from the
analysis. Also, markers with range of assayData
gO$filterLim
are called assayData
entries
gO$rPenalty
times its median value. The value of
gO$ploidy
sets the allowed cluster combinations through the function
generatePolyCenters
. Data points below the average value
of the phenoData
column gO$probsIndSE
are left out from the clustering
step. Two or three of the most likely cluster combinations are
estimated with the function getCenters
. This function
uses histograms to find starting values for the clustering, with input
parameters gO$minBin
defining a peak-height filter and
gO$binWidth
setting an initial bin-width. The ranked
cluster combinations are tested in turn until one is found which
passes the criteria below. First, gO$devCentLim
controls the maximum distance a cluster is allowed
to deviate from its expected or ideal position. Second, gO$wSpreadLim
limits how large the within cluster standard deviation can be. Both
these criteria are tested in the
The next test is for Hardy-Weinberg (HW)
equilibrium. This is a very powerful test to detect if the clustering
has failed, given HW can be assumed. Otherwise, set the significance value
gO$hwAlpha
to zero to allow any deviation from HW. At
duplicated loci, the observed B allele-frequency (BAF) is in
fact the mean BAF across both paralogues. Several candidate values
of BAF for one paralogue are set in gO$afList
, such that
the BAF for the other paralogue is given. Several values of
gO$afList
within [0, 0.5] are tested for HW, and the most
likely BAF at both paralogs are those resulting in the highest p-value.
Another, somewhat less powerful, quality control test is to look for
overlapping clusters. If a cluster centre on the Cent
, and its spread is given by Within.SD
,
clusters are tested for overlap using Cent +/- gO$nSdOverlap * Within.SD
Markers passing the quality control
are subjected to a Hotelling's $T^2$-test (Hotelling, 1931; Gidskehaug et
al., 2007), which effectively superimposes an ellipse on each cluster
and discards all data points falling outside its boundaries or within
overlapping ellipses. Due to inaccurate ellipses for small clusters,
only clusters with more than gO$minClLim
members are
tested. The extents of the clusters are controlled by the significance
level gO$clAlpha
of the $T^2$-tests in such a way that a
small value yield large ellipses. The orientation of an ellipse is
defined by the ratio of variances in the variance-/covariance matrix
from the Hotelling's test. If the ratio of the Sratio
, the
weighted sum of Sratio
across clusters is not allowed to exceed
gO$rotationLim
. Finally, the call rate is required to exceed gO$detectLim
, or
the algorithm will move on to the next most likely cluster combination. If a
marker passes all the tests for one of the candidate cluster
combinations, the genotype is called. Otherwise the marker is failed
(i.e. called
callGenotypes
is the main genotype calling function. Before
genotyping thousands of markers, it is important spend a little time
tuning the options in gO
. A few hundred markers may be called
using the default settings, and plotted using
plotGenotypes
. Individual markers whith questionable
calls can be investigated more closely using
callGenotypes.verboseTest
. This function performs all the tests
described above, for all suggested cluster combinations, without
failing any of the tests. Rather, all the test results are plotted and
returned, revealing which steps can be taken to improve the
clustering, if any. In these plots, green dots are the estimated
centre points, and red and orange dots represent samples outside
cluster boundaries or within overlapping ellipses,
respectively. Several markers representing each of the possible
genotype categories should be investigated in this way before commencing
with genotyping the full set of markers.
For some markers with highly overlapping or inaccurate clusters,
interactive clustering might be the last option. This can be performed
using the function callGenotypes.interactive
(requires package
phenoData
column validateCallsPedigree
). Use the function
assignToAlleleSet
to incorporate the manual results into
a larger "
object.
generatePolyCenters
, getCenters
,
findPolyploidClusters
, testHardyWeinberg
,
ggobi
, plotGenotypes
,
validateCallsPedigree
, assignToAlleleSet
,
manualCall
#Read 25 markers into an AlleleSetIllumina object
rPath <- system.file("extdata", package="beadarrayMSV")
normOpts <- setNormOptions()
dataFiles <- makeFilenames('testdata',normOpts,rPath)
beadFile <- paste(rPath,'beadData_testdata.txt',sep='/')
beadInfo <- read.table(beadFile,sep='\t',header=TRUE,as.is=TRUE)
BSRed <- createAlleleSetFromFiles(dataFiles[1:4],markers=1:25,beadInfo=beadInfo)
#Genotype calling and validation
gO <- setGenoOptions(largeSample=TRUE)
BSRed <- callGenotypes(BSRed, gO=gO)
BSRed <- validateCallsPedigree(BSRed)
sumClass <- tapply(rep(1,nrow(BSRed)),fData(BSRed)$Classification,sum)
print(sumClass)
#Plot default setting calls
plotGenotypes(BSRed)
#Tune settings to call an initially failed marker
dev.new()
verboseRes <- callGenotypes.verboseTest(BSRed, gO=gO, singleMarker=23)
print(verboseRes$fData)
print(verboseRes$test)
gO <- setGenoOptions(largeSample=TRUE, wSpreadLim=.1, hwAlpha=1e-50)
verboseRes <- callGenotypes.verboseTest(BSRed, gO=gO, singleMarker=23)
#New settings give (likely incorrect) SNP-call
BSRed <- callGenotypes(BSRed, gO=gO)
BSRed <- validateCallsPedigree(BSRed)
dev.new()
plotGenotypes(BSRed)
Run the code above in your browser using DataLab