Function to automatically run amUnique at a sequence of parameter values to determine an optimal setting, and optionally plot the result
amUniqueProfile(amDatasetFocal, multilocusMap = NULL,
    alleleMismatch = NULL, matchThreshold = NULL, cutHeight = NULL,
    guessOptimum = TRUE, doPlot = TRUE, consensusMethod = 1,
    verbose = TRUE)
A data.frame containing summary data from multiple runs of amUnique
amDatasetFocal: An amDataset object containing genotypes in which an unknown number of individuals are sampled multiple times.
multilocusMap: Optionally, a vector of integers or strings giving the mappings onto loci for all genotype columns in amDatasetFocal. When omitted, columns are assumed to be paired (i.e. diploid loci with alleles in adjacent columns). See details.
alleleMismatch: A vector giving a sequence, where each element gives the maximum number of mismatching alleles that will be tolerated when identifying individuals. alleleMismatch is also known as the m-hat parameter. If given, matchThreshold and cutHeight should be omitted. All three parameters are related. See amUnique for details.
matchThreshold: A vector giving a sequence, where each element gives the minimum similarity score that constitutes a match when identifying individuals. matchThreshold is also known as the s-hat parameter. If given, alleleMismatch and cutHeight should be omitted. All three parameters are related. See amUnique for details.
cutHeight: A vector giving a sequence, where each element gives the cutHeight parameter used in dynamic tree cutting by amCluster. cutHeight is also known as the d-hat parameter. If given, alleleMismatch and matchThreshold should be omitted. All three parameters are related. See details.
doPlot: If TRUE, a plot showing summary data from multiple runs of amUnique is produced.
guessOptimum: If TRUE, the function guesses the optimal value of the parameter being profiled by examining the profile for the first minimum associated with a drop in multiple matches as sensitivity to differences among samples decreases.
consensusMethod: The method (an integer) used to determine the consensus multilocus genotype from a cluster of multilocus genotypes. See amCluster for details.
verbose: If TRUE, report the progress of the profiling to the console.
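As a hedged illustration of the arguments above, the sketch below profiles an explicit alleleMismatch sequence while leaving matchThreshold and cutHeight at their NULL defaults. It assumes the allelematch package and its amExample2 data set are installed. The sequence 0:3 and the object names mismatchSeq and myProfile are arbitrary choices for illustration, not recommended settings.

```r
# Explicit sequence to profile (an illustrative choice, not a recommendation).
# matchThreshold and cutHeight are deliberately omitted, as only one of the
# three related parameters should be given.
mismatchSeq <- 0:3

# Assumption: the allelematch package is installed; otherwise this is skipped.
if (requireNamespace("allelematch", quietly = TRUE)) {
  library(allelematch)
  data("amExample2")
  myDataset <- amDataset(amExample2, missingCode = "-99", indexColumn = 1,
                         metaDataColumn = 2)
  # Suppress the plot, the optimum guess, and console progress
  myProfile <- amUniqueProfile(myDataset, alleleMismatch = mismatchSeq,
                               guessOptimum = FALSE, doPlot = FALSE,
                               verbose = FALSE)
}
```

With doPlot = FALSE the summary data.frame can be inspected directly instead of reading the values off the plot.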
Paul Galpern (pgalpern@gmail.com)
Selecting an appropriate value for alleleMismatch, cutHeight, or matchThreshold is an important task, and this function is intended to assist with it. Typically the optimal value of any of these parameters is found where the number of multiple matches is minimized (the majority of samples are similar to only one unique genotype). Usually there is a minimum when these parameters are set to be very sensitive to differences among samples (i.e. alleleMismatch or cutHeight is 0, or matchThreshold is 1). Simulations suggest that the next most sensitive minimum in multiple matches is the optimal value. This minimum will often be associated with a drop in multiple matches as sensitivity drops.
Please see the supplementary documentation for more discussion of this important step.
Using guessOptimum = TRUE will attempt to guess the location of this minimum and add it to the profile plot. Manual assessment of this guess using the plot is strongly recommended.
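The idea of taking the next most sensitive minimum can be sketched in base R. This is only an illustration of the rule described above, not the algorithm amUniqueProfile uses internally. The multipleMatches counts are invented purely for demonstration, and isLocalMinimum is a hypothetical helper.

```r
# Hypothetical profile: counts of multiple matches at each alleleMismatch
# value, ordered from most sensitive (0) to least sensitive (8).
alleleMismatch <- 0:8
multipleMatches <- c(0, 5, 9, 4, 1, 3, 6, 10, 15)

# Hypothetical helper: TRUE where a value is lower than its left neighbour
# and no higher than its right neighbour (ends are compared against Inf).
isLocalMinimum <- function(x) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    left <- if (i == 1) Inf else x[i - 1]
    right <- if (i == n) Inf else x[i + 1]
    x[i] < left && x[i] <= right
  }, logical(1))
}

minima <- which(isLocalMinimum(multipleMatches))

# Skip the minimum at the most sensitive setting (position 1) and take the
# next one as the candidate optimum.
candidateMinima <- minima[minima > 1]
guess <- if (length(candidateMinima) > 0) alleleMismatch[candidateMinima[1]] else NA
print(guess)  # 4 for the hypothetical counts above
```

For matchThreshold the most sensitive setting is 1 rather than 0, so the profile would need to be ordered from high to low values before applying the same rule.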
If none of alleleMismatch, cutHeight, or matchThreshold is given, the function runs a sequence of values for alleleMismatch as follows: seq(from = 0, to = floor(ncol(amDatasetFocal$multilocus) * 0.4), by = 1)
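To make the default concrete, here is a minimal sketch of that expression evaluated for an assumed column count. The value 20 (for example, 10 diploid loci in paired columns) is hypothetical and stands in for ncol(amDatasetFocal$multilocus).

```r
# Hypothetical number of genotype columns; with a real amDataset object this
# would be ncol(amDatasetFocal$multilocus).
nGenotypeColumns <- 20

# Same expression as the documented default, with the column count substituted
defaultMismatchSeq <- seq(from = 0, to = floor(nGenotypeColumns * 0.4), by = 1)
print(defaultMismatchSeq)  # 0 1 2 3 4 5 6 7 8
```

With 20 genotype columns the default profile therefore covers nine values of alleleMismatch, from 0 up to 40% of the column count.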
multilocusMap is often not required, as amDataset objects will typically consist of paired columns of genotypes, where each pair is a separate locus. When the columns are not all paired (e.g. gender is given in only one column), a map vector must be specified.
Example: amDataset consists of gender followed by 4 diploid loci in paired columns
multilocusMap=c(1,2,2,3,3,4,4,5,5)
or equally
multilocusMap=c("GENDER", "LOC1", "LOC1", "LOC2", "LOC2", "LOC3",
"LOC3", "LOC4", "LOC4")
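A quick base R check can confirm that an integer map and a string map describe the same grouping of columns before either is passed to the function. This is a sketch only: the names numericMap, namedMap, namedAsIndex, sameGrouping, and columnsPerLocus are illustrative and not part of the package.

```r
# Two ways of writing the map for gender followed by 4 diploid loci
numericMap <- c(1, 2, 2, 3, 3, 4, 4, 5, 5)
namedMap <- c("GENDER", "LOC1", "LOC1", "LOC2", "LOC2",
              "LOC3", "LOC3", "LOC4", "LOC4")

# Convert the labels to group indices in order of first appearance
namedAsIndex <- match(namedMap, unique(namedMap))

# Both maps should have one entry per genotype column and the same grouping
sameGrouping <- length(numericMap) == length(namedMap) &&
  all(namedAsIndex == numericMap)
print(sameGrouping)  # TRUE

# Number of columns assigned to each locus: 1 for gender, 2 for each diploid locus
columnsPerLocus <- setNames(tabulate(namedAsIndex), unique(namedMap))
print(columnsPerLocus)
```

A map whose length differs from the number of genotype columns, or that gives a diploid locus only one column, would show up immediately in columnsPerLocus.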
For more information on selecting consensusMethod, please see amCluster. The default consensusMethod=1 is typically adequate.
Please see the supplementary documentation for more information. This is available as a vignette. Click on the index link at the bottom of this page to find it.
amUnique
if (FALSE) {
  data("amExample2")

  ## Produce amDataset object
  myDataset <- amDataset(amExample2, missingCode="-99", indexColumn=1,
                         metaDataColumn=2)

  ## Usage
  myUniqueProfile <- amUniqueProfile(myDataset)

  ## Data set with gender information
  data("amExample5")

  ## Produce amDataset object
  myDataset2 <- amDataset(amExample5, missingCode="-99", indexColumn=1,
                          metaDataColumn=2)

  ## Usage
  myUniqueProfile <- amUniqueProfile(myDataset2,
      multilocusMap=c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
                      9, 9, 10, 10, 11, 11))
}