amUniqueProfile: Determine optimal parameter values for the identification of unique genotypes

Description

Function to automatically run amUnique at a sequence of parameter values to determine an optimal setting, and optionally plot the result

Usage

amUniqueProfile(
		amDatasetFocal, 
		multilocusMap = NULL, 
		alleleMismatch = NULL, 
		matchThreshold = NULL, 
		cutHeight = NULL, 
		guessOptimum = TRUE, 
		doPlot = TRUE, 
		consensusMethod = 1, 
		verbose = TRUE
		)

Value

A data.frame containing summary data from multiple runs of amUnique

Arguments

amDatasetFocal: An amDataset object containing genotypes in which an unknown number of individuals are sampled multiple times.
multilocusMap: Optionally a vector of integers or strings giving the mappings onto loci for all genotype columns in amDatasetFocal.
When omitted, columns are assumed to be paired (i.e., diploid loci with alleles in adjacent columns).
See details.
alleleMismatch: A vector giving a sequence, where elements give the maximum number of mismatching alleles which will be tolerated when identifying individuals; also known as the m-hat parameter.
If specified, then matchThreshold and cutHeight should be omitted; all three parameters are related.
See amUnique for details.
matchThreshold: A vector giving a sequence, where elements give the minimum dissimilarity score which constitutes a match when identifying individuals; also known as the s-hat parameter.
If specified, then alleleMismatch and cutHeight should be omitted; all three parameters are related.
See amUnique for details.
cutHeight: A vector giving a sequence, where elements give the cutHeight parameter used in dynamic tree cutting by amCluster; also known as the d-hat parameter.
If specified, then alleleMismatch and matchThreshold should be omitted; all three parameters are related.
See details.
doPlot: If doPlot = TRUE, then a plot showing summary data from multiple runs of amUnique is produced.
guessOptimum: If guessOptimum = TRUE, then will estimate the optimal value of the parameter being profiled by examining the profile for the first minimum associated with a drop in multiple matches as sensitivity to differences among samples decreases.
consensusMethod: The method (an integer) used to determine the consensus multilocus genotype from a cluster of multilocus genotypes.
See amCluster for details.
verbose: If verbose = TRUE, then the progress of the profiling will be reported to the console.

Author

Paul Galpern (pgalpern@gmail.com)

Details

Selecting the appropriate value for alleleMismatch, cutHeight, or matchThreshold is an important task. Use this function to assist in this process. Typically the optimal value of any of these parameters is found where the number of multiple matches is minimized (the majority of samples are similar to only one unique genotype). Usually there is a minimum when these parameters are set to be very sensitive to differences among samples (i.e., alleleMismatch or cutHeight are 0, matchThreshold is 1). Simulations suggest that the next most sensitive minimum in multiple matches is the optimal value. This minimum will often be associated with a drop in multiple matches as sensitivity drops. For more discussion of this important step, see the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.

Using guessOptimum = TRUE will attempt to estimate the location of this minimum and add it to the profile plot. Manual assessment of this estimate using the plot is strongly recommended.

If none of alleleMismatch, cutHeight, or matchThreshold is given, the function runs a sequence of values for alleleMismatch as follows: seq(from = 0, to = floor(ncol(amDatasetFocal$multilocus) * 0.4), by = 1)

multilocusMap is often not required, as amDataset objects will typically consist of paired columns of genotypes, where each pair is a separate locus. In cases where this is not the case (e.g., gender is given in only one column), a map vector must be specified.

Example: amDataset consists of gender followed by 4 diploid loci in paired columns
multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5)
or equally
multilocusMap=c("GENDER", "LOC1", "LOC1", "LOC2", "LOC2", "LOC3", "LOC4", "LOC4")

For more information on selecting consensusMethod see amCluster. The default consensusMethod = 1 is typically adequate.

References

For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.

Examples

Run this code

	if (FALSE) {
	data("amExample2")
	
	## Produce amDataset object
	myDataset <- 
		amDataset(
			amExample2, 
			missingCode = "-99", 
			indexColumn = 1, 
			metaDataColumn = 2
			)

	## Usage (uncomment)
	myUniqueProfile <- 
		amUniqueProfile(myDataset)
	
	## Data set with gender information
	data("amExample5")
	
	## Produce amDataset object
	myDataset2 <- 
		amDataset(
			amExample5, 
			missingCode = "-99", 
			indexColumn = 1, 
			metaDataColumn = 2
			)
	
	## Usage
	myUniqueProfile <- 
		amUniqueProfile(
			myDataset2,
			multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,
			11, 11))
	}

Run the code above in your browser using DataLab