amUnique: Identification of unique genotypes

Description

Functions to identify unique genotypes and to review the output of the analysis (a unique analysis) as HTML or CSV. Samples are clustered and matched based on their dissimilarity score (see amMatrix). The match probability, Psib, is also calculated. This is the probability that a sample is a sibling of a unique genotype (and therefore not a replicate sample), given the allele frequencies in a population consisting of only the unique genotypes (Wilberg & Dreher, 2004).

Usage

amUnique(amDatasetFocal, multilocusMap = NULL, alleleMismatch = NULL,
    matchThreshold = NULL, cutHeight = NULL, doPsib = "missing",
    consensusMethod = 1, verbose = FALSE)
amHTML.amUnique(x, htmlFile = NULL, htmlCSS = amCSSForHTML())
amCSV.amUnique(x, csvFile, uniqueOnly = FALSE)
# S3 method for amUnique
summary(object, html = NULL, csv = NULL, ...)

Value

An amUnique object, or side effects: an analysis summary written to an HTML file or to the console, or unique genotypes written to a CSV file.

Arguments

amDatasetFocal: An amDataset object containing genotypes in which an unknown number of individuals are sampled multiple times
multilocusMap: Optional. A vector of integers or strings giving the mappings onto loci for all genotype columns in amDatasetFocal. When omitted, columns are assumed to be paired (i.e. diploid loci with alleles in adjacent columns). See details.
alleleMismatch: Optional. Maximum number of mismatching alleles that will be tolerated when identifying individuals. Also known as the m-hat parameter. If given, matchThreshold and cutHeight should be omitted. All three parameters are related. See details.
matchThreshold: Optional. The minimum similarity score (1 minus the dissimilarity score) that constitutes a match when identifying individuals. Also known as the s-hat parameter. If given, alleleMismatch and cutHeight should be omitted. All three parameters are related. See details.
cutHeight: Optional. The cutHeight parameter used in dynamic tree cutting by amCluster. Also known as the d-hat parameter. If given, alleleMismatch and matchThreshold should be omitted. All three parameters are related. See details.
doPsib: String specifying how match probability should be calculated. See details.
consensusMethod: The method (an integer) used to determine the consensus multilocus genotype from a cluster of multilocus genotypes. See amCluster for details. Typically the default is adequate.
verbose: If TRUE report the progress of the analysis to the console. Useful with datasets consisting of thousands of samples where progress may be slow.
object, x: An amUnique object.
htmlFile: The path to an HTML file to create. If htmlFile=NULL a file is created in the operating system temporary directory and is then opened in the default browser.
htmlCSS: A string containing a valid cascading style sheet. A default style sheet is provided in amCSSForHTML. See amCSSForHTML for details of how to tweak this CSS.
html: If html=TRUE the summary method produces an HTML file and opens it in the default browser. html can also contain a path to a file where HTML output will be written. Note that summary.amUnique does not produce formatted output for the console.
csvFile, csv: The path to a CSV file to create containing a CSV representation of the amUnique analysis.
uniqueOnly: If uniqueOnly=TRUE only the unique genotypes will be saved to a CSV, with no additional information associated with the analysis.
...: Additional arguments to summary

Author

Paul Galpern (pgalpern@gmail.com)

Details

Only one of alleleMismatch, cutHeight, matchThreshold can be given, as the three parameters are related.
alleleMismatch is the most intuitive way to understand how the identification of unique genotypes proceeds. For example, a setting of alleleMismatch = 4 implies that up to four alleles may differ among multiple samples that represent the same individual. In practice, however, this value is only an approximation of the amount of mismatch that may be tolerated. This is because the clustering process used to identify unique genotypes, and the subsequent matching that identifies samples matching these unique genotypes, are based on a dissimilarity metric or score (see amMatrix) that incorporates both allele mismatches and missing data. alleleMismatch is not used directly in analyses; it is converted to this dissimilarity metric as follows. cutHeight, a parameter of amCluster (which is called from this function), is cutHeight = alleleMismatch/(number of allele columns). matchThreshold, a parameter of amPairwise (also called from this function), is matchThreshold = 1 - cutHeight.
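The conversion described above can be worked through with plain arithmetic. The following is a minimal sketch only: the count of 20 allele columns (10 diploid loci) is an assumed, hypothetical dataset size, and the variable names are illustrative rather than taken from the package internals.

```r
# Hypothetical dataset: 10 diploid loci, so 20 allele columns (assumption).
nAlleleColumns <- 20

# The m-hat value a user might supply.
alleleMismatch <- 3

# Conversion stated in the Details text.
cutHeight <- alleleMismatch / nAlleleColumns   # d-hat
matchThreshold <- 1 - cutHeight                # s-hat

print(c(cutHeight = cutHeight, matchThreshold = matchThreshold))
```

Under these assumptions, supplying alleleMismatch = 3 corresponds to a cutHeight of 3/20 and a matchThreshold of 1 - 3/20, which is why only one of the three parameters needs to be given.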

Selecting the appropriate value for alleleMismatch, cutHeight, or matchThreshold is an important task. Use amUniqueProfile to assist in this process. Please see the supplementary documentation for more information.

doPsib = "missing" is the default and specifies that match probability Psib should be calculated for samples that match unique genotypes and have no allele mismatches, but may differ by having missing data. doPsib = "all" specifies that Psib should be calculated for all samples that match unique genotypes. In this case, if allele mismatches occur, alleles are assumed to be missing at the mismatching loci.

multilocusMap is often not required, as amDataset objects will typically consist of paired columns of genotypes, where each pair is a separate locus. Where this does not hold (e.g. gender is given in only one column), a map vector must be specified.
Example: amDataset consists of gender followed by 4 diploid loci in paired columns
multilocusMap=c(1,2,2,3,3,4,4,5,5)
or equally
multilocusMap=c("GENDER", "LOC1", "LOC1", "LOC2", "LOC2", "LOC3", "LOC3", "LOC4", "LOC4")
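For datasets with many loci, typing the map by hand is error-prone, and it may be easier to generate it. The sketch below uses only base R and assumes the layout from the example (one single-column gender field followed by 4 diploid loci in paired columns); the names nLoci, mapInteger and mapString are hypothetical.

```r
# Assumed layout: 1 gender column, then 4 diploid loci in adjacent pairs.
nLoci <- 4

# Integer form: gender is locus 1, the diploid loci are 2..5, each repeated twice.
mapInteger <- c(1, rep(seq_len(nLoci) + 1, each = 2))

# String form: the same mapping using locus labels.
mapString <- c("GENDER", rep(paste0("LOC", seq_len(nLoci)), each = 2))

print(mapInteger)
print(mapString)

# Both maps must have one entry per genotype column (9 in this layout).
stopifnot(length(mapInteger) == length(mapString))
```

Either vector could then be passed as the multilocusMap argument, provided its length equals the number of genotype columns in the amDataset object.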

For more information on selecting consensusMethod please see amCluster. The default consensusMethod=1 is typically adequate.

References

Please see the supplementary documentation for more information. This is available as a vignette. Click on the index link at the bottom of this page to find it.

Wilberg MJ, Dreher BP (2004) GENECAP: a program for analysis of multilocus genotype data for non-invasive sampling and capture-recapture population estimation. Molecular Ecology Notes, 4, 783-785.

Examples


if (FALSE) {

data("amExample2")

## Produce amDataset object
myDataset <- amDataset(amExample2, missingCode="-99", indexColumn=1,
    ignoreColumn=2)

## Usage
## Optimal alleleMismatch parameter previously found using amUniqueProfile()
myUnique <- amUnique(myDataset, alleleMismatch=3)

## Display analysis as HTML in default browser
summary(myUnique, html=TRUE)

## Save analysis to HTML file
summary(myUnique, html="myUnique.htm")

## Save analysis to a CSV file
summary(myUnique, csv="myUnique.csv")

## Save unique genotypes only to a CSV file
summary(myUnique, csv="myUnique.csv", uniqueOnly=TRUE)

## Data set with gender information
data("amExample5")

## Produce amDataset object
myDataset2 <- amDataset(amExample5, missingCode="-99", indexColumn=1,
    metaDataColumn=2)

## Usage
## Optimal alleleMismatch parameter previously found using amUniqueProfile()
myUnique2 <- amUnique(myDataset2,
    multilocusMap=c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
    9, 9, 10, 10, 11, 11), alleleMismatch=3)

}
