dominant.to.codominant: Convert Genotypes from Dominant Format to Codominant Format

Description

This function takes an array or matrix of genotypes where each allele is represented in one column and symbols indicate the presence or absence of that allele in a sample. It produces a two-dimensional list of vectors representing genotypes, indexed by sample and locus.

Usage

dominant.to.codominant(domdata, colinfo = NULL,
                       samples = dimnames(domdata)[[1]], missing = -9,
                       allelepresent = 1, split = ".")

Arguments

domdata

A two-dimensional array or matrix, in which samples are represented in the first dimension (and named accordingly) and alleles are represented in the second dimension. The symbol specified by allelepresent indicates that a particular sample has a particular allele.

colinfo

A data frame, indexed by column number from domdata, containing locus names as the first column and allele numbers as the second column.

samples

A character vector containing the names of samples to be converted, if only a subset of samples in domdata are to be used.

missing

The symbol to use to represent missing data in the output.

allelepresent

The symbol used in domdata to indicate that a particular sample has a particular allele.

split

If colinfo=NULL, the character used to separate the locus name and allele number in the column names of domdata.
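As a rough illustration of how split relates column names to a colinfo table, the base R sketch below separates names of the form locus-split-allele into their two parts. It is not the package's internal code; the column names and the object names cn, parts and colinfo are made up for the example.

```r
# Hypothetical column names in the "locus<split>allele" form
cn <- c("loc1.100", "loc1.102", "loc2.141")

# fixed = TRUE treats "." as a literal period, not a regex wildcard
parts <- strsplit(cn, split = ".", fixed = TRUE)

# Build a locus/allele table in the layout described for colinfo
colinfo <- data.frame(Loci = sapply(parts, `[`, 1),
                      Alleles = as.integer(sapply(parts, `[`, 2)),
                      stringsAsFactors = FALSE)
colinfo
```

If the locus names themselves contain the split character, a split like this would produce more than two parts, which is one reason to inspect the column names first.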

Value

A two-dimensional list of integer vectors, in the standard polysat genotype format. Samples are represented in the first dimension and loci in the second dimension, and both are named accordingly. Each vector contains all unique alleles for a given sample at a given locus.
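To make the shape of the return value concrete, here is a hand-built object with the same structure, using base R only. The sample names, locus names and allele values are invented for illustration, and -9 stands in for the missing data symbol.

```r
# A 2 x 2 list array; every cell starts as the missing data symbol
geno <- array(list(-9L), dim = c(2, 2),
              dimnames = list(c("ind1", "ind2"), c("loc1", "loc2")))

# Each cell holds an integer vector of the unique alleles observed
geno[["ind1", "loc1"]] <- c(100L, 102L, 104L)
geno[["ind1", "loc2"]] <- c(144L, 147L)
geno[["ind2", "loc1"]] <- c(100L, 106L)

# Double brackets return one genotype; single brackets return a list
geno[["ind1", "loc1"]]
geno["ind1", ]
```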

Details

Because allele copy number is often unknown, many researchers who work with microsatellites in polyploids record genotype data in a dominant format, such as a matrix of 1s and 0s representing the presence and absence of peaks, as is done with AFLPs. dominant.to.codominant converts such data back to a semi-codominant format so that other analyses or data conversions can be performed.

The default symbol to indicate the presence of an allele is 1, but this can be set to any other symbol using the allelepresent argument. It does not matter which symbols are used to indicate that an allele is absent or that data are missing. If dominant.to.codominant does not find any alleles present for a given sample and locus, it fills in the missing data symbol at that position in the two-dimensional genotype list.

This function does not read or write files. Because dominant data are typically already stored in an array-like layout in a spreadsheet or text document, they can usually be read with read.table and converted to a matrix with as.matrix.

There are two options for indicating which locus and allele are represented by each column:

1) Specify them in the second dimension names of the array or matrix. The name of each column should be a concatenation of the locus name followed by the allele number, separated by a period or other character as specified in split (e.g. locus1.204). Note that with check.names=TRUE, read.table converts many symbols (such as hyphens or spaces) to periods, so it is a good idea to inspect the column names of domdata before setting split.

2) Create a data frame containing locus and allele information. The rows should be in the same order as the columns of domdata. The first vector in the data frame should contain the locus names, and the second vector should contain the numerical alleles. Use this data frame as colinfo.
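The read.table and as.matrix route mentioned above might look like the following sketch. The inline text stands in for a hypothetical tab-delimited file, and the hyphenated column names are chosen to show the effect of check.names=TRUE; none of these names come from the package.

```r
# Hypothetical tab-delimited data standing in for a spreadsheet export
txt <- "sample\tloc1-100\tloc1-102\tloc2-141\nind1\t1\t0\t1\nind2\t0\t1\t1\n"

# row.names = 1 uses the first column as the sample names
df <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1,
                 check.names = TRUE)
domdata <- as.matrix(df)

# Inspect the column names before choosing a value for split:
# the hyphens in the header have been replaced by periods
colnames(domdata)
```

With a real file, the text argument would be replaced by the file path, and check.names = FALSE could be used instead if the original separator should be kept.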

Examples

# Create a matrix of dominant data (usually read from a file instead)
mysamples <- c("ind1","ind2","ind3")
myalleles <- c("loc1.100","loc1.102","loc1.104","loc1.106",
               "loc2.141","loc2.144","loc2.147","loc2.150")
mydomdata <- matrix(nrow = length(mysamples), ncol = length(myalleles),
                    dimnames = list(mysamples, myalleles))
mydomdata["ind1",] <- c(1,1,1,0,0,1,1,0)
mydomdata["ind2",] <- c(1,0,0,1,0,0,1,1)
mydomdata["ind3",] <- c(-9,-9,-9,-9,1,1,0,1)

# inspect the matrix
mydomdata

# convert to codominant data
mycodomdata <- dominant.to.codominant(mydomdata)

# view the list created
mycodomdata
# view genotypes by individual
mycodomdata["ind1",]
mycodomdata["ind2",]
mycodomdata["ind3",]

# Alternatively, use a matrix without alleles labeled in the column names
dimnames(mydomdata)[[2]] <- NULL
mydomdata

# Make a data frame for a locus and allele index
# (Under normal circumstances you would read this from a file)
laindex <- data.frame(Loci = c(rep("loc1",4), rep("loc2",4)),
                      Alleles = c(100, 102, 104, 106, 141, 144, 147, 150))
laindex

# convert to codominant data
mycodomdata2 <- dominant.to.codominant(mydomdata, colinfo=laindex)
# look at the results
mycodomdata2["ind1",]
# etc.
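For readers who want to see the logic spelled out, the conversion can be approximated in base R as below. This is an illustrative re-implementation under simplifying assumptions (a colinfo-style data frame is always supplied, and allele values are returned as stored in it), not the polysat source code; sketch_convert, dom and info are hypothetical names.

```r
# Illustrative sketch only; NOT the polysat implementation
sketch_convert <- function(domdata, colinfo, missing = -9, allelepresent = 1) {
  loci <- unique(as.character(colinfo[[1]]))
  # Start with every sample-by-locus cell set to the missing data symbol
  out <- array(list(missing), dim = c(nrow(domdata), length(loci)),
               dimnames = list(rownames(domdata), loci))
  for (s in rownames(domdata)) {
    for (L in loci) {
      cols <- which(colinfo[[1]] == L)
      # Keep the columns of this locus where the "present" symbol appears
      present <- cols[which(domdata[s, cols] == allelepresent)]
      if (length(present) > 0) out[[s, L]] <- colinfo[[2]][present]
    }
  }
  out
}

# Small made-up data set: two samples, two loci, two alleles per locus
dom <- matrix(c( 1,  1, 0, 1,
                -9, -9, 1, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("ind1", "ind2"), NULL))
info <- data.frame(Loci = c("loc1", "loc1", "loc2", "loc2"),
                   Alleles = c(100, 102, 141, 144))

res <- sketch_convert(dom, info)
res[["ind1", "loc1"]]  # alleles present for ind1 at loc1
res[["ind2", "loc1"]]  # no alleles found, so the missing symbol remains
```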
