strataG (version 2.4.905)

labelHaplotypes: Find and label haplotypes

Description

Identify and group sequences that share the same haplotype.

Usage

labelHaplotypes(x, prefix = NULL, use.indels = TRUE)

# S3 method for default
labelHaplotypes(x, prefix = NULL, use.indels = TRUE)

# S3 method for list
labelHaplotypes(x, ...)

# S3 method for character
labelHaplotypes(x, ...)

# S3 method for gtypes
labelHaplotypes(x, ...)

Arguments

x

sequences in a character matrix, list, or DNAbin object, or a haploid gtypes object with sequences.

prefix

a character string giving the prefix to be applied to numbered haplotypes. If NULL, haplotypes will be labeled with the first label from the original sequences.

use.indels

logical. Use indels when comparing sequences?

...

arguments to be passed to labelHaplotypes.default.

Value

For character, list, or DNAbin, a list with the following elements:

haps

named vector (DNAbin) or list of named vectors (multidna) of haplotypes for each sequence in x.

hap.seqs

DNAbin or multidna object containing sequences for each haplotype.

unassigned

data.frame listing closest matching haplotypes for unassignable sequences with N's and the minimum number of substitutions between the two. Will be NULL if no sequences remain unassigned.

For gtypes, a new gtypes object with unassigned individuals stored in the @other slot in an element named 'haps.unassigned' (see getOther).

Details

If any sequences contain ambiguous bases (N's), those sequences are first set aside and haplotypes are assigned from the remaining sequences. Each sequence that was set aside is then assigned to one of the new haplotypes if this can be done unambiguously, that is, if it matches exactly one haplotype with 0 differences once its N's have been removed. Sequences that cannot be assigned this way are given NA and listed in the unassigned element.
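The two-pass logic described above can be illustrated with a minimal base-R sketch. This is not strataG's implementation: sketch_label_haps is a hypothetical helper written only for illustration. It assumes each sequence is a lowercase character vector of equal length with "n" marking an ambiguous base, and it ignores use.indels and the closest-match report that goes into the unassigned element.

```r
# Hypothetical helper (not part of strataG): a simplified sketch of the
# two-pass haplotype assignment described in Details.
sketch_label_haps <- function(seqs, prefix = "Hap.") {
  # flag sequences that contain at least one ambiguous base
  has.n <- vapply(seqs, function(s) any(s == "n"), logical(1))

  # Pass 1: define haplotypes from the sequences without N's
  clean <- vapply(seqs[!has.n], paste, character(1), collapse = "")
  hap.seqs <- unique(clean)
  names(hap.seqs) <- paste0(prefix, seq_along(hap.seqs))

  haps <- setNames(rep(NA_character_, length(seqs)), names(seqs))
  haps[!has.n] <- names(hap.seqs)[match(clean, hap.seqs)]

  # Pass 2: assign a sequence with N's only if exactly one haplotype
  # matches it at every site that is not an N
  hap.mat <- do.call(rbind, strsplit(hap.seqs, ""))
  for (i in which(has.n)) {
    known <- seqs[[i]] != "n"
    n.diff <- colSums(t(hap.mat[, known, drop = FALSE]) != seqs[[i]][known])
    hit <- which(n.diff == 0)
    if (length(hit) == 1) haps[i] <- names(hap.seqs)[hit]
  }
  haps
}

seqs <- list(
  s1 = c("a", "c", "g", "t"),
  s2 = c("a", "c", "g", "t"),
  s3 = c("a", "c", "c", "t"),
  s4 = c("a", "n", "g", "t"),  # N at a site where the haplotypes agree
  s5 = c("a", "c", "n", "t")   # N at the only site that separates them
)
res <- sketch_label_haps(seqs)
res
```

In this toy input, s4 matches only the haplotype shared by s1 and s2, so it is assigned to it. s5 matches both haplotypes once its N is ignored, so it stays NA, which corresponds to the case labelHaplotypes reports in the unassigned element.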

See Also

expandHaplotypes

Examples

# NOT RUN {
library(strataG)

# create 5 example short haplotypes
haps <- c(
  H1 = "ggctagct",
  H2 = "agttagct",
  H3 = "agctggct",
  H4 = "agctggct",
  H5 = "ggttagct"
)
# draw and label 100 samples
sample.seqs <- sample(names(haps), 100, replace = TRUE)
ids <- paste(sample.seqs, seq_along(sample.seqs), sep = "_")
sample.seqs <- lapply(sample.seqs, function(x) strsplit(haps[x], "")[[1]])
names(sample.seqs) <- ids

# add 1-2 random ambiguities
with.error <- sample(1:length(sample.seqs), 10)
for(i in with.error) {
  num.errors <- sample(1:2, 1)
  sites <- sample(1:length(sample.seqs[[i]]), num.errors)
  sample.seqs[[i]][sites] <- "n"
}

hap.assign <- labelHaplotypes(sample.seqs, prefix = "Hap.")
hap.assign

# }
