sequoia (version 2.0.7)

getAssignCat: Assignability of reference pedigree

Description

Identify which individuals are genotyped, and which can potentially be substituted by a dummy individual. 'Dummifiable' are those non-genotyped individuals with at least 2 genotyped offspring, or at least 1 genotyped offspring and 1 genotyped parent.
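The 'dummifiable' rule above can be illustrated with a short base R sketch. This is not the sequoia implementation; it only applies the rule as stated in the description to a made-up pedigree. The IDs, the toy data frame Ped and the helper vectors n_geno_off and n_geno_par are all hypothetical.

```r
# Toy pedigree (hypothetical IDs); NA means the parent is unknown
Ped <- data.frame(
  id   = c("A", "B", "C", "D", "E", "F", "G"),
  dam  = c(NA,  NA,  "A", "A", "C", NA,  NA),
  sire = c(NA,  NA,  "B", "B", "F", NA,  "B"),
  stringsAsFactors = FALSE
)
Genotyped <- c("A", "D", "E", "G")

is_geno <- Ped$id %in% Genotyped

# number of genotyped offspring per individual (as dam or as sire)
n_geno_off <- sapply(Ped$id, function(x) {
  sum(is_geno & (Ped$dam %in% x | Ped$sire %in% x))
})

# number of genotyped parents per individual (0, 1 or 2)
n_geno_par <- (Ped$dam %in% Genotyped) + (Ped$sire %in% Genotyped)

# non-genotyped, and either >= 2 genotyped offspring,
# or >= 1 genotyped offspring plus >= 1 genotyped parent
dummifiable <- !is_geno &
  (n_geno_off >= 2 | (n_geno_off >= 1 & n_geno_par >= 1))

Ped$id[dummifiable]
```

In this toy pedigree, B qualifies through two genotyped offspring (D and G), and C qualifies through one genotyped offspring (E) plus one genotyped parent (A). F has one genotyped offspring but no genotyped parent, so it does not qualify.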

Usage

getAssignCat(Pedigree, Genotyped)

Arguments

Pedigree

dataframe with columns id-dam-sire. Reference pedigree.

Genotyped

character vector with ids of genotyped individuals.

Value

the Pedigree dataframe with 2 additional columns, dam.cat and sire.cat, with coding similar to that used by PedCompare:

GG

Genotyped individual, genotyped parent

GD

Genotyped individual, Dummy parent; i.e. 'id' has at least 1 genotyped sibling or a genotyped grandparent

DG

Dummy individual, Genotyped parent; i.e. 'id' has at least 1 genotyped offspring, and parent is assignable as grandparent of the dummy-substituted-individual's offspring

DD

Dummy individual, Dummy parent

X

Either id or parent (or both) is not genotyped and has no genotyped offspring, and therefore the parent-offspring link cannot be assigned.

NA

No parent in Pedigree
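As a minimal sketch of how these codes might be summarised, the following counts the potentially assignable dam and sire links. The data frame below is hand-written for illustration and is not actual getAssignCat output; only the column names dam.cat and sire.cat and the category codes are taken from the description above.

```r
# Hypothetical stand-in for a pedigree with category columns
PedA <- data.frame(
  id       = c("i1", "i2", "i3", "i4", "i5"),
  dam.cat  = c("GG", "GD", "X",  NA,   "DG"),
  sire.cat = c("GG", "X",  "DD", NA,   "GG"),
  stringsAsFactors = FALSE
)

# categories in which the parent-offspring link is potentially assignable
assignable <- c("GG", "GD", "DG", "DD")

# %in% treats NA (no parent in pedigree) as not assignable
n_dam  <- sum(PedA$dam.cat  %in% assignable)
n_sire <- sum(PedA$sire.cat %in% assignable)

# cross-tabulate, keeping NA as its own level
table(PedA$dam.cat, PedA$sire.cat, useNA = "ifany")
```

Links coded X or NA are excluded from the counts, which gives an upper bound on the number of parents a pedigree reconstruction could recover from these data.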

Details

It is assumed that all individuals in Genotyped have been genotyped for a sufficient number of SNPs. To identify samples with a too-low call rate, use CheckGeno. To calculate the call rate for all samples, see the examples below.

Some parents indicated here as assignable may never be assigned by sequoia, for example parent-offspring pairs where it cannot be determined which is the older of the two, or grandparents that are indistinguishable from full avuncular (i.e. genetics inconclusive because the candidate has no parent assigned, and ageprior inconclusive).

Examples

# NOT RUN {
data(Ped_HSg5, SimGeno_example, package="sequoia")
PedA <- getAssignCat(Ped_HSg5, rownames(SimGeno_example))
table(PedA$dam.cat, PedA$sire.cat, useNA="ifany")

# }
# NOT RUN {
# calculate call rate (missing genotypes are coded as -9)
CallRates <- apply(MyGenotypes, MARGIN=1,
                   FUN = function(x) sum(x!=-9)) / ncol(MyGenotypes)
hist(CallRates, breaks=50, col="grey")
GoodSamples <- rownames(MyGenotypes)[ CallRates > 0.8]
# threshold depends on total number of SNPs, genotyping errors, proportion of
# candidate parents that are SNPd (sibship clustering is more prone to false
# positives).
# GoodSamples is already a character vector of ids, so pass it directly
PedA <- getAssignCat(MyOldPedigree, GoodSamples)
# }
