Compare an inferred pedigree (Ped2) to a previous or simulated pedigree (Ped1), including comparison of sibship clusters and sibship grandparents.
PedCompare(
  Ped1 = NULL,
  Ped2 = NULL,
  DumPrefix = c("F0", "M0"),
  SNPd = NULL,
  Symmetrical = TRUE
)
Ped1: original pedigree, dataframe with columns id-dam-sire; only the first 3 columns will be used.
Ped2: inferred pedigree, e.g. SeqOUT$Pedigree or SeqOUT$PedigreePar, with columns id-dam-sire.
DumPrefix: character vector of length 2 with the dummy prefixes in Pedigree 2; all IDs not starting with a dummy prefix are taken as genotyped if SNPd=NULL.
SNPd: character vector with IDs of genotyped individuals.
Symmetrical: when determining the category of individuals (Genotyped/Dummy/X), use the 'highest' category across the two pedigrees (TRUE, default) or only consider Ped1 (Symmetrical = FALSE).
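As a rough illustration of how the default categories follow from DumPrefix and SNPd, the sketch below labels a vector of IDs. classify_ids is a hypothetical helper written for this page, not a sequoia function, and the label "other" for IDs that are neither genotyped nor dummy is an assumption.

```r
# Hypothetical helper (not part of sequoia): label IDs as genotyped or dummy.
classify_ids <- function(ids, DumPrefix = c("F0", "M0"), SNPd = NULL) {
  # TRUE for IDs that start with either of the two dummy prefixes
  is_dummy <- startsWith(ids, DumPrefix[1]) | startsWith(ids, DumPrefix[2])
  if (is.null(SNPd)) {
    # no explicit list: everything without a dummy prefix counts as genotyped
    is_geno <- !is_dummy
  } else {
    # explicit list: only the listed IDs count as genotyped
    is_geno <- ids %in% SNPd
  }
  ifelse(is_geno, "genotyped", ifelse(is_dummy, "dummy", "other"))
}

ids <- c("a00001", "b00002", "F0001", "M0002")
default_cat <- classify_ids(ids)
listed_cat  <- classify_ids(ids, SNPd = "a00001")
```

With SNPd supplied, "b00002" is no longer treated as genotyped, which is the situation described for samples whose temporary IDs are left out of SNPd.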
A list with
Counts: A 7 x 5 x 2 named numeric array with the number of matches and mismatches, see below.
Counts.detail: A large numeric array with the number of matches and mismatches, with more detail for all possible combinations of categories.
MergedPed: A dataframe with side-by-side comparison of the two pedigrees.
ConsensusPed: A consensus pedigree, with Pedigree 2 taking priority over Pedigree 1.
DummyMatch: Dataframe with all dummy IDs in Pedigree 2 (id.2), and the best-matching individual in Pedigree 1 (id.1).
Mismatch: A subset of MergedPed with mismatches between Ped1 and Ped2, as defined below.
Ped1only: As Mismatch, but with parents in Ped1 that were not assigned in Ped2.
Ped2only: As Mismatch, but with parents in Ped2 that were missing in Ped1.
'MergedPed', 'Mismatch', 'Ped1only' and 'Ped2only' provide the following columns:
id: All ids in both Pedigree 1 and 2. For dummy individuals, this is the id in Pedigree 2.
dam.1, sire.1: parents in Pedigree 1.
dam.2, sire.2: parents in Pedigree 2.
id.r, dam.r, sire.r: The real id of dummy individuals or parents in Pedigree 2, i.e. the best-matching non-genotyped individual in Pedigree 1, or "nomatch". If a sibship in Pedigree 1 is divided over 2 sibships in Pedigree 2, the smaller one will be denoted as "nomatch".
id.dam.cat, id.sire.cat: the category of the individual (first letter) and the highest category of the dam (sire) in Pedigree 1 or 2 (second letter): G=Genotyped, D=(potential) dummy, X=none. The one-letter categories of individuals are generated by getAssignCat. Using the 'best' category from both pedigrees makes comparison between two inferred pedigrees symmetrical and more intuitive.
dam.class, sire.class: classification of dam and sire: Match, Mismatch, P1only, P2only, or '_' when no parent is assigned in either pedigree.
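The five classes can be illustrated with a minimal sketch for the simplest case, where both pedigrees refer to a parent by the same ID. classify_parent is a hypothetical helper, not a sequoia function; PedCompare itself also has to line up dummy parents with non-genotyped individuals, which this sketch does not attempt.

```r
# Hypothetical helper (not part of sequoia): classify one parental column.
classify_parent <- function(par1, par2) {
  has1 <- !is.na(par1)   # parent assigned in Pedigree 1
  has2 <- !is.na(par2)   # parent assigned in Pedigree 2
  out <- rep("_", length(par1))            # no parent in either pedigree
  out[which(has1 & has2 & par1 == par2)] <- "Match"
  out[which(has1 & has2 & par1 != par2)] <- "Mismatch"
  out[has1 & !has2] <- "P1only"
  out[!has1 & has2] <- "P2only"
  out
}

dam.1 <- c("A", "A", NA,  "B", NA)
dam.2 <- c("A", "C", "D", NA,  NA)
dam_class <- classify_parent(dam.1, dam.2)
```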
The first dimension of Counts denotes the following categories:
GG: Genotyped individual, assigned a genotyped parent in either pedigree.
GD: Genotyped individual, assigned a dummy parent, or with at least 1 genotyped sibling or a genotyped grandparent in Pedigree 1.
GT: Genotyped individual, total.
DG: Dummy individual, assigned a genotyped parent (i.e., grandparent of the sibship in Pedigree 2).
DD: Dummy individual, assigned a dummy parent (i.e., avuncular relationship between sibships in Pedigree 2).
DT: Dummy individual, total.
TT: Grand total; includes all genotyped individuals, plus non-genotyped individuals in Pedigree 1, plus non-replaced dummy individuals (see below) in Pedigree 2.
The second dimension of Counts gives the outcomes:
Total: The total number of individuals with a parent assigned in either or both pedigrees.
Match: The same parent is assigned in both pedigrees (non-missing). For dummy parents, it is considered a match if the inferred sibship that contains the most offspring of a non-genotyped parent consists for more than half of that parent's offspring.
Mismatch: Different parents assigned in the two pedigrees. When a sibship according to Pedigree 1 is split over two sibships in Pedigree 2, the smaller fraction is included in the count here.
P1only: Parent in Pedigree 1 but not 2; includes non-assignable parents (e.g. not genotyped and no genotyped offspring).
P2only: Parent in Pedigree 2 but not 1.
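The rule for matching a non-genotyped parent to a dummy parent can be sketched as follows. This is one reading of the description, not the code used by PedCompare; sibship_match, the toy pedigrees and the ID names are all made up for illustration, and only dams are considered.

```r
# Hypothetical helper (not part of sequoia): find the Pedigree 2 sibship that
# best matches a non-genotyped dam from Pedigree 1, or return "nomatch".
sibship_match <- function(parent, ped1, ped2) {
  offspring <- ped1$id[which(ped1$dam == parent)]
  dams2 <- ped2$dam[match(offspring, ped2$id)]   # their dams in Pedigree 2
  counts <- table(dams2)                         # NA (unassigned) is dropped
  if (length(counts) == 0) return(NA_character_)
  best <- names(counts)[which.max(counts)]       # sibship with most offspring
  members <- ped2$id[which(ped2$dam == best)]
  # match only if more than half of that sibship are this parent's offspring
  if (mean(members %in% offspring) > 0.5) best else "nomatch"
}

ped1 <- data.frame(id = paste0("i", 1:6),
                   dam = c("mumA", "mumA", "mumA", "mumA", "mumB", "mumB"),
                   stringsAsFactors = FALSE)
ped2 <- data.frame(id = paste0("i", 1:6),
                   dam = c("F0001", "F0001", "F0001", "F0002", "F0002", "F0002"),
                   stringsAsFactors = FALSE)
match_A <- sibship_match("mumA", ped1, ped2)
match_B <- sibship_match("mumB", ped1, ped2)
```

In this toy example the sibship of "mumA" is split over two inferred sibships; the single offspring that ended up in the smaller fraction ("i4") is the kind of case counted as a mismatch.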
The third dimension of Counts separates maternal from paternal assignments, where e.g. paternal 'DT' is the assignment of fathers to both maternal and paternal sibships (i.e., to dummies of both sexes).
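To make the layout of the array concrete, the sketch below builds an empty 7 x 5 x 2 array and indexes it in the same way as the examples. The dimension names are assumed from the indices used on this page ("TT", 'DT', "Match", "dam") and the numbers are made up; they are not output from PedCompare.

```r
# Assumed dimension names, based on the indices used elsewhere on this page
cats     <- c("GG", "GD", "GT", "DG", "DD", "DT", "TT")
outcomes <- c("Total", "Match", "Mismatch", "P1only", "P2only")
Counts <- array(0, dim = c(7, 5, 2),
                dimnames = list(cats, outcomes, c("dam", "sire")))

# made-up numbers for genotyped offspring with genotyped dams
Counts["GG", c("Match", "Mismatch", "P1only", "P2only"), "dam"] <- c(90, 2, 5, 1)
# by definition, every individual counted in Total falls in one of the
# four other outcomes
Counts["GG", "Total", "dam"] <- sum(Counts["GG", 2:5, "dam"])

totals    <- Counts["TT", , ]    # 5 x 2 matrix: outcomes by dam/sire
dams_only <- Counts[, , "dam"]   # 7 x 5 matrix: categories by outcomes
match_rate <- Counts["GG", "Match", "dam"] / Counts["GG", "Total", "dam"]
```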
In 'ConsensusPed', the priority used is parent.r (if not "nomatch") > parent.2 > parent.1. The columns 'id.cat', 'dam.cat' and 'sire.cat' have two additional levels compared to 'MergedPed':
G: Genotyped
D: Dummy individual (in Pedigree 2)
R: Dummy individual in Pedigree 2 replaced by best-matching non-genotyped individual in Pedigree 1
U: Ungenotyped, Unconfirmed (parent in Pedigree 1, with no dummy match in Pedigree 2)
X: No parent in either pedigree
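The priority rule for the consensus parent can be written as a short sketch. consensus_parent is a hypothetical helper that only mirrors the stated order parent.r > parent.2 > parent.1; it is not taken from the sequoia source, and the IDs are made up.

```r
# Hypothetical helper (not part of sequoia): pick the consensus parent.
consensus_parent <- function(parent.r, parent.2, parent.1) {
  # use the replacement id unless it is missing or "nomatch"
  use_r <- !is.na(parent.r) & parent.r != "nomatch"
  ifelse(use_r, parent.r,
         ifelse(!is.na(parent.2), parent.2, parent.1))
}

dam.r <- c("mumA",  "nomatch", NA,   NA)
dam.2 <- c("F0001", "F0002",   "b1", NA)
dam.1 <- c("mumA",  "mumB",    "b1", "c1")
dam_consensus <- consensus_parent(dam.r, dam.2, dam.1)
```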
Note that 'assignable' may be overly optimistic. Some parents from Ped1 indicated as assignable may never be assigned by sequoia, for example parent-offspring pairs where it cannot be determined which is the older of the two, or grandparents that are indistinguishable from full avuncular (i.e. genetics inconclusive because the candidate has no parent assigned, and ageprior inconclusive).
Considered as potential dummy individuals are all non-genotyped individuals in Pedigree 1 who have, according to either pedigree, at least 2 genotyped offspring, or at least one genotyped offspring and a genotyped parent.
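The definition of a potential dummy can be sketched for a single pedigree. potential_dummies is a hypothetical helper that applies the two criteria to Pedigree 1 only; the 'according to either pedigree' part of the definition is deliberately left out, so this is a simplification.

```r
# Hypothetical helper (not part of sequoia): non-genotyped parents with
# >= 2 genotyped offspring, or >= 1 genotyped offspring and a genotyped parent.
potential_dummies <- function(ped, genotyped) {
  parents <- unique(na.omit(c(ped$dam, ped$sire)))
  candidates <- setdiff(parents, genotyped)      # non-genotyped parents
  keep <- vapply(candidates, function(p) {
    offspring <- ped$id[which(ped$dam == p | ped$sire == p)]
    n_geno_off <- sum(offspring %in% genotyped)
    own <- ped[match(p, ped$id), c("dam", "sire")]  # the candidate's own parents
    has_geno_parent <- any(unlist(own) %in% genotyped)
    n_geno_off >= 2 || (n_geno_off >= 1 && has_geno_parent)
  }, logical(1))
  candidates[keep]
}

ped <- data.frame(id   = c("g1", "g2", "g3", "g4", "u1", "u2", "u3", "gp"),
                  dam  = c("u1", "u1", "u2", "u3", NA,   "gp", NA,   NA),
                  sire = NA_character_,
                  stringsAsFactors = FALSE)
genotyped <- c("g1", "g2", "g3", "g4", "gp")
dummies <- potential_dummies(ped, genotyped)
```

Here "u1" qualifies through two genotyped offspring and "u2" through one genotyped offspring plus a genotyped parent, while "u3" has a single genotyped offspring and no genotyped parent.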
If Pedigree 2 includes samples for which the ID is unknown, the behaviour of PedCompare depends on whether the temporary IDs for these samples are included in SNPd. If they are included, matching (actual) IDs in Pedigree 1 will be flagged as mismatches (because the IDs differ). If they are not included in SNPd, or SNPd is not explicitly provided, matches are accepted, as the situation is indistinguishable from comparing dummy parents across pedigrees.
This is of course all conditional on relatives of the mystery sample being assigned in Pedigree 2.
The comparison is divided into different classes of 'assignable' parents (getAssignCat). This includes cases where the focal individual and parent according to Ped1 are both Genotyped (G-G), as well as cases where the non-genotyped parent according to Ped1 can be lined up with a sibship Dummy parent in Ped2 (G-D), or where the non-genotyped focal individual in Ped1 can be matched to a dummy individual in Ped2 (D-G and D-D). If SNPd is NULL (the default), and DumPrefix is set to NULL, the intersect between the IDs in Pedigrees 1 and 2 is taken as the vector of genotyped individuals.
ComparePairs for comparison of all pairwise relationships in 2 pedigrees, EstConf for repeated simulate-reconstruct-compare, sequoia for the main pedigree reconstruction function, getAssignCat for all parents in the reference pedigree that could have been assigned.
library(sequoia)
data(Ped_HSg5, SimGeno_example, LH_HSg5, package = "sequoia")
SeqOUT <- sequoia(GenoM = SimGeno_example, LifeHistData = LH_HSg5, Err = 0.001)
compare <- PedCompare(Ped1 = Ped_HSg5, Ped2 = SeqOUT$Pedigree)
compare$Counts   # 2 non-assigned, due to simulated genotyping errors
compare$Counts["TT",,]   # totals only
compare$Counts[,,"dam"]   # dams only
# inspect 'assignable but non-assigned in Ped2' (genotyped id, genotyped dam)
compare$Ped1only[compare$Ped1only$id.dam.cat == "GG", ]
# further inspection:
head(compare$MergedPed)
compare$MergedPed[which(compare$MergedPed$dam.1 == "a00001"), ]
# get an overview of all non-genotyped -- dummy matches
BestMatch <- compare$MergedPed[!is.na(compare$MergedPed$id.r), c("id", "id.r")]
# success of paternity assignment, if genotyped mother correctly assigned
dimnames(compare$Counts.detail)
compare$Counts.detail["G","G",,"Match",]