mppData objects
Perform different operations of quality control (QC) on the marker data of an mppData object.
QC.mppData(
mppData,
mk.miss = 0.1,
gen.miss = 0.25,
n.lim = 15,
MAF.pop.lim = 0.05,
MAF.cr.lim = NULL,
MAF.cr.miss = TRUE,
MAF.cr.lim2 = NULL,
verbose = TRUE,
n.cores = 1
)
A filtered mppData object containing the same elements as the object returned by create.mppData, after filtering. It also contains the following new elements:
Character vector of genotype identifiers.
Four-column data.frame: 1) the type of genotype: "offspring" for the last generation and "founder" for the genotypes above the offspring in the pedigree; 2) the genotype indicator; 3-4) parents 1 and 2 of each line.
Parent marker matrix without monomorphic or completely missing markers.
Genetic map corresponding to the list of markers of the geno.par.clu object.
List of parents.
Number of crosses.
Number of parents.
Vector of markers that have been removed.
Vector of genotypes that have been removed.
mppData: An object of class mppData formed with create.mppData.
mk.miss: Numeric maximum marker missing rate at the whole population level, between 0 and 1. Default = 0.1.
gen.miss: Numeric maximum genotype missing rate at the whole population level, between 0 and 1. Default = 0.25.
n.lim: Numeric value specifying the minimum cross size. Default = 15.
MAF.pop.lim: Numeric minimum marker minor allele frequency at the population level. Default = 0.05.
MAF.cr.lim: Numeric vector specifying the critical within-cross MAF. A marker with a problematic segregation rate in at least one cross is either set as missing within the problematic cross (MAF.cr.miss = TRUE) or removed from the marker matrix (MAF.cr.miss = FALSE). For the default value, see the details.
MAF.cr.miss: Logical value specifying whether markers with a too low segregation rate within a cross (MAF.cr.lim) should be set as missing (MAF.cr.miss = TRUE) or discarded (MAF.cr.miss = FALSE). Default = TRUE.
MAF.cr.lim2: Numeric. Alternative option for marker MAF filtering. Only markers segregating with a MAF larger than MAF.cr.lim2 in at least one cross will be kept for the analysis. Default = NULL.
verbose: Logical value indicating whether the steps of the QC should be printed. Default = TRUE.
n.cores: Numeric. Number of cores to use. Default = 1.
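As an illustration of what the mk.miss, gen.miss and n.lim thresholds control, the following base R sketch applies the same three filters to a toy marker matrix. It is not the package implementation: the 0/1/2 allele-count coding, the object names (geno, cross, geno_f) and the toy threshold values are all assumptions chosen so that each filter removes something in a 6 x 4 example.

```r
# Toy marker matrix: 6 genotypes (rows) x 4 markers (columns).
# Scores are coded as 0/1/2 allele copies with NA = missing. This coding is an
# assumption for the illustration, not the internal format of an mppData object.
geno <- matrix(c( 0,  2, NA, 2,
                  2,  2, NA, 0,
                  0, NA, NA, 2,
                  2,  0,  0, 0,
                 NA, NA, NA, 0,
                  0,  2,  2, 2),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("g", 1:6), paste0("mk", 1:4)))

# Hypothetical cross membership of each genotype.
cross <- c(g1 = "c1", g2 = "c1", g3 = "c1", g4 = "c2", g5 = "c2", g6 = "c2")

# Toy thresholds (the function defaults are 0.1, 0.25 and 15).
mk_miss_lim  <- 0.5
gen_miss_lim <- 0.5
n_lim        <- 3

# 1) Drop markers whose missing rate is higher than the marker threshold.
mk_rate <- colMeans(is.na(geno))
geno_f  <- geno[, mk_rate <= mk_miss_lim, drop = FALSE]

# 2) Drop genotypes whose missing rate (over the remaining markers) is too high.
gen_rate <- rowMeans(is.na(geno_f))
geno_f   <- geno_f[gen_rate <= gen_miss_lim, , drop = FALSE]

# 3) Drop crosses left with fewer than n_lim genotypes.
cross_f <- cross[rownames(geno_f)]
cr_size <- table(cross_f)
keep_cr <- names(cr_size)[cr_size >= n_lim]
geno_f  <- geno_f[cross_f %in% keep_cr, , drop = FALSE]

dim(geno_f)
```

In this toy case mk3 is removed for its missing rate, g5 is removed as a genotype, and cross c2 is then dropped because only two of its genotypes remain.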
Vincent Garin
The different operations of the quality control are the following:
1. Remove markers with more than two alleles.
2. Remove markers that are monomorphic or fully missing in the parents.
3. Remove markers with a missing rate higher than mk.miss.
4. Remove genotypes with a marker missing rate higher than gen.miss.
5. Remove crosses with fewer than n.lim genotypes.
6. Keep only the most polymorphic marker when multiple markers map at the same position.
7. Check marker minor allele frequency (MAF). Different strategies can be used to control marker MAF:

   A) A first possibility is to filter markers based on MAF at the whole population level using MAF.pop.lim, and/or on MAF within crosses using MAF.cr.lim.

   The user can give their own vector of critical values for MAF within cross using MAF.cr.lim. By default, the within-cross MAF values are defined by the following function of the cross size n.ci: MAF(n.ci) = 0.5 if n.ci in [0, 10] and MAF(n.ci) = (4.5/n.ci) + 0.05 if n.ci > 10. This means that up to 10 genotypes, the critical within-cross MAF is set to 50%. It then decreases as the number of genotypes increases, toward a lower bound of 5%.

   If the within-cross MAF is below the limit in at least one cross, then marker scores of the problematic cross are either put as missing (MAF.cr.miss = TRUE) or the whole marker is discarded (MAF.cr.miss = FALSE). By default, MAF.cr.miss = TRUE, which allows a larger number of markers to be included and a wider genetic diversity to be covered.

   B) An alternative is to select only markers that segregate in at least one cross at the MAF.cr.lim2 rate.
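The default within-cross MAF limit and the two ways of handling a problematic cross can be sketched in base R as follows. This is written from the formula and description above, not taken from the package source. The helper names (maf_cr_default, maf), the 0/1/2 allele-count coding, the two toy crosses and the value used for the alternative criterion are assumptions made for the illustration.

```r
# Default critical within-cross MAF as a function of the cross size n.ci,
# following the formula given in the details.
maf_cr_default <- function(n.ci) {
  ifelse(n.ci <= 10, 0.5, 4.5 / n.ci + 0.05)
}

# Cross sizes of 5, 10, 45 and 90 give limits of 0.50, 0.50, 0.15 and 0.10.
maf_cr_default(c(5, 10, 45, 90))

# Minor allele frequency of one marker scored as 0/1/2 allele copies
# (assumed coding, used only for this sketch).
maf <- function(x) {
  p <- mean(x, na.rm = TRUE) / 2
  min(p, 1 - p)
}

# One marker scored in two hypothetical crosses of 20 genotypes each.
cross <- rep(c("c1", "c2"), each = 20)
mk <- c(rep(c(0, 2), times = 10),  # c1: balanced segregation, MAF = 0.5
        rep(0, 19), 2)             # c2: nearly fixed, MAF = 0.05

maf_cr <- tapply(mk, cross, maf)               # observed MAF per cross
n_cr   <- tapply(mk, cross, length)            # cross sizes
lim_cr <- maf_cr_default(n_cr)                 # 0.275 for a cross of 20
bad_cr <- names(maf_cr)[as.vector(maf_cr < lim_cr)]

# Option A with MAF.cr.miss = TRUE: set scores of the problematic cross to missing.
mk_masked <- mk
mk_masked[cross %in% bad_cr] <- NA

# Option A with MAF.cr.miss = FALSE: the whole marker would be discarded instead.
keep_marker <- length(bad_cr) == 0

# Option B: keep the marker if it segregates above a single limit
# (0.1 here, a hypothetical MAF.cr.lim2 value) in at least one cross.
keep_B <- any(maf_cr > 0.1)
```

Here cross c2 falls below its limit, so the marker is masked in c2 under the first setting and dropped entirely under the second, while the alternative criterion keeps it because it segregates well in c1.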
create.mppData
data(mppData_init)
mppData <- QC.mppData(mppData = mppData_init, n.lim = 15, MAF.pop.lim = 0.05,
MAF.cr.miss = TRUE, mk.miss = 0.1,
gen.miss = 0.25, verbose = TRUE)