screen_for_duplicate_individuals.gp identifies and merges duplicate individuals based on probabilistic genotypes. See screen_for_duplicate_individuals for the original function.
screen_for_duplicate_individuals.gp(
probgeno_df,
ploidy,
parent1 = "P1",
parent2 = "P2",
F1,
cutoff = 0.95,
plot_cor = TRUE,
saveCOV = "PeasonCC",
log = NULL
)
probgeno_df
A data frame as read from the scores file produced by the function saveMarkerModels of the R package fitPoly, or alternatively a data frame containing the following columns:
SampleName Name of the sample (individual)
MarkerName Name of the marker
P0 Probabilities of dosage score '0'
P1... Probabilities of dosage score '1' etc. (up to max offspring dosage, e.g. P4 for tetraploid population)
maxP Maximum genotype probability identified for a particular individual and marker combination
maxgeno Most probable dosage for a particular individual and marker combination
geno Most probable dosage for a particular individual and marker combination if maxP exceeds a user-defined threshold (e.g. 0.9), otherwise NA
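The column layout above can be illustrated with a small hand-built data frame. This is only a sketch in base R: the sample names, marker names, probabilities, and the 0.9 threshold are made-up values for a tetraploid (P0 to P4), and the derived columns are computed here by hand rather than by fitPoly.

```r
# Hypothetical genotype probabilities: 2 samples x 2 markers, tetraploid (P0..P4).
probs <- rbind(
  c(0.01, 0.97, 0.02, 0.00, 0.00),
  c(0.00, 0.04, 0.92, 0.04, 0.00),
  c(0.02, 0.96, 0.02, 0.00, 0.00),
  c(0.00, 0.45, 0.50, 0.05, 0.00)
)
colnames(probs) <- paste0("P", 0:4)

probgeno_df <- data.frame(
  SampleName = rep(c("F1_001", "F1_002"), each = 2),
  MarkerName = rep(c("mrk_A", "mrk_B"), times = 2),
  probs
)

threshold <- 0.9  # assumed user-defined threshold

# maxP: the highest probability in each row
probgeno_df$maxP <- apply(probs, 1, max)
# maxgeno: the dosage with the highest probability (column index minus 1)
probgeno_df$maxgeno <- max.col(probs, ties.method = "first") - 1L
# geno: maxgeno where maxP exceeds the threshold, otherwise NA
probgeno_df$geno <- ifelse(probgeno_df$maxP > threshold, probgeno_df$maxgeno, NA)

print(probgeno_df)
```

In this toy example the last row has maxP = 0.5, so its geno is NA while its maxgeno is still 2.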
ploidy
The ploidy of parent 1
parent1
Character vector with the sample names of parent 1
parent2
Character vector with the sample names of parent 2
F1
Character vector with the sample names of the F1 individuals
cutoff
Correlation coefficient cutoff for declaring duplicates. Individuals whose correlation coefficient reaches this value are merged. If NULL, the user is asked for input after plotting.
plot_cor
Logical. Should correlation coefficients be plotted? This can be memory- and CPU-intensive with a high number of individuals.
saveCOV
A file name to which the variation, number, and mean of the Pearson correlation coefficients can be saved
log
Character string specifying the log filename to which standard output should be written. If NULL, the log is sent to stdout.
A data frame similar to the input probgeno_df, but with duplicate individuals merged.
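To make the role of the correlation cutoff concrete, the following base R sketch flags pairs of individuals whose pairwise Pearson correlation reaches a cutoff of 0.95. It is not the package's implementation: it uses a hypothetical matrix of discrete dosages (markers in rows, individuals in columns) instead of probabilistic genotypes, and the individual names are invented.

```r
# Hypothetical dosage matrix: rows are markers, columns are individuals.
dosages <- cbind(
  ind_A = c(0, 1, 2, 1, 0, 2, 1, 1),
  ind_B = c(0, 1, 2, 1, 0, 2, 1, NA),  # same sample genotyped twice, one missing call
  ind_C = c(2, 0, 1, 2, 1, 0, 2, 0)
)

cutoff <- 0.95

# Pairwise Pearson correlations between individuals, ignoring missing calls
cor_mat <- cor(dosages, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and flag those at or above the cutoff
pairs <- which(cor_mat >= cutoff & upper.tri(cor_mat), arr.ind = TRUE)

dup_pairs <- data.frame(
  ind1 = rownames(cor_mat)[pairs[, "row"]],
  ind2 = colnames(cor_mat)[pairs[, "col"]],
  r = cor_mat[pairs]
)

print(dup_pairs)
```

Only the ind_A and ind_B pair is flagged here. How flagged individuals are then merged is up to the function itself and is not shown in this sketch.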