screen_for_duplicate_individuals.gp
Identifies and merges duplicate individuals based on probabilistic genotypes.
See screen_for_duplicate_individuals for the original function.
screen_for_duplicate_individuals.gp(
  probgeno_df,
  ploidy,
  parent1 = "P1",
  parent2 = "P2",
  F1,
  cutoff = 0.95,
  plot_cor = TRUE,
  log = NULL
)
Returns a data frame similar to the input probgeno_df, but with duplicate individuals merged.
probgeno_df: A data frame as read from the scores file produced by function saveMarkerModels of R package fitPoly, or alternatively, a data frame containing the following columns:
  SampleName: Name of the sample (individual)
  MarkerName: Name of the marker
  P0: Probabilities of dosage score '0'
  P1...: Probabilities of dosage score '1' etc. (up to max dosage, e.g. P4 for tetraploid population)
  maxP: Maximum genotype probability identified for a particular individual and marker combination
  maxgeno: Most probable dosage for a particular individual and marker combination
  geno: Most probable dosage for a particular individual and marker combination, if maxP exceeds a user-defined threshold (e.g. 0.9), otherwise NA
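To make the expected layout concrete, the following is a minimal sketch of a hand-built tetraploid input in base R. The column names (SampleName, MarkerName, P0 to P4, maxP, maxgeno, geno) follow the fitPoly scores layout as understood here; the sample names, marker names, probabilities and the 0.9 threshold are invented for illustration only.

```r
# Toy tetraploid example: 2 samples x 2 markers.
# All probabilities below are made up; each row sums to 1.
probs <- matrix(
  c(0.01, 0.97, 0.02, 0.00, 0.00,
    0.00, 0.05, 0.60, 0.35, 0.00,
    0.02, 0.96, 0.02, 0.00, 0.00,
    0.00, 0.00, 0.03, 0.95, 0.02),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, paste0("P", 0:4))
)

toy_probgeno <- data.frame(
  SampleName = rep(c("ind_A", "ind_B"), each = 2),   # hypothetical sample names
  MarkerName = rep(c("mrk_1", "mrk_2"), times = 2),  # hypothetical marker names
  probs,
  stringsAsFactors = FALSE
)

# maxP: highest dosage probability in each row.
toy_probgeno$maxP <- apply(probs, 1, max)
# maxgeno: the dosage (0-based) that carries that highest probability.
toy_probgeno$maxgeno <- unname(apply(probs, 1, which.max)) - 1L
# geno: maxgeno where maxP exceeds the threshold (0.9 assumed here), otherwise NA.
toy_probgeno$geno <- ifelse(toy_probgeno$maxP > 0.9, toy_probgeno$maxgeno, NA)

print(toy_probgeno)
```

A real data set would normally come straight from the fitPoly scores file rather than being assembled by hand; the sketch only shows which columns are needed and how the last three relate to the probability columns.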
ploidy: The ploidy of parent 1
parent1: Character vector with the sample names of parent 1
parent2: Character vector with the sample names of parent 2
F1: Character vector with the sample names of the F1 individuals
cutoff: Correlation coefficient cutoff used to declare duplicates. Pairs of individuals whose correlation reaches this cutoff are merged. If NULL, the user is prompted for a value after plotting.
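The role of the cutoff can be illustrated with a small base R sketch. It is not the package's implementation: it correlates plain dosage calls instead of genotype probabilities, treats "at or above the cutoff" as the duplicate criterion, and uses a made-up dosage matrix with hypothetical individual names.

```r
# Hypothetical dosage matrix: rows are markers, columns are individuals.
dosages <- cbind(
  ind_A = c(0, 1, 2, 1, 3, 4, 2, 0),
  ind_B = c(0, 1, 2, 1, 3, 4, 2, 0),  # same genotypes as ind_A
  ind_C = c(4, 0, 1, 3, 0, 2, 1, 2)
)
cutoff <- 0.95

# Pairwise Pearson correlations between individuals,
# dropping missing dosages pair by pair.
cor_mat <- cor(dosages, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and flag pairs at or above the cutoff.
# The ">=" comparison is an assumption of this sketch.
idx <- which(upper.tri(cor_mat) & cor_mat >= cutoff, arr.ind = TRUE)
dup_pairs <- data.frame(
  ind1 = rownames(cor_mat)[idx[, "row"]],
  ind2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx],
  stringsAsFactors = FALSE
)

print(dup_pairs)
```

Here only the ind_A/ind_B pair is flagged, because the two columns are identical while ind_C is negatively correlated with both. Lowering the cutoff makes the screen more permissive, which is why inspecting the correlation plot before fixing a value can be useful.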
plot_cor: Logical. Should correlation coefficients be plotted? This can be memory- and CPU-intensive with a large number of individuals.
log: Character string specifying the log filename to which standard output should be written. If NULL, the log is sent to stdout.