screen_for_duplicate_individuals.gp: Screen for duplicate individuals using weighted genotype probabilities

Description

screen_for_duplicate_individuals.gp identifies and merges duplicate individuals based on probabilistic genotypes. See screen_for_duplicate_individuals for the original function.

Usage

screen_for_duplicate_individuals.gp(
  probgeno_df,
  ploidy,
  parent1 = "P1",
  parent2 = "P2",
  F1,
  cutoff = 0.95,
  plot_cor = TRUE,
  log = NULL
)

Value

A data frame similar to input probgeno_df, but with duplicate individuals merged.

Arguments

probgeno_df

A data frame as read from the scores file produced by function saveMarkerModels of R package fitPoly, or alternatively, a data frame containing the following columns:

SampleName

Name of the sample (individual)

MarkerName

Name of the marker

P0

Probabilities of dosage score '0'

P1...

Probabilities of dosage score '1' etc. (up to max dosage, e.g. P4 for tetraploid population)

maxP

Maximum genotype probability identified for a particular individual and marker combination

maxgeno

Most probable dosage for a particular individual and marker combination

geno

Most probable dosage for a particular individual and marker combination, if maxP exceeds a user-defined threshold (e.g. 0.9), otherwise NA
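As an illustration of the column layout described above, the following is a minimal base R sketch that builds a toy data frame by hand. The sample names, marker names, probabilities, and the 0.9 threshold are all hypothetical values chosen for illustration; real input would normally come from the fitPoly scores file.

```r
# Hypothetical toy data: two samples scored at two markers in a tetraploid,
# so the dosage probability columns run from P0 to P4.
ploidy <- 4
probs <- rbind(
  c(0.01, 0.97, 0.01, 0.01, 0.00),
  c(0.00, 0.02, 0.96, 0.02, 0.00),
  c(0.30, 0.40, 0.30, 0.00, 0.00),
  c(0.00, 0.00, 0.05, 0.95, 0.00)
)
colnames(probs) <- paste0("P", 0:ploidy)

probgeno_df <- data.frame(
  SampleName = c("F1_001", "F1_002", "F1_001", "F1_002"),
  MarkerName = c("mrk_A", "mrk_A", "mrk_B", "mrk_B"),
  probs,
  stringsAsFactors = FALSE
)

# maxP: highest genotype probability per sample x marker row
probgeno_df$maxP <- apply(probs, 1, max)

# maxgeno: dosage with the highest probability (column index minus 1)
probgeno_df$maxgeno <- max.col(probs, ties.method = "first") - 1L

# geno: maxgeno if maxP exceeds the (assumed) threshold, otherwise NA
threshold <- 0.9
probgeno_df$geno <- ifelse(probgeno_df$maxP > threshold,
                           probgeno_df$maxgeno, NA)
```

In this sketch the third row has no dosage with probability above the threshold, so its geno value is NA while maxgeno still reports the most probable dosage.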

ploidy

The ploidy of parent 1

parent1

Character vector with the sample names of parent 1

parent2

Character vector with the sample names of parent 2

F1

Character vector with the sample names of the F1 individuals

cutoff

Correlation coefficient cutoff used to declare duplicates. Individuals whose pairwise correlation coefficient reaches this value are merged. If NULL, the user is prompted for a value after plotting.
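To make the role of the cutoff concrete, here is a rough base R sketch of flagging pairs of individuals by pairwise correlation. It is not the function's actual implementation: the weighting scheme used internally is not described on this page, so the sketch assumes the expected dosage (the probability-weighted mean dosage) as a stand-in. The helpers to_probs and expected_dosage and all data are hypothetical.

```r
ploidy <- 4
cutoff <- 0.95

# Hypothetical helper: turn hard dosages into a probability matrix (P0..P4)
# by giving the called dosage probability `conf` and spreading the remainder
# evenly over the other dosage classes.
to_probs <- function(dosage, conf) {
  m <- matrix((1 - conf) / ploidy, nrow = length(dosage), ncol = ploidy + 1)
  m[cbind(seq_along(dosage), dosage + 1)] <- conf
  m
}

# Assumed summary statistic: expected dosage = sum over k of k * P(k)
expected_dosage <- function(prob_mat) {
  drop(prob_mat %*% (seq_len(ncol(prob_mat)) - 1))
}

# Toy individuals: ind_b carries the same dosages as ind_a (a duplicate
# scored with lower confidence); ind_c is a different genotype.
dos_a <- c(0, 1, 2, 1, 3, 0, 2, 1)
dos_c <- c(2, 0, 1, 3, 0, 2, 1, 4)

expected <- cbind(
  ind_a = expected_dosage(to_probs(dos_a, 0.98)),
  ind_b = expected_dosage(to_probs(dos_a, 0.92)),
  ind_c = expected_dosage(to_probs(dos_c, 0.98))
)

# Pairwise correlations between individuals across markers
cor_mat <- cor(expected)

# Report each pair once (upper triangle) when it exceeds the cutoff
idx <- which(cor_mat > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
dup_pairs <- data.frame(
  ind1 = rownames(cor_mat)[idx[, "row"]],
  ind2 = colnames(cor_mat)[idx[, "col"]],
  stringsAsFactors = FALSE
)
print(dup_pairs)
```

With these toy values only the ind_a/ind_b pair is flagged. A lower cutoff flags more pairs as duplicates, so inspecting the plotted correlations before choosing a value may be worthwhile.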

plot_cor

Logical. Should correlation coefficients be plotted? Can be memory- and CPU-intensive with a large number of individuals.

log

Character string specifying the log filename to which standard output should be written. If NULL, the log is sent to stdout.