polymapR (version 1.1.1)

screen_for_duplicate_individuals.gp: Screen for duplicate individuals using weighted genotype probabilities

Description

screen_for_duplicate_individuals.gp identifies and merges duplicate individuals based on probabilistic genotypes. See screen_for_duplicate_individuals for the original function.

Usage

screen_for_duplicate_individuals.gp(
  probgeno_df,
  ploidy,
  parent1 = "P1",
  parent2 = "P2",
  F1,
  cutoff = 0.95,
  plot_cor = TRUE,
  saveCOV = "PeasonCC",
  log = NULL
)

Arguments

probgeno_df

A data frame as read from the scores file produced by function saveMarkerModels of R package fitPoly, or alternatively, a data frame containing the following columns:

  • SampleName Name of the sample (individual)

  • MarkerName Name of the marker

  • P0 Probabilities of dosage score '0'

  • P1... Probabilities of dosage score '1' etc. (up to max offspring dosage, e.g. P4 for tetraploid population)

  • maxP Maximum genotype probability identified for a particular individual and marker combination

  • maxgeno Most probable dosage for a particular individual and marker combination

  • geno Most probable dosage for a particular individual and marker combination, if maxP exceeds a user-defined threshold (e.g. 0.9), otherwise NA
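As an illustration of the column layout listed above, the following minimal sketch builds a toy data frame in base R for a tetraploid population. The sample names, marker names, and random probabilities are made up for the example, and a real fitPoly scores file may contain additional columns; the 0.9 threshold for geno is the example value mentioned above.

```r
# Toy example (hypothetical names and values): 3 samples x 2 markers, tetraploid
set.seed(1)
ploidy  <- 4
samples <- c("P1", "P2", "F1_001")   # hypothetical sample names
markers <- c("mrk1", "mrk2")         # hypothetical marker names

# One row per sample/marker combination
grid <- expand.grid(SampleName = samples, MarkerName = markers,
                    stringsAsFactors = FALSE)

# Random probabilities for dosages 0..ploidy, normalised so each row sums to 1
probs <- matrix(runif(nrow(grid) * (ploidy + 1)), nrow = nrow(grid))
probs <- probs / rowSums(probs)
colnames(probs) <- paste0("P", 0:ploidy)

probgeno_df <- cbind(grid, as.data.frame(probs))

# maxP: highest probability per row; maxgeno: the dosage with that probability
probgeno_df$maxP    <- apply(probs, 1, max)
probgeno_df$maxgeno <- max.col(probs, ties.method = "first") - 1L

# geno: maxgeno if maxP exceeds the threshold (0.9 here), otherwise NA
probgeno_df$geno <- ifelse(probgeno_df$maxP > 0.9, probgeno_df$maxgeno, NA)

str(probgeno_df)
```

Such a data frame only demonstrates the expected shape of the input; it is not meant to produce meaningful duplicate calls.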

ploidy

The ploidy of parent 1

parent1

character vector with the sample names of parent 1

parent2

character vector with the sample names of parent 2

F1

character vector with the sample names of the F1 individuals

cutoff

Correlation coefficient cutoff for declaring duplicates. Individuals whose correlation coefficient reaches this value are merged. If NULL, the user is prompted for input after plotting.
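To give a feel for how a correlation cutoff separates duplicates from distinct individuals, here is a rough base R sketch. It is not the polymapR implementation: it assumes, purely for illustration, that each individual is summarised by its expected dosage per marker (the probability-weighted mean of the dosage classes) and that pairs are compared by Pearson correlation. The helper functions and the simulated individuals are hypothetical.

```r
# Illustrative sketch only; NOT the polymapR implementation.
set.seed(42)
n_markers <- 50
ploidy    <- 4

# Hypothetical helper: random genotype probabilities (markers x dosage classes)
random_probs <- function(n, ploidy) {
  p <- matrix(runif(n * (ploidy + 1)), nrow = n)
  p / rowSums(p)
}

# Hypothetical helper: expected dosage = sum over k of k * P(dosage = k)
expected_dosage <- function(p) as.vector(p %*% (0:(ncol(p) - 1)))

ind_a <- random_probs(n_markers, ploidy)
ind_b <- random_probs(n_markers, ploidy)

dosages <- cbind(
  A     = expected_dosage(ind_a),
  A_dup = expected_dosage(ind_a) + rnorm(n_markers, sd = 0.01),  # near-copy of A
  B     = expected_dosage(ind_b)                                 # unrelated
)

cutoff <- 0.95
cors   <- cor(dosages, method = "pearson")

# Keep each pair once (upper triangle) where the correlation exceeds the cutoff
dup_pairs <- which(cors > cutoff & upper.tri(cors), arr.ind = TRUE)
data.frame(ind1 = rownames(cors)[dup_pairs[, "row"]],
           ind2 = colnames(cors)[dup_pairs[, "col"]])
```

In this toy setting only the near-copy pair passes the cutoff; with real data, a histogram or plot of the pairwise correlations (as offered by plot_cor) is a sensible way to check whether 0.95 is a reasonable threshold.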

plot_cor

Logical. Should correlation coefficients be plotted? This can be memory- and CPU-intensive with a large number of individuals.

saveCOV

Name of a file in which the variation, number, and mean of the Pearson correlation coefficients can be saved.

log

Character string specifying the log filename to which standard output should be written. If NULL, the log is sent to stdout.

Value

A data frame similar to input probgeno_df, but with duplicate individuals merged.