
updog

Updog provides a suite of methods for genotyping polyploids from next-generation sequencing (NGS) data. It does this while accounting for many common features of NGS data: allele bias, overdispersion, and sequencing error. It is named updog for "Using Parental Data for Offspring Genotyping" because we originally developed the method for full-sib populations, but it now works for more general populations. The method is described in detail in Gerard et al. (2018) <doi:10.1534/genetics.118.301468>. Additional details concerning prior specification are described in Gerard and Ferrão (2020) <doi:10.1093/bioinformatics/btz852>.
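To make these three features concrete, here is a minimal base-R sketch of a read-count likelihood for one individual at one SNP. It is an illustration, not updog's code: the helper `read_lik()` and its default parameter values are hypothetical, and the model form (sequencing error applied to the dosage fraction g/K, then an allele-bias reweighting, then a beta-binomial for overdispersion) is our reading of Gerard et al. (2018) and should be checked against the paper before relying on it.

```r
# Hypothetical helper (not part of updog): likelihood of `ref` reference reads
# out of `size` total reads, for every reference-allele dosage g = 0..ploidy.
# Model form is an assumption based on our reading of Gerard et al. (2018).
read_lik <- function(ref, size, ploidy, seq = 0.005, bias = 1, od = 0.01) {
  g <- 0:ploidy                       # candidate reference-allele dosages
  p <- g / ploidy                     # expected reference fraction without error
  f <- p * (1 - seq) + (1 - p) * seq  # sequencing error: a read flips w.p. seq
  xi <- f / (bias * (1 - f) + f)      # allele bias: bias > 1 lowers xi
  a <- xi * (1 - od) / od             # beta-binomial shapes giving mean xi
  b <- (1 - xi) * (1 - od) / od       # and overdispersion od, 0 < od < 1
  exp(lchoose(size, ref) + lbeta(ref + a, size - ref + b) - lbeta(a, b))
}

# Example: 7 reference reads out of 20 for a tetraploid individual.
lik <- read_lik(ref = 7, size = 20, ploidy = 4)
round(lik / sum(lik), 3)  # posterior over dosages 0..4 under a uniform prior
```

Normalizing by the sum gives posterior genotype probabilities only under a uniform prior; in updog the likelihood is instead paired with one of the genotype distributions selected through the `model` argument.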

The main functions are flexdog(), which genotypes a single SNP, and multidog(), which fits flexdog() to multiple SNPs. Both provide many options for the distribution of the genotypes in your sample. Among these is a novel class of genotype distributions, the proportional normal distributions (model = "norm"). This is the default prior because it is the most robust to varying genotype distributions, but feel free to use a more specialized prior if you have more information about your data.
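A proportional normal genotype distribution assigns each dosage k = 0, ..., K a probability proportional to a normal density evaluated at k. The base-R sketch below shows that idea; `prop_normal()` is a hypothetical helper, not an updog function, and the exact parameterization used in Gerard and Ferrão (2020) should be checked against the paper.

```r
# Hypothetical helper (not part of updog): proportional normal genotype
# distribution, pi_k proportional to exp(-(k - mu)^2 / (2 * sigma^2)),
# for dosages k = 0, ..., ploidy.
prop_normal <- function(ploidy, mu, sigma) {
  k <- 0:ploidy
  w <- dnorm(k, mean = mu, sd = sigma)  # unnormalized weight for each dosage
  w / sum(w)                            # rescale so the K + 1 probabilities sum to 1
}

# A hexaploid example whose mode lies between dosages 2 and 3.
pi_hat <- prop_normal(ploidy = 6, mu = 2.5, sigma = 1)
round(pi_hat, 3)
```

In this sketch mu can be any real number and sigma any positive number, so the same two-parameter family can be tightly concentrated, nearly flat, or skewed toward either end of the dosage range; under the model these parameters would be estimated from the data rather than supplied by hand.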

Also provided are:

  • filter_snp(): filter out SNPs based on the output of multidog().
  • format_multidog(): format the output of multidog() in terms of a multidimensional array.
  • Plot methods. Both flexdog() and multidog() have plot methods. See the help files of plot.flexdog() and plot.multidog() for details.
  • Functions to simulate genotypes (rgeno()) and read-counts (rflexdog()). These support all of the models available in flexdog().
  • Functions to evaluate oracle genotyping performance: oracle_joint(), oracle_mis(), oracle_mis_vec(), and oracle_cor(). We mean "oracle" in the sense that we assume the entire data-generating process is known (i.e., the genotype distribution, sequencing error rate, allele bias, and overdispersion are all known). These oracle quantities are good approximations of actual performance when there are many individuals, even if read depths are not large.
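The oracle idea in the list above can be sketched directly: with every parameter known, the best possible estimate for a read count is the dosage with the highest posterior probability, and summing over read counts gives the joint distribution of (true dosage, estimate). The sketch below is a simplified illustration, not updog's oracle_joint(): the function names are hypothetical, and the read model is a plain binomial with sequencing error only (no allele bias or overdispersion).

```r
# Hypothetical sketch (not updog's implementation) of oracle genotyping
# performance at one read depth. Simplified read model: binomial with
# sequencing error only; no allele bias or overdispersion.
oracle_joint_sketch <- function(size, ploidy, prior, seq = 0.01) {
  g <- 0:ploidy
  p <- g / ploidy * (1 - seq) + (1 - g / ploidy) * seq
  # lik[k + 1, x + 1] = P(X = x reference reads | true dosage k)
  lik <- outer(p, 0:size, function(pk, x) dbinom(x, size, pk))
  joint_gx <- prior * lik               # P(G = k, X = x); prior scales each row
  est <- apply(joint_gx, 2, which.max)  # oracle (MAP) estimate, as a row index
  joint <- matrix(0, ploidy + 1, ploidy + 1)
  for (x in seq_len(size + 1)) {
    joint[, est[x]] <- joint[, est[x]] + joint_gx[, x]
  }
  joint                                 # rows: true dosage; columns: estimate
}

# Misclassification rate: the probability mass off the diagonal.
oracle_mis_sketch <- function(joint) 1 - sum(diag(joint))

prior <- dbinom(0:4, size = 4, prob = 0.3)  # e.g. Hardy-Weinberg frequencies
jt <- oracle_joint_sketch(size = 30, ploidy = 4, prior = prior)
oracle_mis_sketch(jt)
```

Because the parameters are treated as known, this error rate is a lower bound of sorts for what a fitted model could achieve at that read depth, which is why it is most informative when many individuals are available to estimate those parameters.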

The original updog package is now named updogAlpha and may be found here.

See also ebg, fitPoly, and polyRAD. Our best “competitor” is probably fitPoly, though polyRAD has some nice ideas for utilizing population structure and linkage disequilibrium.

See NEWS for the latest updates on the package.

Vignettes

I’ve included many vignettes in updog, which you can access online here.

Bug Reports

If you find a bug or want an enhancement, please submit an issue here.

Installation

You can install updog from CRAN in the usual way:

install.packages("updog")

You can install the current (unstable) version of updog from GitHub with:

# install.packages("devtools")
devtools::install_github("dcgerard/updog")

How to Cite

Please cite:

Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping Polyploids from Messy Sequencing Data. Genetics, 210(3), 789-807. doi: 10.1534/genetics.118.301468.

Or, using BibTeX:

@article{gerard2018genotyping,
    author = {Gerard, David and Ferr{\~a}o, Lu{\'i}s Felipe Ventorim and Garcia, Antonio Augusto Franco and Stephens, Matthew},
    title = {Genotyping Polyploids from Messy Sequencing Data},
    volume = {210},
    number = {3},
    pages = {789--807},
    year = {2018},
    doi = {10.1534/genetics.118.301468},
    publisher = {Genetics},
    issn = {0016-6731},
    URL = {https://doi.org/10.1534/genetics.118.301468},
    journal = {Genetics}
}

If you are using the proportional normal prior class (model = "norm"), which is also the default prior, then please also cite:

Gerard D, Ferrão L (2020). “Priors for Genotyping Polyploids.” Bioinformatics, 36(6), 1795-1800. ISSN 1367-4803, doi: 10.1093/bioinformatics/btz852.

Or, using BibTeX:

@article{gerard2020priors,
    title = {Priors for Genotyping Polyploids},
    year = {2020},
    journal = {Bioinformatics},
    publisher = {Oxford University Press},
    volume = {36},
    number = {6},
    pages = {1795--1800},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btz852},
    author = {David Gerard and Lu{\'i}s Felipe Ventorim Ferr{\~a}o},
}

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.


Version: 2.1.5

License: GPL-3


Maintainer: David Gerard

Last Published: November 29th, 2023

Functions in updog (2.1.5)

  • format_multidog: Return arrayicized elements from the output of multidog.
  • flexdog: Flexible genotyping for polyploids from next-generation sequencing data.
  • flexdog_full: Flexible genotyping for polyploids from next-generation sequencing data.
  • multidog: Fit flexdog to multiple SNPs.
  • is.multidog: Tests if an argument is a multidog object.
  • filter_snp: Filter SNPs based on the output of multidog().
  • oracle_cor_from_joint: Calculate the correlation of the oracle estimator with the true genotype from the joint distribution matrix.
  • oracle_mis: Calculate oracle misclassification error rate.
  • oracle_cor: Calculates the correlation between the true genotype and an oracle estimator.
  • oracle_mis_from_joint: Get the oracle misclassification error rate directly from the joint distribution of the genotype and the oracle estimator.
  • log_sum_exp_2: Log-sum-exponential trick using just two doubles.
  • get_q_array: Return the probabilities of an offspring's genotype given its parental genotypes for all possible combinations of parental and offspring genotypes. This is for species with polysomal inheritance and bivalent, non-preferential pairing.
  • oracle_plot: Construct an oracle plot from the output of oracle_joint.
  • plot.flexdog: Draw a genotype plot from the output of flexdog.
  • oracle_joint: The joint probability of the genotype and the genotype estimate of an oracle estimator.
  • plot.multidog: Plot the output of multidog.
  • updog-package: updog: Flexible Genotyping for Polyploids.
  • rflexdog: Simulate GBS data from the flexdog likelihood.
  • wem: EM algorithm to fit weighted ash objective.
  • oracle_mis_vec: Returns the oracle misclassification rates for each genotype.
  • plot_geno: Make a genotype plot.
  • rgeno: Simulate individual genotypes from one of the supported flexdog models.
  • oracle_mis_vec_from_joint: Get the oracle misclassification error rates (conditional on true genotype) directly from the joint distribution of the genotype and the oracle estimator.
  • snpdat: GBS data from Shirasawa et al. (2017).
  • uitdewilligen: Subset of individuals and SNPs from Uitdewilligen et al. (2013).
  • dbetabinom: The Beta-Binomial Distribution.
  • log_sum_exp: Log-sum-exponential trick.
  • is.flexdog: Tests if an argument is a flexdog object.
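The get_q_array entry describes offspring genotype probabilities under polysomic inheritance with bivalent, non-preferential pairing. A minimal base-R sketch of that calculation, assuming an even ploidy and no double reduction: each parent passes on K/2 of its K chromosomes as a hypergeometric draw, and the offspring dosage is the sum of the two gamete dosages. The helpers `gamete_probs()`, `offspring_probs()`, and `q_array_sketch()` are hypothetical, and the dimension order of the array is our own choice rather than necessarily updog's.

```r
# Hypothetical sketch (not updog's get_q_array): P(gamete carries i reference
# alleles | parent dosage g) for even ploidy K, assuming random bivalent
# pairing and no double reduction, i.e. a hypergeometric draw of K/2 of K.
gamete_probs <- function(g, ploidy) {
  i <- 0:(ploidy / 2)
  dhyper(i, m = g, n = ploidy - g, k = ploidy / 2)
}

# Offspring dosage distribution: convolution of the two gamete distributions.
offspring_probs <- function(g1, g2, ploidy) {
  p1 <- gamete_probs(g1, ploidy)
  p2 <- gamete_probs(g2, ploidy)
  out <- numeric(ploidy + 1)
  for (i in seq_along(p1)) {
    for (j in seq_along(p2)) {
      # gamete dosages (i - 1) + (j - 1) land at index i + j - 1
      out[i + j - 1] <- out[i + j - 1] + p1[i] * p2[j]
    }
  }
  out  # element k + 1 is P(offspring dosage = k)
}

# Array over all parental combinations: q[g1 + 1, g2 + 1, k + 1].
q_array_sketch <- function(ploidy) {
  q <- array(0, dim = c(ploidy + 1, ploidy + 1, ploidy + 1))
  for (g1 in 0:ploidy) {
    for (g2 in 0:ploidy) {
      q[g1 + 1, g2 + 1, ] <- offspring_probs(g1, g2, ploidy)
    }
  }
  q
}

# Tetraploid duplex x duplex cross: the classic 1:8:18:8:1 segregation ratio.
round(offspring_probs(2, 2, 4) * 36, 10)  # 1 8 18 8 1
```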
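The log_sum_exp and log_sum_exp_2 entries refer to the log-sum-exponential trick, which matters in genotyping because posterior probabilities are built from products of many small likelihoods and are safest to handle on the log scale. The sketch below shows the general technique in base R; `log_sum_exp_sketch()` is a hypothetical name and is not updog's code.

```r
# Hypothetical helper illustrating the log-sum-exp trick:
# log(sum(exp(x))) is computed as max(x) + log(sum(exp(x - max(x)))),
# which avoids underflow when every element of x is very negative.
log_sum_exp_sketch <- function(x) {
  m <- max(x)
  if (!is.finite(m)) return(m)  # all -Inf, or some Inf: the max is the answer
  m + log(sum(exp(x - m)))
}

x <- c(-1000, -1000)
log(sum(exp(x)))       # -Inf, because exp(-1000) underflows to 0
log_sum_exp_sketch(x)  # -1000 + log(2), about -999.307
```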