
Phased Or Unphased LD (pould)

v1.0.1 (October 8, 2020)

The pould package calculates four linkage disequilibrium (LD) statistics for genotype data from pairs of genetic loci: D', Wn and the two conditional asymmetric LD (cALD) measures, WA/B and WB/A. It can treat these data as either phased or unphased for these calculations. In addition, pould includes three supporting functions: LDWrap(), a wrapper function that parses genotype data in BIGDAWG/PyPop input format or haplotype data in HaplObserve output format; LD.sign.test(), which applies a sign test to LD values for phased and unphased haplotypes generated by LDWrap() for a given dataset; and LD.heat.map(), which generates PNG-formatted heat-map plots for each LD measure.

For examples of the application of the pould package, see: Osoegawa et al. Hum Immunol. 2019;80(9):633-643. Osoegawa et al. Hum Immunol. 2019;80(9):644-660.

For more information about cALD, see: Thomson G, Single RM. Conditional asymmetric linkage disequilibrium (ALD): extending the biallelic r² measure. Genetics. 2014;198(1):321-31.
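To make the four measures more concrete, the base R sketch below computes them from a small, invented table of two-locus haplotype frequencies. It follows the standard definitions (multiallelic D', Cramér's V for Wn, and the Thomson and Single formulation of the conditional measures) and is not taken from pould's source; the `hap` matrix, its allele labels, and the `ld_sketch()` helper are hypothetical names used only for illustration.

```r
# Hypothetical haplotype frequency table (values invented, summing to 1):
# rows are alleles at locus A, columns are alleles at locus B.
hap <- matrix(c(0.30, 0.05, 0.00,
                0.05, 0.25, 0.05,
                0.00, 0.00, 0.30),
              nrow = 3, byrow = TRUE,
              dimnames = list(A = c("A1", "A2", "A3"),
                              B = c("B1", "B2", "B3")))

# Illustrative helper, not part of pould.
ld_sketch <- function(f) {
  pA <- rowSums(f)             # allele frequencies at locus A
  pB <- colSums(f)             # allele frequencies at locus B
  expected <- outer(pA, pB)    # haplotype frequencies expected without LD
  D <- f - expected            # disequilibrium coefficients D_ij

  # D': expected-frequency-weighted mean of |D_ij| / D_max,ij
  Dmax <- ifelse(D < 0,
                 pmin(expected, outer(1 - pA, 1 - pB)),
                 pmin(outer(pA, 1 - pB), outer(1 - pA, pB)))
  Dprime <- sum(expected * abs(D) / Dmax)

  # Wn: Cramer's V computed on haplotype frequencies
  Wn <- sqrt(sum(D^2 / expected) / min(nrow(f) - 1, ncol(f) - 1))

  # Conditional asymmetric measures:
  # W_A/B divides D_ij^2 by the locus B allele frequency,
  # W_B/A divides D_ij^2 by the locus A allele frequency.
  W_A_given_B <- sqrt(sum(sweep(D^2, 2, pB, "/")) / (1 - sum(pA^2)))
  W_B_given_A <- sqrt(sum(sweep(D^2, 1, pA, "/")) / (1 - sum(pB^2)))

  c(Dprime = Dprime, Wn = Wn,
    W_A_given_B = W_A_given_B, W_B_given_A = W_B_given_A)
}

print(round(ld_sketch(hap), 4))
```

For two biallelic loci, the two conditional measures and Wn all reduce to the absolute value of the familiar r correlation, which is a convenient sanity check on a sketch like this one.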

The pould package can be installed from GitHub using the R devtools package – devtools::install_github("IHIW/pould/pould", build_vignettes = TRUE).

Note: When installing pould from GitHub on a Windows system that does not have Rtools v3.5 installed, the following warning message may appear:

In untar2(tarfile, files, list, exdir) : skipping pax global extended headers

This warning does not impact the function of the package. Installing Rtools v3.5 will prevent these warnings.

Example

In addition to simply calculating LD values, pould can be used to compare LD for phased and unphased versions of the same dataset, e.g., to examine the extent to which phasing via segregation analysis impacts LD relative to phase estimation via the expectation-maximization (EM) algorithm. In the example below, DRB1 and DQB1 genotype data were extracted from six-locus haplotypes that had been phased using the EM method (Mack et al. Genes Immun. 2018). In the first application of cALD(), that phasing information is ignored, and the EM algorithm is applied to estimate haplotypes. In the second application of cALD(), the original six-locus phasing information is retained.

By comparing the resulting LD values, it becomes clear that LD is uniformly lower for the pre-phased DRB1~DQB1 haplotypes than for the de novo EM-estimated haplotypes. This suggests that the EM algorithm may not be accurately estimating low-frequency (counts < 4) haplotypes for individual locus pairs during multi-locus haplotype estimation, as the number of EM-estimated haplotypes evaluated (53) is considerably lower than the number of pre-phased haplotypes evaluated (106).

## Comparing LD values for haplotypes generated by the EM algorithm (default = unphased) to LD values for haplotypes for which phase is known.
library("pould")
data(drb1.dqb1.demo)
cALD(drb1.dqb1.demo,inPhase=FALSE)
#> Calculating D', Wn and conditional ALD for 53 unphased genotypes at the DRB1 and DQB1 loci.
#> D' for DRB1~DQB1 haplotypes: 0.958463648286022 (0.9585) 
#> Wn for DRB1~DQB1 haplotypes: 0.811184751666017 (0.8112) 
#> Variation of DQB1 conditioned on DRB1 (WDQB1/DRB1) = 0.903300936956993 (0.9033)
#> Variation of DRB1 conditioned on DQB1 (WDRB1/DQB1) = 0.778712698006812 (0.7787)

cALD(drb1.dqb1.demo,inPhase=TRUE)
#> Calculating D', Wn and conditional ALD for 106 phased genotypes at the DRB1 and DQB1 loci.
#> D' for DRB1~DQB1 haplotypes: 0.878076460805524 (0.8781) 
#> Wn for DRB1~DQB1 haplotypes: 0.733800978595899 (0.7338) 
#> Variation of DQB1 conditioned on DRB1 (WDQB1/DRB1) = 0.822989521285103 (0.823)
#> Variation of DRB1 conditioned on DQB1 (WDRB1/DQB1) = 0.721861349887199 (0.7219)
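
The comparison described above can be checked directly from the printed output. The sketch below copies the rounded values from the two cALD() runs by hand into base R vectors and confirms that each measure is lower for the pre-phased haplotypes. The vector names are illustrative only and are not objects returned by pould.

```r
# Rounded LD values copied by hand from the cALD() output above.
unphased <- c(Dprime = 0.9585, Wn = 0.8112,
              W_DQB1_given_DRB1 = 0.9033, W_DRB1_given_DQB1 = 0.7787)
phased   <- c(Dprime = 0.8781, Wn = 0.7338,
              W_DQB1_given_DRB1 = 0.8230, W_DRB1_given_DQB1 = 0.7219)

# Difference for each measure (EM-estimated minus pre-phased).
ld_diff <- unphased - phased
print(round(ld_diff, 4))

# TRUE only if every measure is lower for the pre-phased haplotypes.
all_lower <- all(ld_diff > 0)
print(all_lower)
```

With only four paired values from a single locus pair, this is a descriptive check rather than a formal test; LD.sign.test() is the package's tool for testing paired phased and unphased LD values across the locus pairs processed by LDWrap().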

Install: install.packages('pould')
Monthly Downloads: 159
Version: 1.0.1
License: GPL (>= 3)
Maintainer: Steven Mack
Last Published: October 16th, 2020

Functions in pould (1.0.1)

writeVector: Exporting Haplotype Vectors
parseGenotypes: Reformat columnar genotype data to GL String format
LD.heat.map: Generates heat-maps for four linkage disequilibrium (LD) values (D', Wn, WLoc1/Loc2 and WLoc2/Loc1) generated for all pairs of phased and unphased two-locus haplotypes by LDWrap().
hla.hap.demo: Example Six-Locus HLA Haplotype Data in GL String Format
drb1.dqb1.demo: Example HLA Genotype Data for DRB1 and DQB1
cALD: Calculation of the D', Wn, and conditional Asymmetric LD Measures
LDWrap: Parser for CSV-formatted GL String Haplotype Data
extractLoci: Extract Locus Information from Supplied Haplotype Data
trimAlleles: Truncate allele names in haplotypes to the specified number of fields.
LD.sign.test: Perform the sign test on paired LD values for phased and unphased haplotypes